Guide to replacing a failed drive


Janus

Hi,

While I was testing something (iSCSI from a VM), one of my drives was wiped (its partitioning was changed), and the disk is now unusable in my array. I read some posts about disk replacement, but a couple of things seem ... confusing. Do I really need to reboot the server?

What is the correct way to "replace" a disk? (Reinsert the same one, but clean.)

Currently, I have removed the disk from the array and I'm clearing it with the "Preclear Disk" tool.

 

But after that?

What is the correct way to reinsert it into my array? Right beside my parity disk, I can see "All existing data on this device will be OVERWRITTEN when array is Started". A little scary ...

Once the clear has completed, I plan to:

- Reassign the disk to the array

- Check "Parity is valid" and click Start

[Attached screenshot]

 

Is this the right process? (I don't want to lose data.)

The notice beside my parity disk scares me a bit ("All existing data on this device will be OVERWRITTEN when array is Started").

 

Many thanks,

Link to comment

Here are the diagnostics.

Also, in case it helps, here is the latest lsblk output, which I captured before I started clearing the disk.

You can see that sdg has been completely repartitioned by my operation and no longer seems usable by the array.

And sdd is my parity disk.

 

Thanks,

root@teletraan:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0  22.7M  1 loop /lib/modules
loop1    7:1    0   7.1M  1 loop /lib/firmware
loop2    7:2    0    20G  0 loop /var/lib/docker
loop3    7:3    0     1G  0 loop /etc/libvirt
sda      8:0    1  28.7G  0 disk
└─sda1   8:1    1  28.7G  0 part /boot
sdb      8:16   0   2.7T  0 disk
└─sdb1   8:17   0   2.7T  0 part
sdc      8:32   0   2.7T  0 disk
└─sdc1   8:33   0   2.7T  0 part
sdd      8:48   0   5.5T  0 disk
└─sdd1   8:49   0   5.5T  0 part
sde      8:64   0   2.7T  0 disk
└─sde1   8:65   0   2.7T  0 part
sdf      8:80   0 223.6G  0 disk
├─sdf1   8:81   0     1M  0 part
└─sdf2   8:82   0 223.6G  0 part
sdg      8:96   0   2.7T  0 disk
├─sdg1   8:97   0     8G  0 part
└─sdg2   8:98   0     1G  0 part
md1      9:1    0   2.7T  0 md   /mnt/disk1
md2      9:2    0   2.7T  0 md   /mnt/disk2
md3      9:3    0   2.7T  0 md   /mnt/disk3
md4      9:4    0   2.7T  0 md
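
For anyone reading the listing above, the odd one out can be picked up mechanically. This is only a rough sketch in Python, not an Unraid tool: the helper names `partitions_per_disk` and `suspicious_disks` are made up for illustration, and the rule "an array disk should show exactly one partition" is an assumption drawn from how sdb, sdc, sdd and sde look in this output. The sample is a subset of the listing.

```python
# Subset of the lsblk output posted above (header included on purpose).
LSBLK_SAMPLE = """\
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdd      8:48   0   5.5T  0 disk
└─sdd1   8:49   0   5.5T  0 part
sde      8:64   0   2.7T  0 disk
└─sde1   8:65   0   2.7T  0 part
sdg      8:96   0   2.7T  0 disk
├─sdg1   8:97   0     8G  0 part
└─sdg2   8:98   0     1G  0 part
md4      9:4    0   2.7T  0 md
"""


def partitions_per_disk(lsblk_text):
    """Count the 'part' rows listed under each 'disk' row (hypothetical helper)."""
    counts = {}
    current = None
    for line in lsblk_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        name, dev_type = fields[0], fields[5]
        if dev_type == "disk":
            current = name
            counts[current] = 0
        elif dev_type == "part" and current is not None:
            counts[current] += 1
        # Header, loop and md rows are ignored.
    return counts


def suspicious_disks(lsblk_text):
    """Assumption: a disk with anything other than one partition deserves a look."""
    return [disk for disk, n in partitions_per_disk(lsblk_text).items() if n != 1]


print(suspicious_disks(LSBLK_SAMPLE))
```

On the full listing this rule would also flag sdf, which has two partitions and whose role is not stated in this thread, so treat the result as a hint rather than a verdict.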

 

teletraan-diagnostics-20201007-2025.zip

Link to comment

Your disks were assigned at the beginning of syslog, but near the end it seems to have lost them and now considers them all NEW.

Oct  7 16:16:27 teletraan emhttpd: shcmd (952): modprobe md-mod super=/boot/config/super.dat
Oct  7 16:16:27 teletraan kernel: md: unRAID driver 2.9.13 installed
Oct  7 16:16:27 teletraan kernel: read_file: error 2 opening /boot/config/super.dat
Oct  7 16:16:27 teletraan kernel: md: could not read superblock from /boot/config/super.dat
Oct  7 16:16:27 teletraan kernel: md: initializing superblock
Oct  7 16:16:27 teletraan kernel: mdcmd (1): label 0781-5583-8355-81071F2978D2

Not sure if that is a problem with flash or something you did. Did you do New Config???

 

I was afraid of that after seeing your screenshot. Really wish you had asked for help before doing anything at all, since you were obviously proceeding along the wrong path.

 

Preclearing was and is pointless, but of course, it makes it impossible to get any of your data back from the disk itself.

 

Normally, you would rebuild the disk from parity and all the other disks, but after New (or lost) Config it is wanting to rebuild parity instead.

 

Stop preclear (pointless as already noted). It might be possible to force it to rebuild the data disk instead of parity.

 

Don't do anything else without further advice.

 

Can you tell us anything more about what you have already done?

 

Link to comment

Two days ago, I tried to configure a VM to provide iSCSI to my VMware. When I realized that this process would reserve one disk for iSCSI (and remove it from the array), I stopped the process.

At that point, I didn't realize that one of the disks (sdg) had already been modified and partitioned like this:

sdg      8:96   0   2.7T  0 disk
├─sdg1   8:97   0     8G  0 part
└─sdg2   8:98   0     1G  0 part

 

Today, I lost power and the server shut down. (A UPS doesn't last forever.)

I believed that the shutdown had caused a problem with my array. I tried to remove sdg from the array and re-insert it (and start a parity rebuild). No luck.

After reading a bit about the process to rebuild the array, I saw a YouTube tutorial (from spaceinvader) saying that we must clear a disk before inserting it into the array.

I started the clear process (as the disk was already repartitioned, the data was already virtually erased).

So now the clear process is 84% complete.

3 of the data disks are healthy, and so is the parity disk. The 4th is waiting for the clear to complete so it can be reinserted and rebuilt from the parity disk. I cannot believe that, with 4 of the 5 disks healthy, I cannot recover the array.

Link to comment
5 hours ago, trurl said:

How did you get a disk with 2 partitions into the array?

 

16 hours ago, Janus said:

Two days ago, I tried to configure a VM to provide iSCSI to my VMware. When I realized that this process would reserve one disk for iSCSI (and remove it from the array), I stopped the process.

At that point, I didn't realize that one of the disks (sdg) had already been modified and partitioned like this:

sdg      8:96   0   2.7T  0 disk
├─sdg1   8:97   0     8G  0 part
└─sdg2   8:98   0     1G  0 part

 

Link to comment

OK,

At this point, I must assume I made a lot of mistakes. But, as far as I can see, I did not lose all my data, and I hope to be able to recover a good part of it. The question is, what is the easiest way?

Here was the initial setup:

PDisk   6 TB   Parity disk   Parity was valid
Disk1   3 TB   Data disk     2.1 TB of data
Disk2   3 TB   Data disk     2.1 TB of data
Disk3   3 TB   Data disk     1.5 TB of data
Disk4   3 TB   Data disk     (Unfortunately, I broke this one and finally cleared it)

At this point, I have 5.7 TB of my data, split across 3 drives, and probably a portion on my parity disk. (I really do not understand the protection/parity process of Unraid; it sometimes seems esoteric.)

As far as I can see, 3 of my data disks contain a portion of my data. The parity disk is probably valid. When I try to mount it to verify the data on it, I get this message:

root@teletraan:/mnt/test# mount /dev/sdd1 /mnt/test/parity/
mount: /mnt/test/parity: mount(2) system call failed: Structure needs cleaning.

Weird. How is the parity disk configured? How does it work?

Is there a process to rebuild the array from this? Like, 'Hey, here are 3 disks with a portion of the data, and probably another portion on the parity one; rebuild the array with the 5 disks'?

A manual approach would be to rsync the data from the available disks to a new single disk, build a fresh array, and copy the data back to the new array. But what is the advantage of using a system to protect data if we must do all the work manually?

 

So, 2 questions:

- How can I read data from the parity disk?

- Does Unraid have a process where I give it the disks (all 5), tell it to scan them, and have it rebuild an array from them?

At this point, please do not only tell me "you should have done this". (I'm open to learning the correct way to recover an array when you lose a single disk, but please do not tell me only that.)

If I do not have an answer or a recipe by the end of the day, I will try to recover what I can manually, and I will reconsider whether Unraid, despite its advantages, is the right way to protect my data.

 

Thanks for your help.

Link to comment
2 minutes ago, Janus said:

The parity disk is probably valid. When I try to mount it to verify the data on it, I get this message:

Parity doesn't mount since there's no filesystem. Also, don't try to mount any of the data disks, except in read-only mode. I'll post the instructions for the invalid slot command in a few minutes.

Link to comment

This will only work if parity is still valid. Follow the instructions below carefully and ask if there's any doubt.

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including old disk4
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 4 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but they won't be overwritten as long as the procedure was done correctly. Disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it: wait for the rebuild to finish (or cancel it) and then run a filesystem check.

Link to comment
3 hours ago, Janus said:

probably a portion on my parity disk. (I really do not understand the protection/parity process of Unraid; it sometimes seems esoteric.)

As noted, parity has no filesystem, and so parity has none of your data.

 

Parity is a very common concept in computers and communications, and it is basically the same wherever it is used. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits.

 

Unraid reads parity plus all the other disks to calculate the data for the missing disk. The parity calculation is extremely simple. Here is the wiki on parity:

https://wiki.unraid.net/UnRAID_6/Overview#Parity-Protected_Array
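
To make the "missing bit calculated from all the other bits" idea concrete, here is a tiny sketch in Python. It assumes single parity computed as a bitwise XOR across the data disks (the even-parity scheme described above); the two-byte "disks" and the helper name `xor_blocks` are invented for illustration and have nothing to do with Unraid's actual driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four toy "data disks", two bytes each (made-up contents).
disk1 = bytes([0b10110010, 0x0F])
disk2 = bytes([0b01100001, 0xF0])
disk3 = bytes([0b11111111, 0x55])
disk4 = bytes([0b00011100, 0xAA])

# Parity is the XOR of every data disk at the same position.
parity = xor_blocks([disk1, disk2, disk3, disk4])

# Lose disk4: XOR parity with the surviving disks to get it back.
rebuilt_disk4 = xor_blocks([parity, disk1, disk2, disk3])

print(rebuilt_disk4 == disk4)
```

The parity bytes are just the XOR of whatever sits at the same position on every data disk, which is why the parity disk holds no filesystem of its own and cannot be mounted, and why a rebuild needs parity plus all of the remaining disks.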

Link to comment
9 hours ago, JorgeB said:

disk should mount immediately but if it's unmountable don't format, wait for the rebuild to finish (or cancel it)

The parity check is now complete and seems to have been successful. But here is what I see:

[Attached screenshot]

Disk 4 did not mount automatically. Is that normal? Do I need to re-insert it into the array (after running mdcmd set invalidslot 4 29)?

How can I verify that the process completed successfully?

Or, what are the next steps?

 

Thanks again

Link to comment
Oct  9 07:01:42 teletraan kernel: md: recovery thread: recon D4 ...
...
Oct  9 15:37:14 teletraan kernel: md: sync done. time=30932sec
Oct  9 15:37:14 teletraan kernel: md: recovery thread: exit status: 0

OK
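
The same reading of those log lines can be done with a few lines of Python. This is just a sketch: `rebuild_summary` and `format_duration` are hypothetical helper names, the regular expressions only match the exact message wording quoted above, and treating exit status 0 as "rebuild finished without error" is my reading of these lines rather than something documented in this thread.

```python
import re

# The three syslog lines quoted above.
LOG = """\
Oct  9 07:01:42 teletraan kernel: md: recovery thread: recon D4 ...
Oct  9 15:37:14 teletraan kernel: md: sync done. time=30932sec
Oct  9 15:37:14 teletraan kernel: md: recovery thread: exit status: 0
"""


def rebuild_summary(log_text):
    """Pull the exit status and elapsed seconds out of the md log lines."""
    status = re.search(r"recovery thread: exit status: (-?\d+)", log_text)
    seconds = re.search(r"sync done\. time=(\d+)sec", log_text)
    return {
        "exit_status": int(status.group(1)) if status else None,
        "seconds": int(seconds.group(1)) if seconds else None,
    }


def format_duration(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


summary = rebuild_summary(LOG)
print(summary["exit_status"], format_duration(summary["seconds"]))
```

The 30932 seconds work out to 8h 35m 32s, which matches the gap between the 07:01:42 and 15:37:14 timestamps.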

45 minutes ago, Janus said:

what are the next steps ?

10 hours ago, JorgeB said:

if it's unmountable don't format, wait for the rebuild to finish (or cancel it) and then run a filesystem check.

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

Be sure to capture the output so you can post it.

 

Link to comment
