
replacing HDD


HKR


Hello,

 

I am running unRaid6, and it seems one of my 4TB data disks has gone bad. I am going to try to copy all the data from it to a backup disk outside the array and send the disk in for RMA.

 

Please let me know if the procedure I plan to follow is correct.

 

1) Copy data over to backup disk

2) Stop the array, remove the bad disk

3) Start the array and continue using it till the replacement disk arrives from RMA

4) Copy the backed-up data back to the array after putting in the new disk.

 

Thanks

Link to comment


Strictly speaking, steps 1 and 4 are unnecessary (although they do no harm and provide a level of resilience against further failures). When the replacement disk arrives, you should just be able to assign it as disk4 and let unRAID rebuild disk4; if nothing goes wrong, the contents will still be there.

 

BTW:  What makes you think the disk has gone bad?  Many times a disk is 'disabled' by unRAID because of an external factor (cabling/power).    A SMART report for the disk could allow for some analysis of whether this is the case.  Actually the easiest thing would be to provide the standard diagnostics (Tools->Diagnostics) as that includes SMART reports as part of the contents.
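If you want to look at the SMART attributes yourself, here is a minimal Python sketch, assuming smartmontools is installed and the text layout of `smartctl -A` matches the sample. The function names, the list of watched attributes, and the sample rows are all made up for illustration; the sample is not taken from the attached diagnostics. A non-zero raw value in the first three attributes tends to point at the drive, while a growing UDMA_CRC_Error_Count usually points at cabling.

```python
import subprocess

# Attributes commonly checked first. A non-zero raw value is a warning sign,
# not proof of failure. This watch list is an illustrative choice.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

def read_smart_attributes(device):
    """Return the text of `smartctl -A <device>` (needs smartmontools and root)."""
    # smartctl's exit status is a bitmask, so it is deliberately not checked here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout

def parse_attributes(report):
    """Map attribute name -> raw value string for each attribute table row."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs

def flag_warnings(attrs):
    """Return the watched attributes whose raw value is not zero."""
    return {name: attrs[name] for name in WATCHED
            if name in attrs and attrs[name] != "0"}

# Hypothetical sample output, invented for this example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

flagged = flag_warnings(parse_attributes(SAMPLE))
print(flagged)
```

On a live system you would pass `read_smart_attributes("/dev/sdX")` to `parse_attributes` instead of the sample, with the device name taken from your own setup.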

Link to comment

This is actually a new disk; it's barely 3 months old, and at every weekly parity check the disk kept throwing errors. Today there were several errors, the parity check was aborted, and now the disk is in a disabled state.

 

I have attached the report requested.

 

WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT - 4 TB (sdh)  is the disk throwing up errors.

The SMART report for that disk is empty which suggests that it has dropped offline.   

 

Looking at the syslog, it is not obvious whether it was disk5 itself or an external factor that caused the initial failure.

Link to comment

I tried restarting and powering off the system, but the drive still remains in a disabled state...

Once a disk has been disabled by unRAID it will stay disabled until you rebuild it.  If you think that the disable was due to an external factor and the drive is OK, then you can rebuild back onto the same drive.

 

I have attached a new diagnostics report that shows the SMART report of the drive in question. Please have a look.

The SMART report for that drive looks fine in this latest report. This raises the question of why the disable happened in the first place. It could be a cabling/power/controller type issue rather than the drive itself. The fact that it seemed to take a power off/on to bring the drive back online suggests that the controller went down for some reason.
Link to comment

OK, so I will check all the cabling and connectors. Should I then do a parity check to get the drive back online? I tried copying about 2.5TB of data to a spare backup drive and it went fine without any errors, which makes me think the drive is fine, as you said earlier.

 

 

Link to comment

OK, so I will check all the cabling and connectors. Should I then do a parity check to get the drive back online?

 

You can't do a parity check with a disabled drive.  (The option won't even be available)

 

 

... I tried to copy over about 2.5TB of data to a spare backup drive and it went just fine without any errors, which makes me think the drive is just fine as you said earlier.

 

You were copying from the emulated drive (i.e. the drive's contents were being recreated by reading ALL of the other disks in the array plus parity, and emulating the failed drive). You were NOT reading from the failed drive.
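To make the idea of emulation concrete, here is a toy Python sketch, assuming a single parity disk computed as the byte-wise XOR of all data disks. The disk names, block contents, and function names are made up for illustration, and real unRAID operates on whole disks rather than 4-byte blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical toy array: three data disks, each holding one 4-byte block.
disks = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\xaa\xbb\xcc\xdd",
    "disk5": b"unRA",
}

# Parity is the XOR of every data disk.
parity = xor_blocks(list(disks.values()))

def emulate(missing, disks, parity):
    """Recreate a missing disk from all surviving disks plus parity."""
    survivors = [data for name, data in disks.items() if name != missing]
    return xor_blocks(survivors + [parity])

emulated = emulate("disk5", disks, parity)
print(emulated == disks["disk5"])
```

Note that `emulate` never touches the contents of the missing disk, which is why an error-free copy from the emulated drive says nothing about the health of the physical drive.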

 

Link to comment

Can you please tell me how I can do that?

 

Assuming you've replaced and/or reseated the cables, and the drive is now "seen" okay, you need to do the following:

 

(a)  Stop the array and unassign the drive from its current slot.

(b)  Start the array -- you will now see a "missing" drive.

(c)  Stop the array and assign the drive to the empty (missing) drive slot.

(d)  Start the array and let it do the rebuild.

 

Note this is the same procedure you'd use if you were replacing the drive with another drive -- just in case that becomes necessary.

 

Link to comment

A comment re your original plan ...

 

1) Copy data over to backup disk

2) Stop the array, remove the bad disk

3) Start the array and continue using it till the replacement disk arrives from RMA

4) Copy the backed-up data back to the array after putting in the new disk.

 

You do NOT want to do #3. If you're going to wait for a replacement drive, I would leave the array off until it arrives UNLESS you have a complete set of backups. Without that drive your array is completely unprotected -- should another drive fail, you'll lose all of the data on that drive, and the array will no longer be able to emulate the missing one.
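A small Python sketch of why a second failure is unrecoverable, assuming a single XOR parity disk. All names and byte values are invented for illustration. With two disks gone, the surviving disk and parity only tell you the XOR of the two missing disks, and many different pairs of contents fit that.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical three-disk array with single XOR parity.
disk1 = b"\x01\x02"
disk2 = b"\x0f\xf0"
disk5 = b"\x33\x44"
parity = xor_bytes(xor_bytes(disk1, disk2), disk5)

# Suppose disk5 is missing and disk2 then fails. All that is left is disk1
# and parity, which only pins down disk2 XOR disk5:
known_xor = xor_bytes(parity, disk1)

# The real contents satisfy that constraint...
real_pair_fits = xor_bytes(disk2, disk5) == known_xor

# ...but so does a completely different pair, so the originals cannot be
# told apart from the impostors.
fake_disk2 = b"\x00\x00"
fake_disk5 = known_xor
fake_pair_fits = xor_bytes(fake_disk2, fake_disk5) == known_xor

print(real_pair_fits, fake_pair_fits, fake_disk5 != disk5)
```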

 

Link to comment

Guys, I still think there is something wrong with this disk. The scheduled parity check usually takes about 12-15 hours to complete, but right now it's been running for over 2 days and it says 75 days to go... the moment it starts reading/writing Disk 5 data, the parity check comes to a standstill.

 

Are there any tests we could run to confirm the disk is actually bad?

Link to comment

Archived

This topic is now archived and is closed to further replies.
