
2 Failing drives: How should I approach rebuilding?


Alex R. Berg


Hi all,

 

Today I noticed errors on two of my disks in the unRAID webGui (see attached image). I then rebooted, and now disk3 shows as unmountable when I start the array, and disk4 has a grey triangle, which I think indicates 'Invalid data content'. The unMENU status is disk3='OK' and disk4=DISK_DSBL.

 

Before Image: http://ibin.co/25ErVmSflw07

After image: http://ibin.co/25EriGlJrdBS

 

Unfortunately, I don't have the syslog from before the reboot. I have attached the syslog from after the reboot, along with the SMART data.

 

The SMART queries for the failed drives reported the following (among other, more ordinary SMART details):

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!
See the following web pages for details:
http://knowledge.seagate.com/articles/en_US/FAQ/223571en
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
SMART support is: Available - device has SMART capability.

So I'm wondering whether the data loss is due to the firmware problem, and whether I should update the firmware as mentioned. Both problematic drives are Samsung F4EG HD204UI, the model the firmware fix addresses.

 

I'm not quite sure how to read the status of my drives, but I'm guessing from the fairly clear unMENU hint that disk4 is disabled. According to http://lime-technology.com/wiki/index.php/Plugin/webGui/Array_Status#Disabled_disk that means what I see on /mnt/disk4 when I start the array is reconstructed from parity, not the actual data on drive 4. When I start the array I can see disk4's content, but I cannot list the content of /mnt/disk3 (ls). So I surmise that disk3 and disk4 both had block-read problems, which caused the errors shown in the pre-reboot image, but only disk4 had block-write errors, which caused it to be disabled. Do you agree with my assessment so far? (PS: Is there a command in unRAID to verify that I'm reading the status correctly, or a webGui page with more information than the chosen icon (green ball vs grey triangle)?)
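To illustrate what "reconstructed using parity" means for a disabled disk, here is a toy sketch. It assumes plain single XOR parity over equal-sized blocks. The disk names and byte values are made up for illustration and this is not unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy "disks", each holding a single 4-byte block (made-up data).
disks = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\x01\x02\x03\x04",
    "disk3": b"\xaa\xbb\xcc\xdd",
    "disk4": b"\x0f\x0e\x0d\x0c",
}

# The parity block is the XOR of all data blocks.
parity = xor_blocks(list(disks.values()))


def reconstruct(missing, disks, parity):
    """Rebuild one missing disk from the surviving disks plus parity."""
    survivors = [data for name, data in disks.items() if name != missing]
    return xor_blocks(survivors + [parity])


# With disk4 disabled, its content is emulated from everything else.
rebuilt_disk4 = reconstruct("disk4", disks, parity)
```

Note that the rebuild of disk4 reads every other data disk, including disk3. With single parity, only one missing disk can be recovered this way, which is why read problems on disk3 matter while disk4 is being emulated or rebuilt.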

 

If my conclusions are correct, I think the safest route is to:

1) reconstruct the disabled disk4 on a new drive,

2) then update the drive firmware, one affected disk at a time,

3) then try running reiserfsck to fix the filesystem.

 

Before I proceed I would like a sanity check if someone would be so kind.

 

I have fresh md5's for all files (made with my newly created but not yet published tool :) ), a (hopefully mostly) full CrashPlan backup, and a 3-4 month old offsite backup, so I think I'm covered. For me it's mostly a question of getting back up and running with the least fuss and time spent. Learning a bit along the way wouldn't be bad either.
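A post-rebuild check against such md5 lists could look roughly like the sketch below. It is only an illustration: the tool mentioned above is not published, so the manifest format here is an assumption (md5sum-style lines, a 32-character hex digest, two separator characters, then a path relative to the disk root), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root):
    """Compare files under root with an md5sum-style manifest (assumed format).

    Returns two lists of relative paths: files that are missing and files
    whose current MD5 differs from the recorded one.
    """
    missing, mismatched = [], []
    for line in Path(manifest_path).read_text().splitlines():
        if len(line) < 35:
            continue  # blank or malformed line, skipped in this sketch
        expected = line[:32].lower()  # 32 hex characters
        relative = line[34:]          # skip the two separator characters
        target = Path(root) / relative
        if not target.is_file():
            missing.append(relative)
        elif md5_of(target) != expected:
            mismatched.append(relative)
    return missing, mismatched
```

Something like verify_manifest("disk4.md5", "/mnt/disk4") (both paths are placeholders) would then list the files worth restoring from backup after the rebuild and filesystem check.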

 

Best Alex

SmartData.txt

syslog.txt

Link to comment

From what I have seen, if you force NCQ off in the unRAID disk settings, you should not have any problem with that firmware bug...

 

I would not do a firmware update on a disk containing data...

 

The errors reported on those disks are unlikely to be connected with that bug anyway...

Link to comment

Archived

This topic is now archived and is closed to further replies.
