
Bad Drive! Or something else?


abernardi


:'( I'm not sure what happened. I was off the computer all day yesterday and most of the day today. When I checked unMENU tonight, it showed DISK_DSBL on disk 3, and the disk has a red dot on the regular unRAID menu. The syslog is huge, and if I'm reading it correctly, the server restarted on its own at 4:40 AM. I didn't do that, so something happened. Could this be a bad cable or a bad card? I'm including the whole syslog, but had to break it into several parts.

syslog_2010_8_4_pt1.txt.zip

syslog_2010_8_4_pt2.txt.zip

Link to comment

Alright, to answer myself: I just read the troubleshooting section (should have done that first, eh?). Anyway, I'll run a SMART test and see what comes up. One question: I'm getting the red ball next to disk 3, but it is still available in the array. I thought unRAID would take it offline.

Link to comment


A red ball indicates that a "write" to the drive failed so it was disabled.

 

The remaining drives, in combination with parity, are simulating the failed drive. If you read files from the failed drive, you are actually reading the corresponding blocks of data from ALL the other drives and calculating what was on the failed drive. That is exactly why you have an unRAID array.
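As a rough illustration of how a failed drive can be "simulated", here is a minimal Python sketch. It assumes simple XOR (even) parity across the data disks; the disk contents and the helper name `xor_blocks` are made up for the example and are not taken from unRAID itself.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"  # the disk that later gets disabled

# Assumption: parity is the XOR of the matching blocks on every data disk.
parity = xor_blocks([disk1, disk2, disk3])

# With disk3 unavailable, its block is recomputed from the survivors plus parity.
simulated_disk3 = xor_blocks([disk1, disk2, parity])

print(simulated_disk3 == disk3)
```

Every read from the simulated disk touches all the other disks, which is why the array keeps working but with more disk activity than usual.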

 

If you had not looked at the management console you might never have noticed the failed indicator.  

 

You can even write to the simulated drive. The write is reflected on the parity drive as if the drive were actually there. Do not be fooled, though: if a second disk fails, you'll lose the contents of BOTH failed drives.
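Here is a small Python sketch of why a write to a simulated drive works, and why a second failure is fatal. It again assumes plain XOR parity, and the two-byte "disks" and the helper `xor_bytes` are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


disk1 = b"\x10\x20"
disk2 = b"\x0f\x0e"
# disk3 is disabled, so there is no physical copy of its data.

# "Writing" to simulated disk3: only parity changes, so that it stays
# consistent with disk1, disk2 and the new disk3 contents.
new_disk3_data = b"\xca\xfe"
parity = xor_bytes(xor_bytes(disk1, disk2), new_disk3_data)

# Reading it back recomputes the data from the surviving disks and parity.
read_back = xor_bytes(xor_bytes(disk1, disk2), parity)
assert read_back == new_disk3_data

# If disk2 now fails as well, all that can be computed is disk2 XOR disk3.
# Neither disk's contents can be separated out from that single value.
leftover = xor_bytes(disk1, parity)
assert leftover == xor_bytes(disk2, new_disk3_data)
```

With a single parity disk there is one equation per block, so only one unknown disk can be solved for at a time.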

 

You will want to get the drive back on-line as soon as possible.  If you think it was a cabling issue, these are the steps to fix the cable:

 

Stop the array

Un-assign the failed drive

Power down

Fix/re-seat the cable

Power Up

Start the array with the disk un-assigned.  You'll still be able to get to its contents, as an un-assigned drive is treated exactly like a failed drive.  You'll still access the drive "simulated" by parity and all the other drives.  Starting the array with the drive un-assigned causes the array to forget the serial number of the failed disk.

Stop the array once more

Re-assign the failed disk.  (unRAID will think it is a new disk, since it forgot the serial number in the prior step)

Start the array once more.   It will then begin the process of re-constructing the old simulated contents onto the new physical drive.

 

If the drive was really physically bad, follow the exact same steps, but install the new disk while the server is powered down.

 

DO NOT press the button labeled "Restore", as it has nothing to do with re-constructing data.  It immediately invalidates parity and would prevent re-construction of a replacement disk.

 

Joe L.

Link to comment

Right, of course.  Thanks Brit, Joe.  One more question: I'm guessing I can still try to run a SMART test via telnet if the physical disk isn't completely dead, right?  How would you determine whether the disk has failed or it's a bad cable or something else?

 

EDIT:  ah, never mind, I'll follow the steps in the troubleshooting guide.  THANKS!!!

Link to comment

UPDATE:  OK, I ran smartctl and am including the output.  I think it looks good except for:

 

199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      309

 

which, according to the wiki, should be 0 and is probably due to a bad cable.  I followed your directions, Joe, and swapped out the cable, and now it's rebuilding the disk.  It's estimating 638 min., which is probably appropriate for a 2TB drive.
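For anyone who wants to check this attribute without eyeballing the report, here is a minimal Python sketch that parses one attribute row like the one quoted above. It assumes the usual ten-column layout of the smartctl attribute table and a plain integer raw value; the function names `parse_smart_row` and `crc_errors_present` are made up for the example.

```python
def parse_smart_row(line):
    """Split one SMART attribute row into named fields.

    Assumes ten whitespace-separated columns and an integer raw value.
    """
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],
        "raw": int(fields[9]),
    }


def crc_errors_present(row):
    """True when attribute 199 (UDMA_CRC_Error_Count) has a non-zero raw value."""
    return row["id"] == 199 and row["raw"] > 0


row = parse_smart_row(
    "199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      309"
)
print(row["name"], row["raw"], crc_errors_present(row))
```

The raw count is cumulative, so it will not drop back to 0 after the cable is replaced. What matters after the swap is whether the number keeps climbing.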

 

It's really amazing that the "simulated" disk 3 is still there as if nothing happened.  Great work Tom!

Smart_2010-8-2010.txt

Link to comment

