abernardi Posted August 5, 2010 :'( I'm not sure what happened. I was off the computer all day yesterday and most of the day today. When I checked unMENU tonight, disk 3 showed DISK_DSBL, and it has a red dot on the regular unRAID menu. The syslog is huge and, if I'm reading it correctly, the server restarted on its own at 4:40 AM. I didn't do that, so something happened. Could this be a bad cable or card? I'm including the whole syslog, but had to break it into several parts. Attachments: syslog_2010_8_4_pt1.txt.zip, syslog_2010_8_4_pt2.txt.zip
abernardi Posted August 5, 2010 Two more... Attachments: syslog_2010_8_4_pt3.txt.zip, syslog_2010_8_4_pt4.txt.zip
abernardi Posted August 5, 2010 Last one... Attachment: syslog_2010_8_4_pt5.txt.zip
abernardi Posted August 5, 2010 Alright, to answer myself: I just read the troubleshooting section (should have done that first, eh?). Anyway, I'll run a SMART test and see what comes up. One question: I'm getting the red ball next to disk 3, but the disk is still available in the array. I thought unRAID would take it offline.
BRiT Posted August 5, 2010 The red ball means the physical disk is not available. unRAID is working as expected: it is simulating the failed drive from parity and the other drives in the array. What did you expect would happen?
Joe L. Posted August 5, 2010
Quoting abernardi: "I'm getting the red ball next to disk 3, but it is still available in the array. I thought unRAID would take it offline."

A red ball indicates that a "write" to the drive failed, so the drive was disabled. The remaining drives, in combination with parity, are simulating the failed drive. If you read files from the failed drive, you are actually reading the corresponding blocks of data from ALL the other drives and calculating what was on the failed drive. It is exactly why you have an unRAID array. If you had not looked at the management console, you might never have noticed the failed indicator.

You can even write to the simulated drive. Those writes update the parity drive as if the disk were actually there. Do not be fooled, though: if a second disk fails, you'll lose the contents of BOTH failed drives. You will want to get the drive back online as soon as possible.

If you think it was a cabling issue, then to fix the cable:
1. Stop the array.
2. Un-assign the failed drive.
3. Power down.
4. Fix/re-seat the cable.
5. Power up.
6. Start the array with the disk un-assigned. You'll still be able to get to its contents, as an un-assigned drive is treated exactly like a failed drive: you'll still access the drive "simulated" by parity and all the other drives. Starting the array with the drive un-assigned causes the array to forget the serial number of the failed disk.
7. Stop the array once more.
8. Re-assign the failed disk. (unRAID will think it is a new disk, since it forgot the serial number in step 6.)
9. Start the array once more. It will then begin reconstructing the old simulated contents onto the new physical drive.

If the drive really is physically bad, follow the same exact steps, but install the new disk while the server is powered down.

DO NOT press the button labeled "Restore", as it has nothing to do with reconstructing data. It immediately invalidates parity and would prevent reconstruction of a replacement disk.

Joe L.
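For readers wondering how a disabled disk can still be read, here is a minimal Python sketch of single-parity reconstruction. It assumes parity is a bytewise XOR across the data disks, which is how single-parity schemes generally work; it is an illustration only, not unRAID's actual code, and the disk contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# The parity disk stores the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# Disk 3 is disabled: its block is recomputed from the surviving
# data disks plus parity, which is what "simulating" the disk means.
rebuilt_disk3 = xor_blocks([disk1, disk2, parity])
assert rebuilt_disk3 == disk3
```

With two disks missing, the same XOR has two unknowns and cannot be solved, which matches the warning above that a second failure loses the contents of both drives.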
abernardi Posted August 6, 2010 Right, of course. Thanks BRiT and Joe. One more question: I'm guessing I can still try to run a SMART test via telnet if the physical disk isn't completely dead, right? How would you determine whether the disk has failed or it's a bad cable or something else? EDIT: Ah, never mind, I'll follow the steps in the troubleshooting guide. THANKS!!!
abernardi Posted August 6, 2010 UPDATE: OK, I ran smartctl and I'm including the output. I think it looks good except for this attribute:

199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 309

According to the wiki, the raw value (309 here) should be 0, and a nonzero count is probably due to a bad cable. I followed your directions, Joe, and swapped out the cable, and now it's rebuilding the disk. It's estimating 638 minutes, which is probably appropriate for a 2TB drive. It's really amazing that the "simulated" disk 3 is still there as if nothing happened. Great work Tom! Attachment: Smart_2010-8-2010.txt
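As a rough illustration of the two checks in this post, the Python sketch below splits the quoted SMART attribute row into fields and works out the average speed implied by the rebuild estimate. It assumes the usual ten-column layout of the smartctl attribute table (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value) and that "2TB" means 2 x 10^12 bytes; the function names are hypothetical.

```python
def parse_smart_attribute(line):
    """Split one row of the SMART attribute table into named fields.

    Assumes ten whitespace-separated columns: ID, name, flag, value,
    worst, threshold, type, updated, when-failed, raw value.
    """
    fields = line.split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "when_failed": fields[8],
        "raw": fields[9].strip(),
    }


def rebuild_rate_mb_per_s(size_bytes, minutes):
    """Average throughput in MB/s (1 MB = 10**6 bytes) for a rebuild."""
    return size_bytes / (minutes * 60) / 1e6


row = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 309"
attr = parse_smart_attribute(row)

# A nonzero raw CRC error count points at the link (cable or connector).
crc_errors = int(attr["raw"])
cable_suspect = attr["name"] == "UDMA_CRC_Error_Count" and crc_errors > 0

# 2 TB (assumed decimal) rebuilt in an estimated 638 minutes.
rate = rebuild_rate_mb_per_s(2 * 10**12, 638)
print(f"{attr['name']}: raw={crc_errors}, cable suspect={cable_suspect}")
print(f"implied rebuild rate: {rate:.1f} MB/s")
```

The implied rate is a little over 50 MB/s, which is consistent with the poster's sense that the estimate is reasonable for a full-disk rebuild.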
abernardi Posted August 7, 2010 Well, it worked! Everything's back to normal. This is an amazing system!