Dan201 Posted August 4, 2012
Last night before I went to bed I started a parity check on my server (RC5). This morning I found what I believe is a drive failure. I've read a bit about the red dot meaning the drive failed a write request, but I wanted to make sure by asking here. It currently states there is a parity rebuild in progress. Once that has finished, can I swap the drive out without losing data? How do I find out where unRAID has placed the data that was on the failed disk? I'll try to get a SMART report on the disk before I take it out, as suggested in other threads. Kind regards
Joe L. Posted August 4, 2012
No, according to the screenshot, it says a DATA REBUILD is in progress. It is reconstructing the failed disk. It is not a parity check. Let it finish. Post a syslog (zip it for attachment).
beckp Posted August 4, 2012
Check out the speed, the estimated finish, and the number of writes. I don't think Dan201 will have the patience to wait almost 82 days.
Joe L. Posted August 4, 2012
Quoting beckp: "Check out the speed and estimated finish and the number of writes. I don't think Dan201 will have the patience to wait almost 82 days."
I missed that... I did see it was almost halfway done (the number of "writes" did not bother me). Clearly something is unusual. He needs to examine the syslog to see what is happening. Joe L.
RokleM Posted August 4, 2012
The number of writes is not abnormal from what I've seen. When I purposely pulled a drive, my writes went from ~100k to some ridiculously high false number. A bug, I assume.
Dan201 Posted August 4, 2012
It's still at 49%, at a speed of 28.26 KB/sec. I definitely only started a parity check last night; when I woke up it was rebuilding the parity instead. Should I be worried? Attachment: text.txt
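For a rough sense of scale, the arithmetic behind an estimate like "almost 82 days" can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reported speed is assumed to be in kilobytes per second, and the 82-day figure came from the screenshot, which may have shown a different speed than the 28.26 quoted in this post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def eta_days(remaining_kb: float, speed_kb_per_sec: float) -> float:
    """Days needed to process remaining_kb at a constant speed (hypothetical helper)."""
    if speed_kb_per_sec <= 0:
        raise ValueError("speed must be positive")
    return remaining_kb / speed_kb_per_sec / SECONDS_PER_DAY


def kb_moved(speed_kb_per_sec: float, days: float) -> float:
    """Kilobytes processed in the given number of days at a constant speed."""
    return speed_kb_per_sec * days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumption: speed and duration taken from the posts, combined for illustration only.
    moved = kb_moved(28.26, 82)
    print(f"{moved / 1e6:.0f} GB (decimal) in 82 days at 28.26 KB/sec")
```

At that rate, 82 days covers only about 200 GB (decimal), which is why a speed in the tens of KB/sec points to a problem rather than a slow but healthy rebuild.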
dgaschk Posted August 5, 2012
Disk7 is suffering a lot of write errors. Post a SMART report for disk7.
Dan201 Posted August 5, 2012
Can I do that while it's rebuilding the parity? Still stuck at 49%.
Helmonder Posted August 5, 2012
Yes, you can create a SMART report while rebuilding parity.
Dan201 Posted August 5, 2012
When I type in 'smartctl -a -d ata /dev/sda' I get the following message: Failed: No such device. C is definitely the failed drive.
marcusone Posted August 5, 2012
Quoting Dan201: "When I type in 'smartctl -a -d ata /dev/sda' I get the following message; Failed: No such device. C is definitely the failed drive."
Remove the "-d ata" option (it doesn't work with some controllers); 'smartctl -a /dev/sda' should be enough. What we really want is a report from sdc (the disk having issues, by the looks of things), but it wouldn't hurt to check all the drives.
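Checking all the drives by hand is tedious, so here is a minimal Python sketch that builds the same 'smartctl -a' command for each disk and saves the output to a file. The helper names, the smart_<device>.txt file naming, and the /dev/sd[a-z] device pattern are assumptions for illustration; it also assumes Python and smartctl are available and that it is run as root.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def smartctl_command(device: str, device_type: Optional[str] = None) -> List[str]:
    """Build the smartctl call; '-d <type>' is added only when explicitly requested."""
    cmd = ["smartctl", "-a"]
    if device_type:
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


def save_report(device: str, out_dir: Path) -> Path:
    """Run smartctl for one device and write whatever it prints to a text file."""
    # smartctl often exits non-zero even when it prints a report, so no check=True.
    result = subprocess.run(smartctl_command(device), capture_output=True, text=True)
    out_file = out_dir / f"smart_{Path(device).name}.txt"  # hypothetical naming
    out_file.write_text(result.stdout + result.stderr)
    return out_file


if __name__ == "__main__":
    # Assumption: disks appear as /dev/sda, /dev/sdb, ... on this server.
    for dev in sorted(Path("/dev").glob("sd[a-z]")):
        print(save_report(str(dev), Path(".")))
```

Leaving device_type unset mirrors the advice above to drop "-d ata" unless the controller needs it.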
Dan201 Posted August 5, 2012
I have one for my parity drive, but drive C says there is no such device. Attachment: smart.txt
Dan201 Posted August 5, 2012
I'm still at 49% of the rebuild, going at 14 KB/sec. Should I stop the array and reboot?
dgaschk Posted August 5, 2012
Yes. Stop the rebuild. It is not progressing correctly. Shut down, check the cabling to disk7, and then get a SMART report for disk7.
Dan201 Posted August 5, 2012
I tried to stop the array but the server crashed and I had to manually power it down with the button. The good news is I got a SMART file for the drive in question. Hurray! I also double-checked all the cables and everything seemed fine, but I unplugged and reconnected everything just in case. Does the SMART report say anything interesting? The server automatically went into a parity check when powered on. Attachment: smart.txt
Dan201 Posted August 5, 2012
Just had a quick look myself and it doesn't appear to have worked correctly.
mr-hexen Posted August 5, 2012
Try again. (Open the file and read it.)
Dan201 Posted August 5, 2012
I don't understand what it's saying. I only provided one device name, sdc.
Joe L. Posted August 5, 2012
Quoting Dan201: "I tried to stop the array but the server crashed and I had to manually power it down with the button. The good news is I got a smart file for the drive in question. Hurray! I also double checked all the cables and everything seemed fine but I unplugged and reconnected everything just in case. Does the smart report say anything interesting? The server has automatically gone into a parity check when powered on."
You mistyped the command. (Did you even look at the SMART report?) It is 'smartctl -a', not 'smartctl a-'. Joe L.
Dan201 Posted August 6, 2012
Yes, I did read it, as I said in an above post. But I didn't realise I had typed the command incorrectly. It just said I had used 2 device names. Here is what I believe to be a successful report. Attachment: smart.txt
Joe L. Posted August 6, 2012
Quoting Dan201: "Yes I did read it, as I said it an above post. But I didn't realise I typed the incorrect command in. It just said I had used 2 device names. Here is what I believe to be a successful report."
The only oddity in that SMART report is the very high temperature of the drive:
194 Temperature_Celsius 0x0022 102 101 000 Old_age Always - 50
Most of us try to keep drives below 40C (low to mid 30s as a goal). Some drives will fail if they get above 50C. You need to work on the array cooling.
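To pull a number like that out of a saved report without reading it by eye, a small Python sketch can split an attribute line into its columns. The names below are hypothetical, the column layout is assumed to match the ten-field line quoted above, and the 40C and 50C cut-offs are simply the rules of thumb from this post, not values from any drive specification.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_attribute_line(line: str) -> SmartAttribute:
    """Split one attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw."""
    parts = line.split(None, 9)  # the raw value may itself contain spaces
    if len(parts) < 10:
        raise ValueError(f"not a SMART attribute line: {line!r}")
    return SmartAttribute(
        int(parts[0]), parts[1], int(parts[3]), int(parts[4]), int(parts[5]), parts[9].strip()
    )


def temperature_status(celsius: int, target: int = 40, limit: int = 50) -> str:
    """Classify a temperature using the thread's rules of thumb (assumed thresholds)."""
    if celsius > limit:
        return "critical"
    if celsius >= target:
        return "high"
    return "ok"


if __name__ == "__main__":
    line = "194 Temperature_Celsius 0x0022 102 101 000 Old_age Always - 50"
    attr = parse_attribute_line(line)
    # Some drives append extra text to the raw value, so only the first token is used.
    print(attr.name, temperature_status(int(attr.raw.split()[0])))
```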
Dan201 Posted August 6, 2012
OK, I'll try to bring the temps down with some new fans. Could the drive have somehow dropped out of the array if it overheated? It seems to be working now. The drive that had the red dot doesn't have any data on it, so I would have thought it would be one of the cooler ones, as it's not often spun up.
Joe L. Posted August 6, 2012
Looking closer, there is this line:
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 20
That attribute usually indicates the drive had to retract the disk heads in an unexpected power-down. On your disk that has happened 20 times so far (but the drive has only been power cycled 51 times). Unless you know something about the prior use of this drive, where power was just shut off rather than the OS cleanly shut down, you might inspect the power connections to the drive. (Most disks fail to operate properly after power is removed.) Joe L.
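To put those two counts in proportion, here is a tiny Python sketch. The function name is hypothetical, and it assumes, as the post suggests is usually the case, that each retract count corresponds to one unexpected power loss.

```python
def unclean_shutdown_fraction(retract_count: int, power_cycles: int) -> float:
    """Share of power cycles that ended with an emergency head retract (assumed interpretation)."""
    if power_cycles <= 0:
        raise ValueError("power_cycles must be positive")
    return retract_count / power_cycles


if __name__ == "__main__":
    # Values taken from the SMART report discussed above: 20 retracts, 51 power cycles.
    fraction = unclean_shutdown_fraction(20, 51)
    print(f"{fraction:.0%} of power cycles")
```

With 20 retracts over 51 power cycles, that is roughly 39%, which is high enough to make the power connections worth checking.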
Dan201 Posted August 6, 2012
My server has crashed on a number of occasions and I had to power it down by force. It doesn't seem like 20 times, but I did have trouble with WHS before I used unRAID, and this drive may have been in that array before.
JonathanM Posted August 6, 2012
Quoting Dan201: "Ok ill try to bring the temps down with some new fans."
Most cases seem engineered to circulate air through the whole case for the CPU and video card, and leave the drives in somewhat stagnant air. Often you would do well to make sure all the intake air for the case is forced over the drives, so that it cools them first. That normally means taping up case holes that don't have fans, and possibly reversing fans to exhaust instead of intake air. Sometimes you need to create ductwork with pieces of cardboard to seal leaks that allow air to go around the drives without cooling them.
If you don't have an engineering mind, feel free to take pictures giving an overview of the inside of your case and post them here. Someone will probably make suggestions to help you with the airflow.