June 11, 2013 My server ran its monthly parity check and found errors. I set it to run again and correct errors. When I did that, it made the web GUI unresponsive. I was able to run tail -f /var/log/syslog and get enough info to know that I had either a loose cable or a bad SATA port. I powered down the server. Turns out I did have a loose cable. I got it back up and ran another parity check set to correct errors, and it shows that it found 38k errors. The web GUI shows 1513 errors beside my parity drive. I'm going to run another parity check with corrections, but I'm not sure that is going to fix it. I'm attaching a syslog and SMART report. I would really appreciate it if someone with more knowledge could guide me in the right direction. syslog.zip smart_report.txt
June 11, 2013 My nickel's worth ...
=> The parity drive passes SMART, but has some "interesting" stats ...
... no reallocated sectors or pending reallocations (good)
... over 10,000 unexpected head retractions !! (bad)
=> All of your errors (if I understand what you wrote correctly) are read errors on the parity disk
=> You found a loose cable on the parity drive, and have now corrected it
If all of the above is true, then you can reasonably expect that your parity was no good. Running a correcting parity check (as you did) should correct it ... and when you run another one it should show zero errors. If that's NOT the case, then I'd replace the parity drive.
June 11, 2013 Author Thank you for your quick reply and your nickel's worth. Hopefully all is well and my parity check fixes everything. I checked on the drive and I still have warranty until 12/2014. When the parity check is done I'll report back here. Also, where do you see the head retractions? Is that the Power-Off_Retract_Count?
June 11, 2013 Yes, the Power-Off_Retract_Count is how many times the heads were retracted due to a sudden power loss. Your drive shows a VERY large count for that ... probably because you had a loose power connection -- and clearly this could have caused a variety of other problems.
June 11, 2013 Author The parity check with corrections has finished. "I" think it looks fine, but I'm still new to all of this. Log and screenshot posted. Now it shows there were no errors found, but the screenshot still shows errors beside the parity drive. Is that just showing past errors? syslog.zip
June 11, 2013 The results of the check are fine, but I'm not sure about the errors -- they're probably residual from the previous time. Click the "Clear Stats" button, then rerun a correcting parity check. That should show all zeroes for the errors ... if so, you're definitely good.
June 11, 2013 This line from the SMART report is rather unusual:
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 862
A search through this forum shows only a few drives with this sort of error being reported, and in all the cases I looked at, the recommendation was to replace the drive. What's worrisome is that the "value" and "worst" are down to only 001 and the threshold is at 000. For a good drive the value should be 100. Additionally, the 862 was the largest raw value in any of the reports I found. There was one at 18 and another at 6, I think, though the actual number may not mean much to anyone but the manufacturer.
Regards, Stephen
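For anyone who wants to read that row programmatically, here is a minimal Python sketch that splits it into its columns and compares the normalized value against the threshold. It assumes the usual ten-column attribute layout (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value); the names `SmartAttribute`, `parse_attribute_row`, and `margin` are made up for illustration and are not part of any tool.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of a SMART attribute table (hypothetical helper type)."""
    attr_id: int
    name: str
    flag: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded so far
    thresh: int     # vendor failure threshold
    attr_type: str
    updated: str
    when_failed: str
    raw_value: str  # vendor-specific raw counter, kept as text


def parse_attribute_row(row: str) -> SmartAttribute:
    """Split one attribute row into its ten whitespace-separated columns."""
    parts = row.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"expected 10 columns, got {len(parts)}")
    return SmartAttribute(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9].strip(),
    )


def margin(attr: SmartAttribute) -> int:
    """How far the normalized value sits above the threshold."""
    return attr.value - attr.thresh


# The row quoted in the post above.
ROW = "187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 862"

attr = parse_attribute_row(ROW)
print(f"{attr.name}: value={attr.value} worst={attr.worst} "
      f"thresh={attr.thresh} raw={attr.raw_value} margin={margin(attr)}")
```

The margin here is only 1, which matches the concern above: the normalized value has dropped almost to the threshold. How the raw counter (862) maps to the normalized value is up to the drive vendor, so the sketch keeps it as text rather than interpreting it.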
June 12, 2013 Author Gary, the latest parity check came back with 0 errors. Thanks for all of your prompt replies. Stephen, I'm going to ride with what I have for now, but I'll keep getting a new one in mind. Thanks for your input also.
June 12, 2013 I think the drive is fine ... the loose cable most likely caused the errors you were getting. But it's always a good idea to have a spare drive that you've already thoroughly tested, so it's ready to use as a replacement immediately when you need one.
June 12, 2013 Author I hope it just makes it until November for Black Friday. If they go on sale again I'll get myself a spare for sure.