magic144 Posted May 2, 2013
I was just running a monthly parity (no-correct) check on my unRAID 4.7 system (MD1510/LI) and was about to shut down again after it had finished, when I noticed all these errors in the syslog (see attached). The parity check later seemed to finish and indicated no errors, but I don't know how credible that now is.
The drive in question appears to be the parity drive (disk0, 6XW024J9, scsi 11:0:0:0), which has been in commission in my array since about Nov 2009 without any issues to date. FYI, I don't leave my array on 24/7; I just power it up and down periodically when I have some specific archiving activities to perform.
I'm also including the SMART info for the parity drive and would be very appreciative if somebody could tell me the way ahead from here. Should I run more tests (which?) on the parity drive, or just replace it? From what I can see, there is a Reported_Uncorrect count of 6 - this also shows in the unMENU Smart History page for this drive as ATA_Error_Count (6), where it has ALWAYS previously been 0. It's long past any kind of warranty, so there's no point in trying to pinpoint any specific issues for RMA or the like.
Very grateful in advance for any assistance! Thanks, m.
Attachments: syslog-2013-05-01.txt, SMART_ST32000542AS_6XW024J9.txt
magic144 Posted May 2, 2013
I'm running a SMART long test on the drive in the meantime, in case the results prove illuminating. Unfortunately, I think a spin-down timed out the first attempt, so I've set the spin-down timeout to Never and restarted the test. Again, the waiting. In the meantime, can anyone confirm the procedure for invalidating parity and forcing a rewrite, in case I can recommission this same drive if it proves to be OK (maybe take it offline, run some pre-clear iterations and reinstate it)? Would that be a useful way to go?
magic144 Posted May 3, 2013
OK - so the long test revealed NOTHING: "Completed without error" (see SMART self-test #1, LifeTime(hours)==3945). NB: a short test had also completed without error. So now what should I do with this drive/array to re-establish confidence?
Attachment: SMART_ST32000542AS_6XW024J9.txt
magic144 Posted May 3, 2013
Well, in the absence of any advice, I'm rebuilding the parity drive to see if that does any bad-block reallocations, or "just works". I stopped the array and unassigned the parity drive. I restarted (without parity) to force it to forget the parity config, then stopped again. I reassigned the parity drive and restarted the array - it is currently performing a full parity sync. Once it's done, I'll re-run the no-correct parity check and see if it hiccups again or not.
Joe L. Posted May 3, 2013
magic144 wrote: Well, in the absence of any advice, I'm rebuilding the parity drive to see if that does any bad-block reallocations, or "just works". I stopped the array, unassigned the parity drive. Restarted (without parity) to force it to forget the parity config. Stopped again. Reassigned the parity drive and re-started the array - it is currently performing a full parity sync. Once it's done, I'll re-run the no-correct parity check and see if it hiccups again or not.
That is exactly what needs to be done to reconstruct the drive. Basically, except for any parity errors or unreadable sectors, it is rewriting what is already there. The unreadable sectors will be either rewritten in place or reallocated.
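The sync-then-check cycle described above can be sketched as a toy model. This assumes single XOR parity, where each parity byte is the XOR of the bytes at the same position on every data disk. The function names `parity_sync` and `parity_check_nocorrect` and the 4-byte "disks" are hypothetical illustrations, not unRAID code.

```python
from functools import reduce


def parity_sync(data_disks):
    """Recompute parity from scratch: XOR the same byte position across all data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def parity_check_nocorrect(data_disks, parity):
    """Count positions where stored parity disagrees with the data; writes nothing."""
    expected = parity_sync(data_disks)
    return sum(1 for e, p in zip(expected, parity) if e != p)


# Hypothetical 4-byte "disks" standing in for real drives.
disks = [
    bytes([0x12, 0x34, 0x56, 0x78]),
    bytes([0xAA, 0xBB, 0xCC, 0xDD]),
    bytes([0x01, 0x02, 0x03, 0x04]),
]

# A full sync rewrites every parity byte, so a check run right afterwards finds nothing.
parity = parity_sync(disks)
sync_errors = parity_check_nocorrect(disks, parity)

# Simulate one stale parity byte: the no-correct check reports it but leaves it alone.
stale = bytearray(parity)
stale[2] ^= 0xFF
stale_errors = parity_check_nocorrect(disks, bytes(stale))
```

In this model the sync depends only on the data disks, which is why a rebuild overwrites whatever the parity drive held before, readable or not.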
magic144 Posted May 3, 2013
Thanks a lot, Joe. I'm a little confused as to why there were no short or long test errors. What should I do if this rebuild/re-check doesn't throw up anything, i.e. NO reallocations or errors? Was it all just a hiccup or glitch in that case? Any theories from the logs as to what the nature of these errors might have been? I'll post back the results when the rebuild and recheck (no-correct) are done.
magic144 Posted May 4, 2013
So the rebuild (sync) and check (no-correct) of the parity drive both went through without error. There appear to have been NO reallocations either. How much should I now be able to trust this disk, and why did it give me those errors in the first place? Thanks for any theories!
Attachments: syslog-2013-05-04.txt, SMART_ST32000542AS_6XW024J9.txt
magic144 Posted May 4, 2013
I would also be grateful if anybody can explain the nature of the original errors (first post syslog). Were these transient failures that eventually succeeded (after a number of retries)? If they were permanently unrecoverable failures, how is it that the parity check ultimately succeeded (and why were no reallocations apparently performed when rewriting the disk)?
EDIT - just for completeness, I ran another short and long SMART test on the newly rebuilt drive (see attached).
Attachment: SMART_ST32000542AS_6XW024J9.txt
magic144 Posted May 7, 2013
Just a thought, but could it be that the initial read errors led to zeros being returned to the OS, and that this just happened to let the parity check succeed by chance?
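One way to reason about this question, assuming single XOR parity: the check recomputes parity from the data disks and compares it with whatever the parity drive returns. If a failed read came back as all zeros, the comparison would pass only at positions where the data disks happen to XOR to zero. The sketch below uses hypothetical helper names and made-up bytes. It does not address whether the driver would ever hand back zeros rather than an I/O error.

```python
def xor_bytes(blocks):
    """XOR equal-length byte blocks together, position by position."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def mismatches(data_blocks, parity_as_read):
    """Return the positions where the parity actually read differs from the expected parity."""
    expected = xor_bytes(data_blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity_as_read)) if e != p]


# Made-up data: positions 0 and 2 XOR to zero, positions 1 and 3 do not.
data = [
    bytes([0x0F, 0xF0, 0x55, 0x00]),
    bytes([0x0F, 0x0F, 0x55, 0x01]),
]

# Hypothetical failed parity read that came back as all zeros.
zeroed_read = bytes(4)
bad_positions = mismatches(data, zeroed_read)
```

Here the zeroed read slips through at positions 0 and 2 only because the data cancels out there. For a 512-byte sector whose true parity looks random, an all-zero match would be a coincidence on the order of 1 in 2^4096, although regions where every data disk holds zeros (unused space on pre-cleared disks, for example) would match trivially.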