Drive Clearly Hurting in Non-Corr ParityChk; Run Correcting or Replace Now?

wheel · March 1, 2019

Figured I’d check in here before I embark on my current plan, which is based on threads I’ve read so far. An old disk in my array is showing the orange triangle in its dashboard status with "Reported uncorrect: 1", and my monthly non-correcting parity check is almost past the end of that drive’s capacity with 310 sync errors detected. It’s running hot at 51 C and heating the nearby drives as well.

From what I can tell, I’m rolling the dice either way. If I run a correcting parity check to fix the 310 sync errors, the whole drive could die during that second round (I’m still not 100% sure it’ll survive this one, but the drive should spin down in an hour or so, once the check moves on to the higher-capacity drives exclusively). If I replace the drive, whatever errors I’m sitting on right now will be rebuilt onto the replacement drive, which isn’t great, but feels way better than risking the loss of the entire drive’s worth of data. Nothing added recently is irreplaceable, but the drive as a whole would be a huge pain to repopulate.

So despite the sync errors on this parity check, based on what I’ve read, I’m making an educated gamble: I plan to throw the replacement drive in there and start a rebuild tomorrow morning when this parity check completes.

If I’m absolutely misreading the risks in either approach and should run another correcting parity check no matter what, I’d greatly appreciate anyone warning me off my current path!
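To illustrate the rebuild concern above, here is a minimal sketch, assuming a single parity drive computed as a byte-wise XOR across the data disks. The disk contents and the helper names `xor_parity` and `rebuild` are made up for illustration and are not part of any real tool. It shows that a rebuild reproduces whatever parity implies, so a block where parity is out of sync comes back wrong on the replacement disk, while blocks where parity is correct come back intact.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Reconstruct the missing block from the surviving blocks plus parity."""
    return xor_parity(surviving_blocks + [parity])


# Hypothetical four-byte "disks"; disk8 stands in for the drive being replaced.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk8 = b"\xaa\xbb\xcc\xdd"

# Parity in sync with the data: the rebuild returns the original contents.
good_parity = xor_parity([disk1, disk2, disk8])
rebuilt_ok = rebuild([disk1, disk2], good_parity)

# Simulate one sync error: the first parity byte no longer matches the data.
stale_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]
rebuilt_bad = rebuild([disk1, disk2], stale_parity)

print(rebuilt_ok == disk8)   # whether the in-sync rebuild matches the original
print(rebuilt_bad == disk8)  # whether the out-of-sync rebuild matches the original
```

Only the byte covered by the mismatched parity differs after the rebuild; the rest of the disk is reconstructed correctly. Whether a given sync error means the parity or the data disk is the wrong one cannot be told from the count alone.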

JorgeB · March 1, 2019

Please post the diagnostics: Tools -> Diagnostics

wheel · March 1, 2019

Diagnostics attached. I was just about to start the replacement / rebuild this morning, but I'll leave this here and wait until I get home before messing with anything, in case something in the log is raising flags that I'm oblivious to with my limited knowledge.

Thanks!

EDIT: Just noticed my docker.img is sitting on the troubled disk (#8) somehow, instead of in the cache. Errors make more sense now since I hadn't been writing much of anything to that drive lately, but I've got a feeling I need to shift that file to my cache drive somehow. Going to research that process when I get home, too.

tower3-diagnostics-20190301-1725.zip

Edited March 1, 2019 by wheel

JorgeB · March 1, 2019

The diagnostics were taken after rebooting; I was most interested in seeing the syslog from during the check. For now, just go ahead and replace the disk. It doesn't look that bad, but it is an ST3000DM001 with 50k hours, so it has lasted long enough.

wheel · March 1, 2019

Yeah, the age alone has had this disk on my "watch list" for a while now. If the errors are related to my docker.img being on the problem drive, does that change the risks of running a correcting parity check vs. just replacing the drive now?

JorgeB · March 1, 2019

I would guess a second check would most likely complete without read errors, but it's just an educated guess. You can also run an extended SMART test first to confirm all is good. The error is typical of Seagate drives: there was a UNC error, but on a bogus LBA address, so the media itself should be fine.
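For anyone who prefers the command line for the extended SMART test mentioned above, here is a rough sketch that only prints the `smartctl` commands typically used for this. The device path `/dev/sdX` is a placeholder and the helper name `extended_test_commands` is made up for illustration; nothing is executed against a drive.

```python
import shlex


def extended_test_commands(device):
    """Return smartctl command lines for an extended self-test on `device`."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended (long) self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log once it finishes
        ["smartctl", "-A", device],              # list SMART attributes
    ]


# /dev/sdX is a placeholder; substitute the actual device for the disk in question.
for argv in extended_test_commands("/dev/sdX"):
    print(shlex.join(argv))
```

The extended test runs in the background on the drive itself and can take many hours on a 3 TB disk, so the self-test log is only worth checking after the estimated completion time that `smartctl` reports.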
