Huge counts of errors during rebuild (in progress)


ldrax


Hi guys,

Looks like I haven't had much luck with my unRAID machines lately. I've been having a string of different issues on my 2 builds.
 

Long story short, I'm replacing disk3 in a 6-disk array (1 parity + 5 data drives). Within half an hour of starting the rebuild, disk1 threw a huge number of read errors.
Screenshot and diagnostics attached.

Before starting this replace/rebuild, I ran a SMART short self-test on all drives, and all returned 'Complete without error', including the 7th drive (the original disk3 that I'm replacing, but that's for another topic).

Should I cancel this rebuild now, and if so, what's next for me? Letting the rebuild finish would still leave doubt about data integrity, because of those read errors during the rebuild.

I'm at a loss as to what to do.

Thank you!
 

Screenshot_2018-10-07 doubtful rebuild.png

 


Hi johnnie.black,
After watching the rebuild for a while and seeing no further errors, I decided to let it finish. It did so without any increase in the read error count. So what I have now is a new disk3 that was rebuilt from parity plus reads from the 4 other data drives, one of which produced that huge read error count during the rebuild.


I think I'll try a checksum comparison or a dry-run rsync against the original disk3 (which I believe is still healthy, though incorrectly disabled by unRAID).
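For anyone wanting to do the same kind of checksum comparison, here is a minimal Python sketch. It assumes both the original disk and the rebuilt disk are mounted and readable; the two paths in the usage comment are hypothetical placeholders, not paths taken from this system, and the function names are made up for illustration. It hashes every file on the original disk and reports files whose hash differs on the rebuilt disk, plus files that are missing there.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_root, rebuilt_root):
    """Compare every file under original_root with the same relative path
    under rebuilt_root. Returns (mismatched, missing) as sorted lists of
    relative paths."""
    mismatched = []
    missing = []
    for dirpath, _dirnames, filenames in os.walk(original_root):
        for name in filenames:
            source = os.path.join(dirpath, name)
            relative = os.path.relpath(source, original_root)
            target = os.path.join(rebuilt_root, relative)
            if not os.path.isfile(target):
                missing.append(relative)
            elif file_sha256(source) != file_sha256(target):
                mismatched.append(relative)
    return sorted(mismatched), sorted(missing)


# Example usage (hypothetical mount points, adjust to your own system):
# mismatched, missing = compare_trees("/mnt/disks/old_disk3", "/mnt/disk3")
# print("mismatched:", mismatched)
# print("missing:", missing)
```

This only reads from both disks, so it should be safe to run against the old drive. It does not report files that exist only on the rebuilt disk.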

33 minutes ago, johnnie.black said:

SMART looks fine, though your disks are running way too hot. Replace/swap cables and try again; you can also run an extended SMART test and try to improve cooling.

Yeah, these cables are brand new (although that doesn't guarantee anything). What actually happened was that I decided to get a brand new LSI card, complete with 2x4 SATA breakout cables. unRAID immediately disabled disk3, and after I slotted in a new drive and started the rebuild, disk1 threw those read errors. But I'm not sure it's a cable issue now, because no further errors showed up in the many hours until the rebuild finished.

As for cooling, I'm a bit frustrated with trying to house six 7200rpm drives in a Lian Li PC-Q25. Short of replacing it with a bigger case, my last roll of the dice is to try a couple of 'jet engine' 3000rpm ippc case fans, as I read about in another thread here. I'll see how it goes once they're here.

On 10/8/2018 at 5:29 PM, ldrax said:

As for cooling, I'm a bit frustrated with trying to house six 7200rpm drives in a Lian Li PC-Q25. Short of replacing it with a bigger case, my last roll of the dice is to try a couple of 'jet engine' 3000rpm ippc case fans, as I read about in another thread here. I'll see how it goes once they're here.

I'm pretty satisfied now: temps are 9-11 °C cooler after I installed the 3000rpm ippc fans.

 

On 10/8/2018 at 5:49 PM, johnnie.black said:

If the old disk is good it's an option, since there will be some corrupt file(s) on the rebuilt disk.

Just an update: sure enough, a checksum of the whole disk reveals a bunch of mismatches, in addition to XFS filesystem corruption. The checksum was run after xfs_repair.

 

On the read error issue, after some painstaking troubleshooting involving many different attempted fixes, I think I finally pinned the cause down to an inadequate power supply.

