ldrax, posted October 6, 2018 (edited October 25, 2018)
Hi guys,
Looks like I haven't had much luck with my unRAID machines lately; I've been having a string of different issues on my two builds. Long story short, I'm replacing disk3 in a 6-disk array (1 parity + 5 data drives). Within half an hour of starting the rebuild, disk1 threw a huge number of read errors. Screenshot and diagnostics attached.
Before starting the replace/rebuild, I ran a quick SMART short self-test on all drives, and all of them returned 'Completed without error', including the 7th drive (the original disk3 that I'm replacing, but that's a topic for another thread).
Should I cancel this rebuild now, and if so, what should I do next? Even if I let the rebuild finish, those read errors during the rebuild would still leave some doubt about data integrity. I'm at a loss. Thank you!
JorgeB, posted October 8, 2018
SMART looks fine, but your disks are running way too hot. Replace or swap the cables and try again; you can also run an extended SMART test. And try to improve the cooling.
ldrax (author), posted October 8, 2018
Hi johnnie.black,
After watching the rebuild for a while and seeing no further errors, I decided to let it finish, and it did, without any further increase in the read error count.
So what I have now is a new disk3 that was rebuilt from parity plus reads of the 4 other data drives, one of which produced that huge read error count during the rebuild. I think I'll try a checksum comparison or a dry-run rsync against the original disk3 (which I believe is still healthy, even though unRAID disabled it, incorrectly in my view).
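The reason a read error on another disk matters here is that unRAID's single parity is a bitwise XOR across the data disks: a missing disk is recovered by XOR-ing the parity disk with every other data disk at the same offset. The sketch below is a simplified model with made-up 4-byte "sectors"; the `data` values and the `rebuild` helper are hypothetical, not unRAID code, and it does not model how unRAID actually handles a failed read. It only shows that if any input to the XOR is wrong, the matching bytes of the rebuilt disk are wrong too.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "sectors" at the same offset on each of the five data disks.
data = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\x01\x02\x03\x04",
    "disk3": b"\xaa\xbb\xcc\xdd",
    "disk4": b"\x0f\x0e\x0d\x0c",
    "disk5": b"\x55\x66\x77\x88",
}

# Single parity: the XOR of all data disks at the same offset.
parity = xor_blocks(list(data.values()))


def rebuild(missing, disks, parity):
    """Reconstruct the missing disk from parity plus every other data disk."""
    others = [blk for name, blk in disks.items() if name != missing]
    return xor_blocks([parity] + others)


# Clean rebuild: every read is correct, so disk3 comes back intact.
clean = rebuild("disk3", data, parity)

# Faulty rebuild: simulate disk1 returning bad data for byte 2 during the rebuild.
bad_reads = dict(data, disk1=b"\x10\x20\xff\x40")
corrupt = rebuild("disk3", bad_reads, parity)

print(clean == data["disk3"])
print(corrupt == data["disk3"])
print([i for i, (a, b) in enumerate(zip(corrupt, data["disk3"])) if a != b])
```

The damage is confined to the offsets where the bad reads happened, which is why the rebuilt disk can be mostly fine yet still contain some corrupt files.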
ldrax (author), posted October 8, 2018
33 minutes ago, johnnie.black said: "SMART looks fine, though your disks are running way too hot, replace/swap cables and try again, you can also run an extended SMART test, and try to improve cooling."
Yeah, these cables are brand new (although that doesn't guarantee anything). What actually happened is that I bought a brand-new LSI card complete with two breakout cables, each with four SATA connectors. unRAID immediately disabled disk3, and when I slotted in a new drive and started the rebuild, disk1 began throwing those read errors. But I'm not sure it's a cable issue now, because in the many hours afterwards, until the rebuild finished, no further errors showed up.
As for cooling, I'm a bit frustrated trying to house six 7200 rpm drives in a Lian Li PC-Q25. Short of replacing it with a bigger case, I'm going to roll the dice one last time with a couple of 'jet engine' 3000 rpm iPPC case fans, as suggested in another thread here. I'll see how it goes once they arrive.
JorgeB, posted October 8, 2018
22 minutes ago, ldrax said: "I think I'll try to do a checksum comparison or dry-run rsync from the original disk3"
If the old disk is good, that's an option, since there will be some corrupt file(s) on the rebuilt disk.
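For the comparison suggested above, one option is an rsync dry run that compares file contents by checksum, for example `rsync -rcni /path/to/old/ /path/to/new/` (`-c` checksum, `-n` dry run, `-i` itemize changes; the paths are placeholders). A minimal Python sketch of the same idea follows. It walks the old disk, hashes each file and the file at the same relative path on the rebuilt disk with SHA-256, and lists mismatches and missing files. The two mount points are hypothetical command-line arguments (for example, the old disk3 mounted read-only outside the array). It compares contents only, not metadata, and it does not report files that exist only on the rebuilt disk.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(old_root, new_root):
    """Compare every file under old_root with the same relative path under new_root.

    Returns (mismatched, missing): relative paths whose contents differ, and
    relative paths that exist under old_root but not under new_root.
    """
    mismatched, missing = [], []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            if not os.path.isfile(new_path):
                missing.append(rel)
            elif sha256_of(old_path) != sha256_of(new_path):
                mismatched.append(rel)
    return mismatched, missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_disks.py /mnt/old_disk3 /mnt/disk3
    mismatched, missing = compare_trees(sys.argv[1], sys.argv[2])
    for rel in mismatched:
        print("MISMATCH", rel)
    for rel in missing:
        print("MISSING ", rel)
    print(f"{len(mismatched)} mismatched, {len(missing)} missing")
```

Hashing every file reads both disks in full, so on multi-terabyte drives this takes hours, much like the rebuild itself.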
ldrax (author), posted October 12, 2018
On 10/8/2018 at 5:29 PM, ldrax said: "As for cooling, I'm a bit frustrated with trying to house 6x7200 drives in Lian Li PC-Q25. Short of replacing it with a bigger casing, I'm going to throw last dice in trying a couple of 'jet engine' 3000rpm ippc case fans, as I read from other thread here. I'll see how it goes once they're here."
I'm pretty satisfied now: drive temperatures are 9-11 °C cooler since I installed the 3000 rpm iPPC fans.
ldrax (author), posted October 20, 2018 (edited October 20, 2018)
On 10/8/2018 at 5:49 PM, johnnie.black said: "If the old disk is good it's a option, since there will be some corrupt file(s) on the rebuilt disk."
Just an update: sure enough, a checksum comparison of the whole disk revealed a bunch of mismatches, in addition to XFS filesystem corruption. The checksum comparison was done after running xfs_repair.
As for the read errors, after some painstaking troubleshooting involving many different fixes, I think I've finally pinned down the cause: an inadequate power supply.
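One way to sanity-check a suspected power problem is a rough 12 V current budget: drives typically all spin up together at power-on unless staggered spin-up is used, and a rebuild keeps every drive busy at the same time. The sketch below is back-of-the-envelope arithmetic only. Every figure in it (per-drive spin-up and active current, the rest-of-system load, the PSU rail rating, and the 80% headroom rule of thumb) is a hypothetical placeholder, not a value from this thread, so substitute the numbers from your own drive datasheets and PSU label.

```python
# All figures are hypothetical placeholders, not measurements or datasheet
# values: replace them with the numbers for your own drives and PSU.
SPINUP_AMPS_12V = 2.0    # assumed peak 12 V draw per 3.5" 7200 rpm drive at spin-up
ACTIVE_AMPS_12V = 0.8    # assumed 12 V draw per drive while reading/writing
OTHER_AMPS_12V = 8.0     # assumed 12 V draw of CPU, fans, HBA and everything else
RAIL_RATING_AMPS = 20.0  # assumed PSU 12 V rail rating
HEADROOM = 0.8           # rule-of-thumb assumption: stay under ~80% of the rating


def twelve_volt_load(drives, amps_per_drive, other_amps=OTHER_AMPS_12V):
    """Total 12 V current in amps for `drives` drives plus the rest of the system."""
    return drives * amps_per_drive + other_amps


def report(drives):
    """Print the estimated loads and return True if both fit within the budget."""
    spinup = twelve_volt_load(drives, SPINUP_AMPS_12V)
    active = twelve_volt_load(drives, ACTIVE_AMPS_12V)
    budget = RAIL_RATING_AMPS * HEADROOM
    print(f"{drives} drives: spin-up {spinup:.1f} A ({spinup * 12:.0f} W), "
          f"active {active:.1f} A, budget {budget:.1f} A")
    return spinup <= budget and active <= budget


# Seven drives attached, as in this thread: six array disks plus the original disk3.
ok = report(7)
print("within budget" if ok else "over budget")
```

With these placeholder numbers the steady rebuild load fits, but the simultaneous spin-up peak does not, which is the kind of marginal supply that can misbehave under load while passing casual tests.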