J.Nerdy, posted August 4, 2019:
I'm in the process of upgrading storage capacity on the array and swapped a 4TB Red for a new 12TB Red as the parity disk (the 4 data disks are going to be 10TB). Before rebuilding parity, I ran the disk through 2 passes of preclear with no issues.
Perhaps anecdotal: read errors are occurring while a parity check is running (I do a weekly check) and in the middle of preclearing 2 x 10TB Reds.
Do I need to replace it? I have the old 4TB parity disk, which I will rebuild with while waiting for a replacement drive.
Diagnostics attached: nerdyraid-diagnostics-20190804-1654.zip
J.Nerdy, posted August 4, 2019:
These errors are occurring during the parity check:
JorgeB, posted August 5, 2019:
Quoting J.Nerdy: "Do I need to replace?"
Looks like a disk problem. You can run an extended test, though these types of read errors can sometimes be intermittent.
J.Nerdy, posted August 5, 2019:
Quoting johnnie.black: "Looks like a disk problem, you can run an extended test though these type of read errors can sometimes be intermittent."
I'll run an extended test. The parity check completed without any errors. Could it be a cabling problem?
Also, I finished preclearing the first 10TB data disk for replacement. Would rebuilding data onto this disk from the questionable parity disk put the array at high risk of corruption?
Thanks (I'll attach the results of the extended SMART test).
JorgeB, posted August 5, 2019:
Quoting J.Nerdy: "Could it be a cabling problem?"
No. UNC @ LBA errors are media errors (uncorrectable reads at a specific logical block address), not cabling errors.
trurl, posted August 5, 2019:
Quoting J.Nerdy: "parity check is running (I do weekly check)"
Why so frequently?
J.Nerdy, posted August 5, 2019:
Quoting trurl: "Why so frequently?"
Honestly, it was configured out of paranoia and not knowing any better. Now that I understand a little better how parity operates, and how a check thrashes the disks, I've changed it to monthly.
The extended test is currently running. No further errors have been logged. I'll post the results once it completes (on the 12TB it is estimating nearly 20 hours).
Would these errors lead to corruption if I were to replace a data disk, emulate its contents and rebuild?
Also, thanks Johnnie for clearing that up (I confused UDMA with UNC). Cheers.
Edit: I also installed Dynamix File Integrity (and am looking at Squid's checksum plug-in) to start tracking disk and file hashes.
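For anyone new to file hashing, here is a minimal sketch of the idea, assuming SHA-256 and a simple in-memory manifest keyed by relative path. This is not how Dynamix File Integrity or Squid's plug-in actually store or verify hashes; `file_digest`, `build_manifest` and `find_changed` are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hash every regular file under root, keyed by its path relative to root."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or whose hash no longer matches."""
    changed = []
    for rel, old_digest in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != old_digest:
            changed.append(rel)
    return changed
```

The manifest is built once while the data is known to be good and compared later; a file reported as changed that nobody edited is a candidate for silent corruption. Hashing reads every byte of every file, so it is I/O-heavy in the same way a parity check is.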
JorgeB, posted August 5, 2019:
Quoting J.Nerdy: "Would these errors lead to corruption if I were to replace a data disk"
They could, if using single parity.
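A toy model shows why single parity is exposed here. It assumes parity is a plain byte-wise XOR across the data disks, which is how Unraid's single parity disk works. The three two-byte "disks" are made-up values, and a bad parity sector is modelled as returning wrong bytes.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (one byte column per stripe position)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, one 2-byte stripe each.
data = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55"]
parity = xor_blocks(data)

# Emulate/rebuild disk 1 from the other data disks plus a healthy parity disk.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Suppose the parity disk returns one wrong byte (a bad sector).
bad_parity = bytes([parity[0] ^ 0xFF, parity[1]])
rebuilt_bad = xor_blocks([data[0], data[2], bad_parity])
print(rebuilt_bad == data[1])  # prints False
```

Because each rebuilt byte is the XOR of all surviving members, any error in the parity byte passes straight into the rebuilt data, and with only one parity disk there is no second equation to detect or correct it.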
J.Nerdy, posted August 5, 2019:
Quoting johnnie.black: "They could, if using single parity."
I'll wait for the results of the extended test then. If the disk is healthy, should I recalculate parity before replacing disks? My only concern is that I'll be replacing 4 disks (each 4TB with a 10TB), and running a parity operation after each disk is replaced and its data rebuilt on a shaky parity disk seems like courting disaster.
JorgeB, posted August 5, 2019:
Just keep the old disks until they have been successfully replaced. These types of errors can happen once or twice and then the disk can be good for years, or they can happen again tomorrow; it's very difficult to predict.
J.Nerdy, posted August 5, 2019:
Quoting johnnie.black: "Just keep the old disks until successfully replaced..."
Cheers. So once all four disks have been rebuilt successfully, I can repurpose the old ones.
Fix Common Problems is obviously flagging the array as failed due to the disk errors. Can I ignore that for the time being? I want to monitor whether new issues arise, rather than have a panic attack every 12 hours from the same failed scan.
JorgeB, posted August 5, 2019:
Quoting J.Nerdy: "can I ignore for the time being"
Yes, or reboot to clear the errors.
J.Nerdy, posted August 5, 2019:
I would reboot, but I don't want to interrupt the extended test. Thanks again!
J.Nerdy, posted August 6, 2019:
@johnnie.black SMART extended test results attached: 8 pending sectors.
On a drive so early in its life cycle, should I RMA it? The error count has not increased.
WDC_WD120EFAX-68UNTN0_2AGLW2YY-20190806-0601.txt.zip
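Pending sectors appear in the SMART attribute table (`smartctl -A`) as attribute 197, Current_Pending_Sector. Below is a minimal parser sketch, assuming the usual ten-column smartctl attribute layout. The sample rows are illustrative (only the pending count of 8 comes from this thread), and `smart_raw_values` / `sectors_to_watch` are hypothetical names.

```python
# Sector-health counters commonly watched for failing media.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def smart_raw_values(attr_table: str) -> dict[str, str]:
    """Map attribute name -> RAW_VALUE text from `smartctl -A` style output."""
    values: dict[str, str] = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE may contain spaces (e.g. temperatures), so join the rest.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = " ".join(fields[9:])
    return values


def sectors_to_watch(values: dict[str, str]) -> dict[str, int]:
    """Return the watched counters whose raw value is non-zero."""
    flagged = {}
    for name in WATCHED:
        raw = values.get(name, "0").split()[0]
        if raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Illustrative sample; not copied from the attached report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

print(sectors_to_watch(smart_raw_values(SAMPLE)))  # prints {'Current_Pending_Sector': 8}
```

A non-zero pending count means the drive has sectors it could not read and is waiting to remap on a later write; re-running the parser after each check shows whether the count is growing.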
JorgeB, posted August 6, 2019:
SMART test failed:
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  10%        329
You should replace it now.
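The failed result can be read directly from the self-test log (`smartctl -l selftest`). Here is a sketch of a row parser, assuming rows shaped like the one quoted in the post. The regex and `parse_selftest_row` are illustrative, not part of smartmontools, and because the LBA_of_first_error value was not preserved in the post, the parser treats that column as optional.

```python
import re
from typing import Optional

# "# N", then description + status, then "NN%" remaining, lifetime hours, optional LBA.
SELFTEST_ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s*(\S*)\s*$")


def parse_selftest_row(line: str) -> Optional[dict]:
    """Parse one row of a smartctl self-test log; return None for non-row lines."""
    m = SELFTEST_ROW.match(line.strip())
    if m is None:
        return None
    num, text, remaining, hours, lba = m.groups()
    return {
        "num": int(num),
        "description_and_status": text,
        "remaining_pct": int(remaining),
        "lifetime_hours": int(hours),
        # The LBA column may be empty or "-" when there is no error location.
        "lba_of_first_error": None if lba in ("", "-") else lba,
        "failed": "failure" in text.lower(),
    }


row = parse_selftest_row("# 1 Extended offline Completed: read failure 10% 329")
print(row["failed"], row["remaining_pct"], row["lifetime_hours"])  # prints True 10 329
```

A "read failure" status with a non-zero Remaining value means the extended test stopped early (here with 10% left, at 329 power-on hours) because it hit a sector it could not read, which is what backs the advice to replace the drive.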
J.Nerdy, posted August 6, 2019:
Quoting johnnie.black: "SMART test failed: ... You should replace it now."
Heard.
Final question: I'm traveling through Friday. Should I take the array offline? While I'm away it is only going to be serving media. I can continue preclearing with the array offline.
JorgeB, posted August 6, 2019:
It should be OK to leave it on for reading; avoid writing to the array.
Vr2Io, posted August 6, 2019:
I'll add this hard disk model to my monitoring list; it died this soon despite such a good temperature environment.
Quoting J.Nerdy: "I also installed dynamix file integrity (and am looking at Squids checksum plug-in) to start looking at disk | file hashes"
Rather than running both at the same time, just decide whether you want automatic real-time hashing or manually triggered hashing.
J.Nerdy, posted August 6, 2019:
Quoting johnnie.black: "Should be OK to leave on for reading, avoid writing to the array."
Heard. VMs are shut down. The only Docker containers active are: Plexpy | Plex | CrashPlan | Netdata | cAdvisor
The new disk is already on its way and will be waiting for me.
Quoting Benson: "This harddisks model will take in monitoring list..."
Heard, thank you. What do you mean by monitoring list?
Vr2Io, posted August 6, 2019:
Quoting J.Nerdy: "What do you mean by monitoring list?"
I just mean keeping track of failure reports or failure rates. I'd expect a helium drive to be more durable.
J.Nerdy, posted August 6, 2019:
Quoting Benson: "Just means keep track on failure report or failure rate..."
Got it, thanks! This is only my second WD (of many) to fail prematurely. Bummer. At least I have the original parity disk, and the replacement is already in the post.