Beta (posted March 29, 2023):

Hi! I'm in the middle of a read/write-heavy job: unpacking about 25 TB of RAR archives with unpackerr, a few hundred GB at a time. Last night, partway through the current batch, Unraid sent me a push notification saying "Reported uncorrect" on one drive had increased from 0 to 1. When the batch finished, Unraid showed 32 errors on the drive (I assume these were corrected?).

I then ran a short SMART test followed by an extended SMART test. During the extended test, Reported uncorrect increased to 3, yet the test says the drive passed. Self-test log:

Num  Test_Description   Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline   Completed without error  00%        37799            -
# 2  Short offline      Completed without error  00%        37786            -

After the extended test, Unraid now shows 96 errors on the drive, and Reported uncorrect sits at 3:

187 Reported uncorrect  0x0032  097  097  000  Old age  Always  Never  3

I've attached my diagnostics (Disk 6 is the disk with the errors). Is it safe to acknowledge the error and keep using the disk while watching for increasing values and other SMART errors? Or should I be looking at replacing the drive ASAP? Thanks!

Attachment: unraid-diagnostics-20230329-2137.zip
JonathanM (posted March 30, 2023):

4 hours ago, Beta said: safe to acknowledge the error and keep using the disk but keep an eye out for increasing values and other smart errors

This. If all is quiet for a significant period of time, you can relax a little. If the errors keep happening regularly, I'd replace it. Regardless, the drive is now officially on your watch list.

4 hours ago, Beta said: When done unraid indicated 32 errors on the drive (I assume these are corrected?)

Yes. When the drive returned a read error, Unraid read the corresponding data from all the other drives, calculated from parity the bits that were supposed to be there, and wrote the calculated values back to the drive. The drive acknowledged a successful write, so it is deemed still fit for use and is not disabled. Unraid will keep using a drive until a write fails, but that doesn't mean the drive is healthy; that's up to you to monitor and make a judgment call.

Just as an aside: if you don't need your FTP server available 24/7/365, consider shutting it down when it's not in use. All the hack attempts make reading the logs irritating.
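A minimal sketch of the reconstruction described above, assuming single parity computed as a bytewise XOR across the data disks. The disk contents and the function names (`xor_blocks`, `compute_parity`, `reconstruct`) are hypothetical and only illustrate the arithmetic; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: list[bytes]) -> bytes:
    """Parity block: XOR of the block at the same offset on every data disk."""
    return xor_blocks(data_disks)


def reconstruct(data_disks: list[bytes], parity: bytes, failed_index: int) -> bytes:
    """Recover the unreadable block on one disk from parity plus the other disks."""
    survivors = [blk for i, blk in enumerate(data_disks) if i != failed_index]
    return xor_blocks(survivors + [parity])


# Hypothetical 4-byte block at the same offset on three data disks.
disks = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]
parity = compute_parity(disks)

# Disk index 1 returns a read error: rebuild its block, which would then be
# written back to that disk.
recovered = reconstruct(disks, parity, failed_index=1)
assert recovered == disks[1]
```

If a second disk were also unreadable at the same offset, the XOR would have two unknowns and the block could not be recovered, which is why single parity tolerates only one missing disk at a time.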
Beta (author, posted March 30, 2023):

6 hours ago, JonathanM said: (quoting the reply above)

Awesome, I thought as much, but it's nice to have confirmation from someone more knowledgeable! I'll acknowledge the errors and keep an eye on the drive for further errors for now.

Thanks for the hint about the FTP server. I was going to disable it soon anyway; it's barely used anymore.
Beta (author, posted March 30, 2023):

Oops, more SMART errors popped up this evening. Time to order a replacement? Running another extended test right now.

197 Current pending sector  0x0012  100  100  000  Old age  Always   Never  8
198 Offline uncorrectable   0x0010  100  100  000  Old age  Offline  Never  8
JonathanM (posted March 31, 2023):

4 hours ago, Beta said: Time to order a replacement?

It would be prudent to have the replacement ready. Do you trust the health of the rest of your drives?
Beta (author, posted March 31, 2023, edited):

3 hours ago, JonathanM said: Do you trust the health of the rest of your drives?

The values increased during the extended test:

187 Reported uncorrect      0x0032  097  097  000  Old age  Always   Never  3
197 Current pending sector  0x0012  100  100  000  Old age  Always   Never  16
198 Offline uncorrectable   0x0010  100  100  000  Old age  Offline  Never  16

And the extended SMART test failed:

Num  Test_Description   Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline   Completed: read failure  10%        37833            -
# 2  Short offline      Completed without error  00%        37825            -
# 3  Extended offline   Completed without error  00%        37799            -
# 4  Short offline      Completed without error  00%        37786            -

I'm ordering a replacement disk now. I haven't seen any issues with the other drives yet. Most of them are fairly new, except for two 2 TB WD Blacks with 7 years of power-on time, which I've been expecting to fail. I'm halting all read/write-heavy tasks until the disk is replaced.

(Edited March 31, 2023 by Beta)
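Keeping an eye out for increasing values can be as simple as diffing the raw values of the attributes discussed in this thread between two snapshots. A minimal sketch, assuming each snapshot is attribute text laid out as in the posts (numeric ID first, raw value as the last token). The watch list and the function names `parse_raw_values` and `increased` are hypothetical; `smartctl -A` output uses underscored names (e.g. `Reported_Uncorrect`) but keeps the ID first and the raw value last for these attributes.

```python
# Attribute IDs discussed in this thread (hypothetical watch list).
WATCHED = {
    187: "Reported uncorrect",
    197: "Current pending sector",
    198: "Offline uncorrectable",
}


def parse_raw_values(text: str) -> dict[int, int]:
    """Map attribute ID -> raw value for watched lines that start with a numeric ID.

    Assumes the raw value is the last whitespace-separated token, which holds
    for the watched attributes but not for every attribute (e.g. temperature).
    """
    values: dict[int, int] = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue
        attr_id = int(tokens[0])
        if attr_id in WATCHED and tokens[-1].isdigit():
            values[attr_id] = int(tokens[-1])
    return values


def increased(before: str, after: str) -> dict[int, tuple[int, int]]:
    """Return {id: (old, new)} for watched attributes whose raw value went up.

    An attribute missing from the earlier snapshot is treated as 0.
    """
    old, new = parse_raw_values(before), parse_raw_values(after)
    return {
        attr_id: (old.get(attr_id, 0), value)
        for attr_id, value in new.items()
        if value > old.get(attr_id, 0)
    }


# Snapshots assembled from the values posted in this thread.
evening = """\
187 Reported uncorrect 0x0032 097 097 000 Old age Always Never 3
197 Current pending sector 0x0012 100 100 000 Old age Always Never 8
198 Offline uncorrectable 0x0010 100 100 000 Old age Offline Never 8
"""
after_test = """\
187 Reported uncorrect 0x0032 097 097 000 Old age Always Never 3
197 Current pending sector 0x0012 100 100 000 Old age Always Never 16
198 Offline uncorrectable 0x0010 100 100 000 Old age Offline Never 16
"""

for attr_id, (old_value, new_value) in sorted(increased(evening, after_test).items()):
    print(f"{attr_id} {WATCHED[attr_id]}: {old_value} -> {new_value}")
# prints:
# 197 Current pending sector: 8 -> 16
# 198 Offline uncorrectable: 8 -> 16
```

An empty result between runs is the "all quiet" case; growth in these counters, as happened here, is the signal to have the replacement ready.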
JonathanM (posted March 31, 2023, marked as the solution):

6 hours ago, Beta said: extended smart test failed

Expedite the replacement. Don't bother with extensive testing of the incoming drive; the rebuild, followed by a non-correcting parity check and a long SMART test, will be a trial by fire for the new drive.

6 hours ago, Beta said: two WD Black 2 TB with 7 years power on which I have been waiting to fail..

It might be a good idea to order another replacement to have on hand. I personally keep a tested drive, the same size as parity, in a box as an on-deck option.

6 hours ago, Beta said: Halting all read/write heavy tasks until it's replaced.

That's a good reason to consider keeping a tested cold spare: it limits your time at risk.
Beta (author, posted March 31, 2023):

9 hours ago, JonathanM said: Expedite replacement, don't bother with extensive testing of the new incoming drive...

Thanks for the help, Jonathan! I didn't want to risk running the array with that disk over the weekend, so I went to my local electronics chain store and bought a replacement. I was going to order an 8 TB IronWolf online with next-day delivery (which would have meant Monday, because of the weekend). It was €20 cheaper, but ¯\_(ツ)_/¯

Running the rebuild now!
JonathanM (posted March 31, 2023):

31 minutes ago, Beta said: Running rebuild now!

Good deal. I know you aren't expecting the two old WDs to die immediately, but running single parity means you have no margin for error when it comes to drive replacement: a rebuild has to read every remaining drive end to end, and single parity can reconstruct only one missing drive. Running with any single drive that you can't trust through a rebuild means you're just asking for a sudden, unannounced failure by a drive you had no clue was on the way out. All drives fail eventually; the tricky part is predicting when.