Phoenix Down Posted June 1, 2020 My monthly parity check just ran and came up with errors on my Parity Disk 2: Self-test history: SMART report: smart_report.txt A bit of background: this disk (we'll call it "P2" for short) was previously in a QNAP 4-bay NAS. The QNAP flagged P2 as bad, so I replaced it with a new disk. I then ran P2 through chkdsk /r twice, and it didn't find any errors. So when I built the Unraid box, I put all 5 disks into service, even though P2 was already somewhat suspect. I guess what I'm asking is: is P2 toast? Should I just replace it?
Phoenix Down (Author) Posted June 1, 2020 Well, not looking good... The parity check slowed to a crawl at 93.4%, with an estimated 12 days left at 233 KB/s. Parity Disk 2 is now up to 526k ERRORS. Edited June 1, 2020 by Phoenix Down
Phoenix Down (Author) Posted June 2, 2020 Well, I stopped the parity check just before Parity Disk 2 hit 800k ERRORS. Looks like it is toast. Going to swap it out.
JorgeB Posted June 2, 2020 The SMART test failed, so that drive needs to be replaced.
Phoenix Down (Author) Posted June 2, 2020 2 hours ago, johnnie.black said: "SMART test failed so that drive needs to be replaced." Already swapped in a spare. Running the parity rebuild now. Also running preclear on "P2" just to see what happens.
JorgeB Posted June 2, 2020 A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.
Phoenix Down (Author) Posted June 2, 2020 5 hours ago, johnnie.black said: "A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future." Agreed. This is more for academic curiosity. Once the preclear finishes, I plan to physically destroy the platters.
Phoenix Down (Author) Posted June 3, 2020 On 6/2/2020 at 4:59 AM, johnnie.black said: "A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future." So the preclear finally finished... with some truly bizarre results. It took about 7 hours for the first 90% of the Pre-read. It then took another 17 hours for the last 10% to finish, with a massive number of read errors. By the time Pre-read finished, the disk had 2167 Current Pending Sectors. However, by the time Zeroing finished, Current Pending Sectors had gone down to 0. I expected Offline Uncorrectable to increase, but it's still at 0. Post-read verification also finished without any issues. I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me? Log below:
####################################################################
# unRAID Server Preclear of disk WD-WCC4N1LF9UDJ
# Cycle 1 of 1, partition start on sector 64.
#
# Step 1 of 5 - Pre-read verification:                  [24:20:38 @ 34 MB/s] SUCCESS
# Step 2 of 5 - Zeroing the disk:                       [7:14:19 @ 115 MB/s] SUCCESS
# Step 3 of 5 - Writing unRAID's Preclear signature:    SUCCESS
# Step 4 of 5 - Verifying unRAID's Preclear signature:  SUCCESS
# Step 5 of 5 - Post-Read verification:                 [7:34:07 @ 110 MB/s] SUCCESS
####################################################################
# Cycle elapsed time: 39:09:07 | Total elapsed time: 39:09:07
####################################################################
# S.M.A.R.T. Status (device type: default)
#
# ATTRIBUTE                    INITIAL  CYCLE 1  STATUS
# 5-Reallocated_Sector_Ct      0        0        -
# 9-Power_On_Hours             36710    36749    Up 39
# 194-Temperature_Celsius      33       37       Up 4
# 196-Reallocated_Event_Count  0        0        -
# 197-Current_Pending_Sector   0        0        -
# 198-Offline_Uncorrectable    0        0        -
# 199-UDMA_CRC_Error_Count     0        0        -
####################################################################
# SMART overall-health self-assessment test result: PASSED
####################################################################
--> ATTENTION: Please take a look into the SMART report above for drive health issues.
--> RESULT: Preclear Finished Successfully!.
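As a rough sanity check on the log above, the elapsed times and average speeds can be turned into an implied amount of data per step, and the "7 hours for 90%, 17 hours for 10%" observation into average speeds for each part of the Pre-read. This is only a sketch: it assumes the log's "MB" means 10^6 bytes, and the 3,000,000 MB capacity used for the split is inferred from the log, not stated anywhere in the thread. The function names are made up for illustration.

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string from the preclear log to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def implied_terabytes(elapsed, mb_per_s):
    """Data covered by a step, assuming 1 MB = 10**6 bytes (an assumption)."""
    return hms_to_seconds(elapsed) * mb_per_s / 1_000_000


def segment_speed_mb_s(total_mb, fraction, hours):
    """Average speed over a fraction of the disk that took `hours` to read."""
    return total_mb * fraction / (hours * 3600)


# (elapsed time, average MB/s) exactly as reported in the log above
steps = {
    "pre-read": ("24:20:38", 34),
    "zeroing": ("7:14:19", 115),
    "post-read": ("7:34:07", 110),
}

for name, (elapsed, speed) in steps.items():
    print(f"{name}: ~{implied_terabytes(elapsed, speed):.2f} TB")

# Capacity inferred from the figures above; not stated in the thread.
ASSUMED_TOTAL_MB = 3_000_000
print(f"first 90%: ~{segment_speed_mb_s(ASSUMED_TOTAL_MB, 0.90, 7):.0f} MB/s")
print(f"last 10%:  ~{segment_speed_mb_s(ASSUMED_TOTAL_MB, 0.10, 17):.1f} MB/s")
```

All three steps work out to roughly 3 TB, so the log is internally consistent. Under the assumed capacity, the last 10% of the Pre-read averaged only about 5 MB/s, against roughly 107 MB/s for the first 90%, which fits the read errors being concentrated at the end of the disk.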
JonathanM Posted June 3, 2020 41 minutes ago, Phoenix Down said: "I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me?" I'm far from an expert on this, but my understanding is that the drive initially couldn't read the contents of those sectors, but after a fresh write, it was then able to read them. I would be extra vigilant with that drive, since you don't have a definite reason WHY this happened. My guess is that there are large regions that just aren't good at keeping bits intact long term. With frequent rewrites of those areas, it may be OK, but stagnant storage may be risky. It could also be marginal power, where the write cycles to those zones previously weren't "forceful" enough. I would think that would have other effects as well, but who knows. Hard drives are analog devices that return binary data. When the analog voltage levels get too close to the margins that define 1 vs. 0, strange things happen.
JorgeB Posted June 4, 2020 9 hours ago, Phoenix Down said: "It took about 7 hours for the first 90% of the Pre-read." It's kind of pointless to do a pre-read when the disk has known bad sectors. Just go directly to the write, then do a post-read: bad sectors are reallocated on writes, not on reads. Before considering putting that disk to use, it would be a good idea to run a couple of complete preclear cycles.
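One way to follow the attributes discussed in this thread across a write pass is to compare two snapshots of the SMART attribute table. The sketch below assumes text shaped like the table that `smartctl -A /dev/sdX` prints, with the raw value in the tenth column. The sample rows are illustrative, not taken from the attached report; only the 2167 and 0 pending-sector counts come from the thread. The function names are hypothetical.

```python
# Attributes the thread treats as failure indicators.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} from `smartctl -A` style table text."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def changed(before, after, watched=WATCHED):
    """Return {name: (before, after)} for watched attributes that differ."""
    return {
        name: (before.get(attr_id), after.get(attr_id))
        for attr_id, name in watched.items()
        if before.get(attr_id) != after.get(attr_id)
    }


# Illustrative snapshots (made-up rows in smartctl's column layout).
AFTER_PRE_READ = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2167
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""
AFTER_ZEROING = AFTER_PRE_READ.replace("2167", "0")

print(changed(parse_smart_attributes(AFTER_PRE_READ),
              parse_smart_attributes(AFTER_ZEROING)))
```

With these snapshots, only Current_Pending_Sector is reported as changed. That matches what the preclear showed: the pending count cleared after the full write, while Reallocated_Sector_Ct stayed at 0.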
Phoenix Down (Author) Posted June 4, 2020 Thanks for the comments, jonathanm and johnnie.black. I did another extended SMART test and that went fine. Started another preclear cycle and so far that's running fine as well. Will run another cycle after this. Not sure that I would trust this HD even if everything came back fine after a couple more preclear cycles.
Phoenix Down (Author) Posted June 7, 2020 OK, so I've run a SMART extended test as mentioned, then 3 full cycles of preclear. All completed on time with no errors. All SMART failure indicators are still at 0. Should I put this disk back into service as an array drive? Or parity drive? Edited June 7, 2020 by Phoenix Down
JorgeB Posted June 7, 2020 1 hour ago, Phoenix Down said: "Should I put this disk back into service as an array drive? Or parity drive?" It's up to you. It's very difficult to predict if it will last, and it also depends on your risk tolerance: whether you have backups, single or dual parity, etc. I would have no problem using it, but I have full backups and dual parity on most of my servers.