Parity Disk Error - Disk Toast? - General Support

June 1, 2020

My monthly parity check just ran, and came up with some errors for my Parity Disk 2:

Self-test history:

[Image: self_test.png]

SMART report:

A bit of background: I previously had this disk (we'll call it "P2" for short) in a QNAP 4-bay NAS. The QNAP flagged P2 as bad, so I replaced it with a new disk. I then ran P2 through chkdsk /r twice, and it didn't find any errors, so when I built the Unraid box, I put all 5 disks into service. P2 was already kind of suspect, though.

I guess what I'm asking is, is P2 toast? Should I just replace it?

June 1, 2020

Author

Well, not looking good... Parity check slowed to a crawl at 93.4%, with 12 days estimated left to go at 233 KB/s. Parity Disk 2 is now up to 526k ERRORS.

Edited June 1, 2020 by Phoenix Down

June 2, 2020

Author

Well, I stopped the parity check just before Parity Disk 2 hit 800k ERRORS. Looks like it is toast. Going to swap it out.

June 2, 2020

Community Expert

SMART test failed so that drive needs to be replaced.

June 2, 2020

Author

2 hours ago, johnnie.black said:

SMART test failed so that drive needs to be replaced.

Already swapped in a spare. Running parity rebuild now.

Also running preclear on "P2" just to see what happens.

June 2, 2020

Community Expert

A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.

June 2, 2020

Author

5 hours ago, johnnie.black said:

A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.

Agreed. This is more for academic curiosity. Once the preclear finishes, I plan to physically destroy the platters.

June 3, 2020

Author

On 6/2/2020 at 4:59 AM, johnnie.black said:

A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.

So the preclear finally finished... with some truly bizarre results.

It took about 7 hours for the first 90% of the Pre-read. It then took another 17 hours for the last 10% to finish, with a massive number of read errors. By the time Pre-read finished, the disk had 2167 Current Pending Sectors. However, by the time Zeroing finished, Current Pending Sectors had gone back down to 0. I expected Offline Uncorrectable to increase, but it's still at 0. Post-read verification also finished without any issues.

I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me?

Log below:

############################################################################################################################
#                                                                                                                          #
#                                     unRAID Server Preclear of disk WD-WCC4N1LF9UDJ                                       #
#                                       Cycle 1 of 1, partition start on sector 64.                                        #
#                                                                                                                          #
#                                                                                                                          #
#   Step 1 of 5 - Pre-read verification:                                                   [24:20:38 @ 34 MB/s] SUCCESS    #
#   Step 2 of 5 - Zeroing the disk:                                                        [7:14:19 @ 115 MB/s] SUCCESS    #
#   Step 3 of 5 - Writing unRAID's Preclear signature:                                                          SUCCESS    #
#   Step 4 of 5 - Verifying unRAID's Preclear signature:                                                        SUCCESS    #
#   Step 5 of 5 - Post-Read verification:                                                  [7:34:07 @ 110 MB/s] SUCCESS    #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#                              Cycle elapsed time: 39:09:07 | Total elapsed time: 39:09:07                                 #
############################################################################################################################


############################################################################################################################
#                                                                                                                          #
#                                        S.M.A.R.T. Status (device type: default)                                          #
#                                                                                                                          #
#                                                                                                                          #
#   ATTRIBUTE                    INITIAL  CYCLE 1  STATUS                                                                  #
#   5-Reallocated_Sector_Ct      0        0        -                                                                       #
#   9-Power_On_Hours             36710    36749    Up 39                                                                   #
#   194-Temperature_Celsius      33       37       Up 4                                                                    #
#   196-Reallocated_Event_Count  0        0        -                                                                       #
#   197-Current_Pending_Sector   0        0        -                                                                       #
#   198-Offline_Uncorrectable    0        0        -                                                                       #
#   199-UDMA_CRC_Error_Count     0        0        -                                                                       #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#   SMART overall-health self-assessment test result: PASSED                                                               #
############################################################################################################################


--> ATTENTION: Please take a look into the SMART report above for drive health issues.

--> RESULT: Preclear Finished Successfully!.
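As a rough sanity check on the log above, the step durations and average speeds can be multiplied out to see how much data each pass covered. This is only a sketch: it assumes the script's "MB/s" means 10**6 bytes per second and that the printed speeds are averages rounded to whole numbers. The function names are made up for illustration and are not part of the preclear script.

```python
import re

# Timed steps copied from the preclear log above.
LOG_LINES = [
    "Step 1 of 5 - Pre-read verification:  [24:20:38 @ 34 MB/s] SUCCESS",
    "Step 2 of 5 - Zeroing the disk:       [7:14:19 @ 115 MB/s] SUCCESS",
    "Step 5 of 5 - Post-Read verification: [7:34:07 @ 110 MB/s] SUCCESS",
]

STEP_RE = re.compile(
    r"Step \d of 5 - (.+?):\s*\[(\d+):(\d+):(\d+) @ (\d+) MB/s\]"
)


def parse_step(line):
    """Return (name, seconds, mb_per_s) for one timed preclear step."""
    m = STEP_RE.search(line)
    if m is None:
        raise ValueError(f"not a timed step: {line!r}")
    name, hours, minutes, seconds, rate = m.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return name, total, int(rate)


def implied_terabytes(seconds, mb_per_s):
    """Data covered in TB, assuming 1 MB = 10**6 bytes (an assumption)."""
    return seconds * mb_per_s * 1e6 / 1e12


steps = [parse_step(line) for line in LOG_LINES]
for name, secs, rate in steps:
    print(f"{name:24s} {secs:6d} s  ~{implied_terabytes(secs, rate):.2f} TB")

# The three timed steps should add up to roughly the cycle time (39:09:07).
timed_total = sum(secs for _, secs, _ in steps)
cycle_total = 39 * 3600 + 9 * 60 + 7
print("untimed remainder:", cycle_total - timed_total, "s")

# Pre-read split from the post: ~7 h for the first 90%, ~17 h for the last 10%.
disk_mb = 3.0e6  # assumed ~3 TB, taken from the per-step results above
fast_mb_s = 0.9 * disk_mb / (7 * 3600)
slow_mb_s = 0.1 * disk_mb / (17 * 3600)
print(f"first 90%: ~{fast_mb_s:.0f} MB/s, last 10%: ~{slow_mb_s:.0f} MB/s")
```

All three passes come out at about 3 TB, so the slow Pre-read covered the same surface as the fast passes. It just spent most of its 24 hours in the last tenth of the disk.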

June 3, 2020

41 minutes ago, Phoenix Down said:

I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me?

I'm far from an expert on this, but my understanding is that the drive initially couldn't read the contents of those sectors, but after a fresh write, it was then able to read them.

I would be extra vigilant with that drive, since you don't have a definite reason WHY this happened. My guess is that there are large regions that just aren't good at keeping bits intact long term. With frequent rewrites of those areas, it may be ok, but stagnant storage may be risky.

It could also be marginal power, where the write cycles to those zones previously weren't "forceful" enough. I would think that would have other effects as well, but who knows.

Hard drives are analog devices that return binary data. When the analog voltage levels get too close to the margins that define 1 vs. 0, strange things happen.

June 4, 2020

Community Expert

9 hours ago, Phoenix Down said:

It took about 7 hours for the first 90% of the Pre-read.

It's kind of pointless to do a pre-read when the disk has known bad sectors. Just go directly to the write, then do a post-read: bad sectors are reallocated on writes, not on reads. Before considering putting that disk to use, it would be a good idea to run a couple of complete preclear cycles.
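To illustrate the "reallocated on writes, not on reads" point, here is a deliberately simplified toy model of how the pending (197) and reallocated (5) counters are commonly described as behaving. It is not real drive firmware. The `ToyDisk` class, its fields, and the `preclear` helper are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Toy model of SMART attributes 5 and 197 (an assumption, not firmware)."""
    weak: set      # sectors whose current contents cannot be read back
    damaged: set   # sectors that cannot hold data even after a rewrite
    pending: set = field(default_factory=set)      # Current_Pending_Sector
    reallocated: set = field(default_factory=set)  # Reallocated_Sector_Ct

    def read(self, lba):
        """A failed read only flags the sector as pending; nothing is remapped."""
        if lba in self.weak:
            self.pending.add(lba)
            return False
        return True

    def write(self, lba):
        """A write settles a pending sector: reuse it if it takes the new
        data, otherwise remap it to a spare."""
        self.pending.discard(lba)
        self.weak.discard(lba)  # fresh data is readable again
        if lba in self.damaged:
            self.reallocated.add(lba)


def preclear(disk, n_sectors):
    """Pre-read, zero, post-read; return the counters seen along the way."""
    pre_errors = sum(not disk.read(lba) for lba in range(n_sectors))
    pending_after_preread = len(disk.pending)
    for lba in range(n_sectors):
        disk.write(lba)
    post_errors = sum(not disk.read(lba) for lba in range(n_sectors))
    return (pre_errors, pending_after_preread,
            len(disk.pending), len(disk.reallocated), post_errors)


# Case like the one in this thread: many unreadable sectors, none truly damaged.
print(preclear(ToyDisk(weak=set(range(2167)), damaged=set()), 10_000))

# Case with some physically damaged sectors: these end up reallocated.
print(preclear(ToyDisk(weak=set(range(100)), damaged=set(range(10))), 10_000))
```

In the first case the pending count climbs during the pre-read and drops to 0 after zeroing with nothing reallocated, which matches the SMART table posted above. In the second, the write pass is what moves sectors into the reallocated count.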

June 4, 2020

Author

Thanks for the comments jonathanm and johnnie.black. I did another extended SMART test and that went fine. Started another preclear cycle and so far that's running fine as well. Will run another cycle after this.

Not sure that I would trust this HD even if everything came back fine after a couple more preclear cycles.

June 7, 2020

Author

OK, so I've run a SMART extended test as mentioned, then 3 full cycles of preclear. All completed on time with no errors. All SMART failure indicators are still at 0. Should I put this disk back into service as an array drive? Or parity drive?

Edited June 7, 2020 by Phoenix Down

June 7, 2020

Community Expert

1 hour ago, Phoenix Down said:

Should I put this disk back into service as an array drive? Or parity drive?

It's up to you. It's very difficult to predict whether it will last, and it also depends on your risk tolerance: whether you have backups, single or dual parity, etc. I would have no problem using it, but I have full backups and dual parity on most of my servers.
