
Parity Disk Error - Disk Toast?



Posted

My monthly parity check just ran, and came up with some errors for my Parity Disk 2:

 

[Attachment: parity_check.png]

 

Self-test history:

[Attachment: self_test.png]

 

SMART report:

smart_report.txt

 

A bit of background: this disk (we'll call it "P2" for short) was previously in a QNAP 4-bay NAS. The QNAP flagged P2 as bad, so I replaced it with a new disk. I then ran P2 through chkdsk /r twice, and it didn't find any errors, so when I built the Unraid box I put all 5 disks into service. In other words, P2 was already somewhat suspect.

 

I guess what I'm asking is, is P2 toast? Should I just replace it?

Posted
2 hours ago, johnnie.black said:

SMART test failed so that drive needs to be replaced.

Already swapped in a spare. Running parity rebuild now.

 

Also running preclear on "P2" just to see what happens.

Posted
5 hours ago, johnnie.black said:

A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.

Agreed. This is more for academic curiosity. Once the preclear finishes, I plan to physically destroy the platters :)

Posted
On 6/2/2020 at 4:59 AM, johnnie.black said:

A full disk write might fix it for now, but once a disk has issues it's more likely to have more in the near future.

So the preclear finally finished... with some truly bizarre results.

 

It took about 7 hours for the first 90% of the Pre-read. It then took another 17 hours for the last 10% to finish, with a massive number of read errors. By the time Pre-read finished, the disk had 2167 Current Pending Sectors. However, by the time Zeroing finished, Current Pending Sectors had dropped back to 0. I expected Offline Uncorrectable to increase, but it's still at 0. Post-read verification also finished without any issues.
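
To put rough numbers on those timings, here is a back-of-the-envelope sketch. The disk capacity isn't stated anywhere in this post, so the figure used is an assumption inferred from the zeroing step in the log (7:14:19 at 115 MB/s, which works out to roughly 3 TB). The 7-hour and 17-hour figures are the approximate ones quoted above.

```python
# Rough throughput estimate for the two phases of the pre-read.
# ASSUMPTION: capacity is inferred from the zeroing step (7:14:19 @ 115 MB/s),
# not read from the drive itself.

def hms_to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

zero_seconds = hms_to_seconds("7:14:19")
capacity_mb = zero_seconds * 115  # estimated capacity in MB

def phase_rate(fraction: float, hours: float) -> float:
    """Average MB/s needed to read `fraction` of the disk in `hours`."""
    return capacity_mb * fraction / (hours * 3600)

fast = phase_rate(0.90, 7)   # first 90% in about 7 hours
slow = phase_rate(0.10, 17)  # last 10% in about 17 hours

print(f"estimated capacity: {capacity_mb / 1e6:.2f} TB")
print(f"first 90%: {fast:.0f} MB/s, last 10%: {slow:.1f} MB/s")
```

Under that assumption the first 90% was read at roughly 107 MB/s and the last 10% at roughly 5 MB/s, which is consistent with the drive retrying each unreadable sector many times before giving up.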

 

I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me?

 

Log below:

############################################################################################################################
#                                                                                                                          #
#                                     unRAID Server Preclear of disk WD-WCC4N1LF9UDJ                                       #
#                                       Cycle 1 of 1, partition start on sector 64.                                        #
#                                                                                                                          #
#                                                                                                                          #
#   Step 1 of 5 - Pre-read verification:                                                   [24:20:38 @ 34 MB/s] SUCCESS    #
#   Step 2 of 5 - Zeroing the disk:                                                        [7:14:19 @ 115 MB/s] SUCCESS    #
#   Step 3 of 5 - Writing unRAID's Preclear signature:                                                          SUCCESS    #
#   Step 4 of 5 - Verifying unRAID's Preclear signature:                                                        SUCCESS    #
#   Step 5 of 5 - Post-Read verification:                                                  [7:34:07 @ 110 MB/s] SUCCESS    #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#                              Cycle elapsed time: 39:09:07 | Total elapsed time: 39:09:07                                 #
############################################################################################################################


############################################################################################################################
#                                                                                                                          #
#                                        S.M.A.R.T. Status (device type: default)                                          #
#                                                                                                                          #
#                                                                                                                          #
#   ATTRIBUTE                    INITIAL  CYCLE 1  STATUS                                                                  #
#   5-Reallocated_Sector_Ct      0        0        -                                                                       #
#   9-Power_On_Hours             36710    36749    Up 39                                                                   #
#   194-Temperature_Celsius      33       37       Up 4                                                                    #
#   196-Reallocated_Event_Count  0        0        -                                                                       #
#   197-Current_Pending_Sector   0        0        -                                                                       #
#   198-Offline_Uncorrectable    0        0        -                                                                       #
#   199-UDMA_CRC_Error_Count     0        0        -                                                                       #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#   SMART overall-health self-assessment test result: PASSED                                                               #
############################################################################################################################


--> ATTENTION: Please take a look into the SMART report above for drive health issues.

--> RESULT: Preclear Finished Successfully!.
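
For anyone who wants to check a report like this programmatically, here is a minimal sketch that parses the attribute rows from the S.M.A.R.T. table above and lists any failure-related attribute whose value changed over the cycle. The choice of which IDs count as "failure-related" (5, 196, 197, 198) is my own assumption, and the function names are made up for illustration; none of this is part of the preclear script.

```python
import re

# Rows copied from the preclear S.M.A.R.T. table above.
REPORT = """\
#   5-Reallocated_Sector_Ct      0        0        -
#   9-Power_On_Hours             36710    36749    Up 39
#   194-Temperature_Celsius      33       37       Up 4
#   196-Reallocated_Event_Count  0        0        -
#   197-Current_Pending_Sector   0        0        -
#   198-Offline_Uncorrectable    0        0        -
#   199-UDMA_CRC_Error_Count     0        0        -
"""

# Matches "#   <id>-<name>   <initial>   <cycle 1>".
ROW = re.compile(r"^#\s+(\d+)-(\S+)\s+(\d+)\s+(\d+)")

# ASSUMPTION: these are the attributes treated as failure indicators here.
FAILURE_IDS = {5, 196, 197, 198}

def parse_rows(text):
    """Return {attribute_id: (name, initial, final)} for every table row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, initial, final = m.groups()
            rows[int(attr_id)] = (name, int(initial), int(final))
    return rows

def changed_failure_attrs(rows):
    """Names of failure-related attributes whose value differs after the cycle."""
    return [name for attr_id, (name, initial, final) in rows.items()
            if attr_id in FAILURE_IDS and final != initial]

rows = parse_rows(REPORT)
print(changed_failure_attrs(rows))
```

For this report the list comes back empty. Note that the table only compares the snapshot before the cycle with the one after it, so the 2167 pending sectors that appeared during the Pre-read and cleared during Zeroing never show up in it.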

 

Posted
41 minutes ago, Phoenix Down said:

 

I'm truly confused by what this means. How can I have so many read errors during Pre-read verification, but after zeroing out the disk, everything looks fine? Can someone more knowledgeable explain this to me?

I'm far from an expert on this, but my understanding is that the drive initially couldn't read the contents of those sectors, but after a fresh write, it was then able to read them.

 

I would be extra vigilant with that drive, since you don't have a definite reason WHY this happened. My guess is that there are large regions that just aren't good at keeping bits intact long term. With frequent rewrites of those areas, it may be ok, but stagnant storage may be risky.

 

It could also be marginal power, where the write cycles to those zones previously weren't "forceful" enough. I would think that would have other effects as well, but who knows.

 

Hard drives are analog devices that return binary data. When the analog voltage levels get too close to the margins that define 1 vs. 0, strange things happen.

Posted
9 hours ago, Phoenix Down said:

It took about 7 hours for the first 90% of the Pre-read.

It's kind of pointless to do a pre-read when the disk has known bad sectors. Just go directly to the write and then do a post-read: bad sectors are reallocated on writes, not on reads. Before considering putting that disk to use, it would be a good idea to run a couple of complete preclear cycles.
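
The read-versus-write behaviour can be sketched with a toy model. This is only an illustration of the commonly described behaviour, not actual drive firmware: the `ToyDisk` class and its "weak" and "damaged" categories are hypothetical names invented for the example. A failed read only marks a sector as pending; a later write either succeeds in place, which clears the pending flag, or fails and triggers a reallocation.

```python
# Toy model only: real drive firmware is proprietary and far more complex.
from dataclasses import dataclass, field

@dataclass
class ToyDisk:
    weak: set = field(default_factory=set)     # stored data unreadable, media still usable
    damaged: set = field(default_factory=set)  # media can no longer hold data
    pending: set = field(default_factory=set)  # what attribute 197 would count
    reallocated: int = 0                       # what attribute 5 would count

    def read(self, sector: int) -> bool:
        if sector in self.weak or sector in self.damaged:
            self.pending.add(sector)  # failed read: flag it, but do not remap
            return False
        return True

    def write(self, sector: int) -> None:
        if sector in self.damaged:
            self.damaged.discard(sector)
            self.reallocated += 1     # in-place write failed: remap to a spare
        self.weak.discard(sector)     # freshly written data is readable again
        self.pending.discard(sector)  # either way the sector is no longer pending

disk = ToyDisk(weak={10, 11, 12}, damaged={20})

for s in range(30):                   # "pre-read"
    disk.read(s)
after_read = (len(disk.pending), disk.reallocated)

for s in range(30):                   # "zeroing"
    disk.write(s)
after_write = (len(disk.pending), disk.reallocated)

post_read_ok = all(disk.read(s) for s in range(30))  # "post-read"
print(after_read, after_write, post_read_ok)
```

In this model the pre-read leaves 4 pending sectors and 0 reallocated, and the write pass leaves 0 pending and 1 reallocated. A disk whose bad sectors were all of the "weak" kind would end up with 0 pending and 0 reallocated after zeroing, which matches what the preclear report above shows.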

Posted

Thanks for the comments jonathanm and johnnie.black. I did another extended SMART test and that went fine. Started another preclear cycle and so far that's running fine as well. Will run another cycle after this.

 

Not sure that I would trust this HD even if everything came back fine after a couple more preclear cycles.

Posted (edited)

OK, so I've run a SMART extended test as mentioned, then 3 full cycles of preclear. All completed on time with no errors. All SMART failure indicators are still at 0. Should I put this disk back into service as an array drive? Or parity drive?

Edited by Phoenix Down
Posted
1 hour ago, Phoenix Down said:

Should I put this disk back into service as an array drive? Or parity drive?

It's up to you. It's very difficult to predict whether it will last, and it also depends on your risk tolerance, whether you have backups, single vs. dual parity, etc. I would have no problem using it, but I have full backups and dual parity on most of my servers.
