parity errors, same sectors after different checks


n1c076
Go to solution Solved by JorgeB,


Hi All,

I ran a correcting parity check that found 5 errors, and a second, non-correcting check lists the same errors:

First check (correcting):

Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069800

Second check (non-correcting):

Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069768
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069776
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069784
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069792
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069800
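
The five entries are evenly spaced, which is easy to check. A minimal sketch, assuming the md driver counts 512-byte sectors (an assumption, not something confirmed in this thread); `SECTOR_BYTES` and the variable names are only illustrative:

```python
# Sector numbers copied from the log lines above.
sectors = [3519069768, 3519069776, 3519069784, 3519069792, 3519069800]

SECTOR_BYTES = 512  # assumption: the log counts 512-byte sectors

# Distance between each reported sector and the next one.
strides = [b - a for a, b in zip(sectors, sectors[1:])]
stride_bytes = [s * SECTOR_BYTES for s in strides]

# Byte offset of the first reported sector from the start of the device.
first_offset_bytes = sectors[0] * SECTOR_BYTES

print(strides)
print(stride_bytes)
print(round(first_offset_bytes / 1000**4, 2), "TB into the disk")
```

Under that assumption each entry is one 4 KiB block and the five entries together cover a single contiguous 20 KiB region, which would point to one localized mismatch rather than scattered errors.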

 

One disk has a SMART warning, a current pending sector:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   165   164   021    -    6750
  4 Start_Stop_Count        -O--CK   098   098   000    -    2902
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   001   001   000    -    83169
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    290
192 Power-Off_Retract_Count -O--CK   200   200   000    -    188
193 Load_Cycle_Count        -O--CK   001   001   000    -    1858898
194 Temperature_Celsius     -O---K   130   102   000    -    20
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    83
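
For anyone who wants to scan a table like this automatically, here is a rough sketch. It assumes the column layout shown above (ID, name, flags, value, worst, thresh, fail, raw); the function name `flag_smart_attributes` and the choice of watched IDs are illustrative, not part of any tool:

```python
# Attribute IDs commonly watched for disk health (an illustrative choice).
WATCHED_IDS = {5, 197, 198, 199}

def flag_smart_attributes(report: str, watched=WATCHED_IDS):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows look like: ID NAME FLAGS VALUE WORST THRESH FAIL RAW
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        attr_id, name, raw = int(fields[0]), fields[1], fields[7]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged

# Four rows copied from the table above.
sample = """\
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
"""
print(flag_smart_attributes(sample))
```

On these rows only `Current_Pending_Sector` is flagged, matching the single pending sector mentioned in the post.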

 

No other SMART errors. Any advice? :)

 

Nicola

 

 

micronas-diagnostics-20230217-1114.zip micronas-diagnostics-20230216-2031.zip


Same 5 sectors again:

Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069800

 

What do you think, Jorge? Is it the parity HDD or another one?

 

5 hours ago, n1c076 said:

In any case, this new build is giving me second thoughts: some random ATA errors or SATA link resets, frequently when spinning up. I fear this cheap SATA controller and its SATA cables were not a good idea.

https://amzn.eu/d/dIIvyws

 

 

 

The ASM1166 controller itself is fine; it's just that the card you linked has port multipliers as well.

The ASM1166 is PCI-E x2 electrical (so best in at least an x4 slot) and natively supports 6 drives, so one of the alternative listings (6 port PCI-E x4) would be fine.
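
As a rough back-of-the-envelope check on the x2 link, the sketch below estimates how much bandwidth each drive gets when all six are read at once, as happens during a parity check. The PCIe 3.0 link speed and the figure of about 985 MB/s per lane are assumptions that the post does not state:

```python
# Assumptions (not stated in the post): the link runs at PCIe 3.0 speed,
# roughly 985 MB/s of usable bandwidth per lane.
MB_PER_S_PER_LANE = 985
LANES = 2    # "x2 electrical", from the post
DRIVES = 6   # natively supported ports, from the post

total_mb_s = MB_PER_S_PER_LANE * LANES
per_drive_mb_s = total_mb_s / DRIVES

print(f"total: {total_mb_s} MB/s, per drive with all {DRIVES} busy: {per_drive_mb_s:.0f} MB/s")
```

If the card negotiates a slower link, or sits in a slot that provides only one lane, the per-drive figure drops accordingly.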

 

 

Edited by Decto

Hey there,

 

It seems like I have the same problem. This is also a new unRAID build, but I have already moved about 12 TB of data to it (it's just a backup server).

In my case, though, I'm using the onboard SATA controller.

 

Currently using 4 x 8 TB Seagate IronWolf Drives and 2 x 4 TB Seagate IronWolf Drives

 

First parity check, with correction:

Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151176
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151184
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151192
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151200
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151208
Feb 19 17:06:26 Tower kernel: md: recovery thread: P corrected, sector=3907018616
Feb 19 21:07:56 Tower kernel: md: recovery thread: P corrected, sector=8589960632
Feb 19 21:07:58 Tower kernel: md: recovery thread: P corrected, sector=8590443896

 

Second check, without correction, is still running (currently 23.6% done):

 

Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151176
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151184
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151192
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151200
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151208

 

As you can see the sectors are identical.
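
Comparing the two runs by eye works for a handful of lines; for longer logs a small script helps. This is only a sketch that assumes the syslog line format shown above; the regex and the function name `sectors_by_status` are illustrative:

```python
import re

# Matches the part of each line that carries the status and the sector number.
LINE_RE = re.compile(r"recovery thread: P (corrected|incorrect), sector=(\d+)")

def sectors_by_status(log_text: str):
    """Group reported sector numbers by status ('corrected' or 'incorrect')."""
    found = {"corrected": set(), "incorrect": set()}
    for match in LINE_RE.finditer(log_text):
        status, sector = match.groups()
        found[status].add(int(sector))
    return found

# Log lines copied from the two Tower checks above.
log = """\
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151176
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151184
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151192
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151200
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151208
Feb 19 17:06:26 Tower kernel: md: recovery thread: P corrected, sector=3907018616
Feb 19 21:07:56 Tower kernel: md: recovery thread: P corrected, sector=8589960632
Feb 19 21:07:58 Tower kernel: md: recovery thread: P corrected, sector=8590443896
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151176
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151184
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151192
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151200
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151208
"""

groups = sectors_by_status(log)
repeated = groups["corrected"] & groups["incorrect"]    # reported by both checks
only_first = groups["corrected"] - groups["incorrect"]  # reported by the first check only

print("repeated:", sorted(repeated))
print("first check only:", sorted(only_first))
```

The five sectors starting at 2743151176 show up in both runs. The other three appear only in the first run, but since the second check was only partway through, it may simply not have reached them yet.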

 

The interesting part is that I also built a second system with slightly different hardware (this one uses 4 x 8 TB Seagate IronWolf drives and an Intel CPU instead of an AMD), and the same errors occurred on those disks as well, in exactly the same sectors.

 

First parity check, with correction:

Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151176
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151184
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151192
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151200
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151208

 

However, when I first built the second system with the 4 x 8 TB Seagate drives, I was using the mainboard, CPU and RAM from the other system.

 

Maybe that's a hint?

 

Right now I'm also running a second parity check on the second system: about 32.1% done, so far so good.

 

*Edit*

Oh, and all disks are fine, no SMART errors.

 

Edited by Pixelshading
