parity errors, same sectors after different checks

n1c076 · February 17, 2023

Hi All,

I ran a correcting parity check that found 5 errors, and a second, non-correcting check lists the same sectors:

First check (correcting):

Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069800

Second check (non-correcting):

Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069768
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069776
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069784
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069792
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069800

One disk has a SMART warning, a current pending sector:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   165   164   021    -    6750
  4 Start_Stop_Count        -O--CK   098   098   000    -    2902
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   001   001   000    -    83169
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    290
192 Power-Off_Retract_Count -O--CK   200   200   000    -    188
193 Load_Cycle_Count        -O--CK   001   001   000    -    1858898
194 Temperature_Celsius     -O---K   130   102   000    -    20
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    83
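
For anyone who wants to pull the sector-health counters out of an attribute table like the one above, here is a minimal Python sketch. The set of "watched" IDs (5, 197, 198) and the function name `nonzero_watched` are assumptions made for this illustration, not an official list or tool.

```python
import re

# Assumption of this sketch: these IDs are the ones worth watching for
# failing media (reallocated, pending, offline-uncorrectable sectors).
WATCHED_IDS = {5, 197, 198}

# ID, name, flags, VALUE, WORST, THRESH, FAIL, leading digits of RAW_VALUE
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)

def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # header or unrelated line
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(8))
        if attr_id in WATCHED_IDS and raw > 0:
            flagged[name] = raw
    return flagged

sample = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 0
"""
print(nonzero_watched(sample))  # {'Current_Pending_Sector': 1}
```

On the table in this post it would flag only Current_Pending_Sector, which matches what is described here.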

No other SMART errors. Any advice?

Nicola

micronas-diagnostics-20230217-1114.zip micronas-diagnostics-20230216-2031.zip

JorgeB · February 17, 2023

The main possibilities would be a disk or a RAM error. If it was RAM, the first time the sectors were wrongly changed, and now they would be changed back. Run another correcting check and see if you keep getting more errors on the next ones.

n1c076 · February 17, 2023

It's a new server and I've run many burn-in and memtest cycles without any problems, but OK, I'll try another correcting parity check and let you know.

Thank you

Nicola

n1c076 · February 18, 2023

Same 5 sectors:

Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069800

What do you think, Jorge? Is it the parity HDD or another one?

JorgeB · February 18, 2023

These were expected, since the previous check was non-correcting. Now see if you get more on future checks.
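
To make that comparison less manual, a rough Python sketch could separate sectors that merely repeat from the previous check from ones that are new. It assumes the syslog lines look like the ones posted in this thread; the helper names `sectors` and `compare_checks` are made up for the example.

```python
import re

# Matches the md log lines quoted in this thread (format assumed from them).
SECTOR_RE = re.compile(
    r"md: recovery thread: P (corrected|incorrect), sector=(\d+)"
)

def sectors(log_text):
    """Collect the set of sector numbers reported in a parity-check log excerpt."""
    return {int(m.group(2)) for m in SECTOR_RE.finditer(log_text)}

def compare_checks(previous, current):
    """Split the current check's sectors into repeats and newly reported ones."""
    prev, cur = sectors(previous), sectors(current)
    return {"repeated": sorted(cur & prev), "new": sorted(cur - prev)}

non_correcting = """\
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069768
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069776
"""
correcting = """\
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069776
"""
result = compare_checks(non_correcting, correcting)
print(result)
```

Repeats right after a non-correcting check are the expected case. Entries under "new" on later checks would be the ones worth worrying about.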

n1c076 · February 19, 2023

Ok, I will add some data to the array and start a new parity check without correction, I'll let you know 👍

n1c076 · February 19, 2023

In any case, this new build is giving me second thoughts: some random ATA errors or SATA link resets, often when spinning up. I fear this cheap controller and its SATA cables were not a good idea.

https://amzn.eu/d/dIIvyws

JorgeB · February 19, 2023

Controllers with a SATA port multiplier are not recommended in general, but that by itself should not cause sync errors unless the controller is faulty. They are usually not good because of performance and reliability issues.

Decto · February 19, 2023

5 hours ago, n1c076 said:

In any case, this new build is giving me second thoughts: some random ATA errors or SATA link resets, often when spinning up. I fear this cheap controller and its SATA cables were not a good idea.

https://amzn.eu/d/dIIvyws

The ASM1166 controller itself is fine, it's just that card you linked has port multipliers as well.

The ASM1166 is PCI-E x2 electrical (so best in at least an x4 slot) and natively supports 6 drives, so one of the alternative listings (6 port PCI-E x4) would be fine.
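
As a back-of-the-envelope illustration of why the number of drives behind an x2 link matters, here is a small Python sketch. It assumes a PCIe 3.0 link (8 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, so the numbers are rough upper bounds rather than measurements. The function name is hypothetical, and the 10-drive case is only there because a 10-port card is mentioned later in the thread.

```python
def per_drive_mb_s(lanes, drives, gt_per_s=8.0, encoding=128 / 130):
    """Rough upper bound on MB/s per drive when all drives stream at once.

    Assumes PCIe 3.0 signalling by default; protocol overhead is ignored.
    """
    lane_mb_s = gt_per_s * 1e9 * encoding / 8 / 1e6  # bits/s -> bytes/s -> MB/s
    return lanes * lane_mb_s / drives

# Two lanes shared by 6 drives versus 10 drives.
print(round(per_drive_mb_s(2, 6)))   # 328
print(round(per_drive_mb_s(2, 10)))  # 197
```

Under these assumptions a 6-drive card still leaves each disk more than a typical hard drive can stream, while 10 drives on the same link gets tighter during a parity check, when every disk is read at once.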

Edited February 19, 2023 by Decto

n1c076 · February 19, 2023

OK, apparently I've found how to spend my free time over the next few days... I have to work out whether I can keep this controller in my build for my use case, or whether it's better to replace it. It certainly wasn't a good start... 😅

n1c076 · February 19, 2023

PS: thank you very much, Jorge and Decto! 👍

Pixelshading · February 20, 2023

Hey there,

It seems like I have the same problem. This is also a new unRAID build, but I have already moved about 12 TB of data to it (it's just a backup server).

But in my case I'm using the onboard SATA controller.

Currently using 4 x 8 TB Seagate IronWolf drives and 2 x 4 TB Seagate IronWolf drives.

First parity check with correction:

Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151176
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151184
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151192
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151200
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151208
Feb 19 17:06:26 Tower kernel: md: recovery thread: P corrected, sector=3907018616
Feb 19 21:07:56 Tower kernel: md: recovery thread: P corrected, sector=8589960632
Feb 19 21:07:58 Tower kernel: md: recovery thread: P corrected, sector=8590443896

Second check without correction, still running at this point (currently 23.6% done):

Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151176
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151184
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151192
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151200
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151208

As you can see the sectors are identical.
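
The five repeated sectors are also evenly spaced, which a short Python sketch can show. It assumes the logged sector numbers are in 512-byte units, which the thread does not state; the helper name `spacing` is made up for this example.

```python
SECTOR_BYTES = 512  # assumption: the md driver logs 512-byte sector numbers

def spacing(run):
    """Return the distinct gaps (in sectors) between consecutive entries."""
    return sorted({b - a for a, b in zip(run, run[1:])})

run = [2743151176, 2743151184, 2743151192, 2743151200, 2743151208]

gaps = spacing(run)
block_bytes = gaps[0] * SECTOR_BYTES   # size of one reported block
start_offset = run[0] * SECTOR_BYTES   # byte offset of the first block

print(gaps, block_bytes)  # [8] 4096
print(start_offset)
```

Under that assumption, each logged line covers one 4 KiB block, so the five lines describe a single contiguous 20 KiB region rather than five scattered errors.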

The interesting part is that I also built a second system with slightly different hardware (this one uses 4 x 8 TB Seagate IronWolf drives and an Intel CPU instead of an AMD),

and the same error occurred on those disks as well, in the exact same sectors.

First parity check with correction:

Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151176
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151184
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151192
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151200
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151208

However, when I first built the second system with the 4 x 8 TB Seagate drives, I was using the mainboard, CPU and RAM from the other system.

Maybe that's a hint?

Right now I'm also running a second parity check on the second system; about 32.1% done, so far so good.

*Edit*

Oh, and all disks are fine, no SMART errors.

Edited February 20, 2023 by Pixelshading

n1c076 · February 21, 2023

Interesting, what we have in common is a hardware migration while keeping the array...


n1c076 · February 22, 2023

Just ordered two 6-port ASM1166 cards to replace the 10-port one with the multiplier 🤞

n1c076 · February 24, 2023

The new controllers solved the problem, no more errors of any kind. Another confirmation that multipliers are evil.

Thank you all! 👍

n1c076 · February 24, 2023

PS: I recommend that everyone with an ASM1166 controller update the firmware as explained here:

https://docs.phil-barker.com/posts/upgrading-ASM1166-firmware-for-unraid/

The performance of the controller improves significantly.
