n1c076 Posted February 17, 2023

Hi all,

A correcting parity check found 5 errors, and a second, non-correcting check lists the same sectors.

First check (correcting):
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 16 04:21:08 micronas kernel: md: recovery thread: P corrected, sector=3519069800

Second check (non-correcting):
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069768
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069776
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069784
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069792
Feb 17 04:00:59 micronas kernel: md: recovery thread: P incorrect, sector=3519069800

One disk has a SMART warning, a current pending sector:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K 200   200   051    -    0
  3 Spin_Up_Time            POS--K 165   164   021    -    6750
  4 Start_Stop_Count        -O--CK 098   098   000    -    2902
  5 Reallocated_Sector_Ct   PO--CK 200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K 200   200   000    -    0
  9 Power_On_Hours          -O--CK 001   001   000    -    83169
 10 Spin_Retry_Count        -O--CK 100   100   000    -    0
 11 Calibration_Retry_Count -O--CK 100   100   000    -    0
 12 Power_Cycle_Count       -O--CK 100   100   000    -    290
192 Power-Off_Retract_Count -O--CK 200   200   000    -    188
193 Load_Cycle_Count        -O--CK 001   001   000    -    1858898
194 Temperature_Celsius     -O---K 130   102   000    -    20
196 Reallocated_Event_Count -O--CK 200   200   000    -    0
197 Current_Pending_Sector  -O--CK 200   200   000    -    1
198 Offline_Uncorrectable   ----CK 200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK 200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R-- 200   200   000    -    83

There are no other SMART errors. Any advice?

Nicola

Attachments: micronas-diagnostics-20230217-1114.zip, micronas-diagnostics-20230216-2031.zip

JorgeB Posted February 17, 2023

The main possibilities are a disk error or a RAM error. If it was RAM, the sectors were wrongly changed the first time and would now be changed back. Run another correcting check and see whether you keep getting errors on the following checks.

n1c076 Posted February 17, 2023 (Author)

It's a new server and I've done many burn-in and memtest cycles without any problem, but OK, I'll run another parity check with correction and let you know. Thank you, Nicola

n1c076 Posted February 18, 2023 (Author)

Same 5 sectors:
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069768
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069776
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069784
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069792
Feb 18 05:57:18 micronas kernel: md: recovery thread: P corrected, sector=3519069800

What do you think, Jorge? Is it the parity HDD or another one?

JorgeB Posted February 18, 2023

These were expected, since the previous check was non-correcting. Now see if you get more on future checks.

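The reasoning above can be sketched in a few lines of Python. This is only an illustration, not an Unraid tool: the function names `parse_check` and `unexpected_sectors` are made up for this example, and the regular expression assumes the syslog format quoted in this thread. The idea is that sectors repeated after a non-correcting check are expected, while any sector reported after a correcting check deserves attention.

```python
import re
from typing import Iterable, Set, Tuple

# Matches the md driver messages quoted in this thread, e.g.
# "md: recovery thread: P corrected, sector=3519069768"
LINE_RE = re.compile(r"md: recovery thread: P (corrected|incorrect), sector=(\d+)")


def parse_check(lines: Iterable[str]) -> Tuple[str, Set[int]]:
    """Return (mode, sectors) for the syslog lines of one parity check.

    mode is "corrected", "incorrect", or "none" if no sync error was logged.
    """
    mode = "none"
    sectors: Set[int] = set()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            mode = match.group(1)
            sectors.add(int(match.group(2)))
    return mode, sectors


def unexpected_sectors(previous: Tuple[str, Set[int]],
                       current: Tuple[str, Set[int]]) -> Set[int]:
    """Sectors in the current check that the previous check does not explain."""
    prev_mode, prev_sectors = previous
    _, cur_sectors = current
    if prev_mode == "incorrect":
        # A non-correcting check writes nothing, so the same sectors
        # are expected to show up again on the next check.
        return cur_sectors - prev_sectors
    # After a correcting (or clean) check, parity should be in sync,
    # so every sector reported now is unexpected.
    return set(cur_sectors)


# The three checks from this thread, reduced to the relevant message text.
SECTORS = [3519069768, 3519069776, 3519069784, 3519069792, 3519069800]
feb16 = [f"md: recovery thread: P corrected, sector={s}" for s in SECTORS]
feb17 = [f"md: recovery thread: P incorrect, sector={s}" for s in SECTORS]
feb18 = [f"md: recovery thread: P corrected, sector={s}" for s in SECTORS]

print(unexpected_sectors(parse_check(feb16), parse_check(feb17)))
print(unexpected_sectors(parse_check(feb17), parse_check(feb18)))
```

With the logs from this thread, the Feb 17 check is the suspicious one (errors right after a correcting check), while the Feb 18 result is explained by the non-correcting check before it.
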
n1c076 Posted February 19, 2023 (Author)

OK, I will add some data to the array and start a new parity check without correction. I'll let you know 👍

n1c076 Posted February 19, 2023 (Author)

In any case, this new build is giving me second thoughts: some random ATA errors or SATA link resets, frequently when spinning up. I fear this cheap SATA controller and its SATA cables were not a good idea: https://amzn.eu/d/dIIvyws

JorgeB Posted February 19, 2023 (Solution)

Controllers with a SATA port multiplier are not recommended in general, but that by itself should not cause sync errors unless the controller is faulty. They are usually a poor choice for performance and reliability reasons.

Decto Posted February 19, 2023 (edited)

5 hours ago, n1c076 said: "In any case, this new build is giving me some thoughts.. some random ata error o sata link reset, frequently when spinning up, I fear this cheap data controller and its sata cables was not a good idea https://amzn.eu/d/dIIvyws"

The ASM1166 controller itself is fine; it's just that the card you linked has port multipliers as well. The ASM1166 is PCI-E x2 electrical (so best in at least an x4 slot) and natively supports 6 drives, so one of the alternative listings (6-port PCI-E x4) would be fine.

Edited February 19, 2023 by Decto

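To put rough numbers on the x2 link, here is a back-of-the-envelope sketch. It assumes the ASM1166 negotiates a PCIe 3.0 link (8 GT/s per lane with 128b/130b encoding), which is not stated in this thread, and it ignores protocol overhead, so real throughput will be lower. The function names are made up for the example.

```python
# Assumption: PCIe 3.0 signalling, 8 GT/s per lane, 128b/130b line encoding.
PCIE3_TRANSFERS_PER_S = 8e9
PCIE3_ENCODING_EFFICIENCY = 128 / 130


def pcie3_usable_mb_s(lanes: int) -> float:
    """Theoretical payload bandwidth of a PCIe 3.0 link in MB/s (10^6 bytes)."""
    bits_per_s = PCIE3_TRANSFERS_PER_S * PCIE3_ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6


def per_drive_mb_s(lanes: int, drives: int) -> float:
    """Even share of the host link when all drives are read at once,
    as happens during a parity check."""
    return pcie3_usable_mb_s(lanes) / drives


print(f"x2 link total:      {pcie3_usable_mb_s(2):.0f} MB/s")
print(f"6 drives, each:     {per_drive_mb_s(2, 6):.0f} MB/s")
print(f"10 drives, each:    {per_drive_mb_s(2, 10):.0f} MB/s")
```

Under these assumptions the host link alone leaves about 328 MB/s per drive with 6 drives and about 197 MB/s with 10. Drives that sit behind a port multiplier also share a single SATA link, so their real share is likely lower still.
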
n1c076 Posted February 19, 2023 (Author)

OK, apparently I've found how to spend my free time over the next few days. I have to work out whether I can keep this controller in my build for my use case, or whether it's better to change it. It certainly wasn't a good start 😅

n1c076 Posted February 19, 2023 (Author)

PS: thank you very much, Jorge and Decto! 👍

Pixelshading Posted February 20, 2023 (edited)

Hey there,

It seems I have the same problem. This is also a new unRAID build, but I have already moved about 12 TB of data to it (it's just a backup server). In my case I'm using the onboard SATA controller, currently with 4 x 8 TB Seagate IronWolf drives and 2 x 4 TB Seagate IronWolf drives.

First parity check, with correction:
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151176
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151184
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151192
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151200
Feb 19 16:18:30 Tower kernel: md: recovery thread: P corrected, sector=2743151208
Feb 19 17:06:26 Tower kernel: md: recovery thread: P corrected, sector=3907018616
Feb 19 21:07:56 Tower kernel: md: recovery thread: P corrected, sector=8589960632
Feb 19 21:07:58 Tower kernel: md: recovery thread: P corrected, sector=8590443896

Second check, without correction, still running (currently 23.6% done):
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151176
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151184
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151192
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151200
Feb 20 17:31:52 Tower kernel: md: recovery thread: P incorrect, sector=2743151208

As you can see, the sectors are identical. The interesting part is that I also built a second system with slightly different hardware (4 x 8 TB Seagate IronWolf drives and an Intel CPU instead of an AMD), and the same error occurred on those disks as well, in exactly the same sectors.

First parity check, with correction (second system):
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151176
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151184
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151192
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151200
Feb 20 00:44:53 Phoenix kernel: md: recovery thread: P corrected, sector=2743151208

However, when I first built the second system with the 4 x 8 TB Seagate drives, I was using the mainboard, CPU and RAM from the other system. Maybe that's a hint? Right now I'm also running a second parity check on the second system; it is about 32.1% done and so far so good.

Edit: Oh, and all disks are fine, with no SMART errors.

Edited February 20, 2023 by Pixelshading

n1c076 Posted February 21, 2023 (Author)

Interesting, what we have in common is a hardware migration while keeping the array.

n1c076 Posted February 22, 2023 (Author)

I've just ordered two 6-port ASM1166 cards to replace the 10-port one with the multiplier 🤞

n1c076 Posted February 24, 2023 (Author)

The new controllers solved the problem: no more errors of any kind. Another confirmation that multipliers are evil. Thank you all! 👍

n1c076 Posted February 24, 2023 (Author)

PS: I recommend that everyone with an ASM1166 controller update the firmware as explained here: https://docs.phil-barker.com/posts/upgrading-ASM1166-firmware-for-unraid/ The performance of the controller improves significantly.