kjarri Posted August 1, 2022

Hello,

I need some help with my server. I got some parity errors two weeks ago after an unclean shutdown. After reading many posts here I assumed everything should be fine, so I corrected the parity and ran the parity check again. It reported no errors, so I assumed everything was okay.

Now, during my monthly parity check, I am getting more errors: I am up to 11.119 sync errors corrected so far. I suspect that something is wrong that needs investigating and was wondering if you could help me.

What I suspect is that one of my four 2TB drives in an external USB hard drive enclosure is failing. I am unable to run SMART tests on them, but I suspect they are failing since they are very old and I have used them for a long time. I have a hard drive ready to replace any of them if they are failing, but for that I would need to know which drive is the faulty one.

However, I have also found some errors in the logs that could point to the problem, but I can't understand what they are telling me:

Aug 1 14:13:35 Heimanas kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)
Aug 1 14:13:35 Heimanas kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
Aug 1 14:13:35 Heimanas kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)
Aug 1 14:13:35 Heimanas kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
Aug 1 14:18:05 Heimanas kernel: ata1.00: limiting speed to UDMA/33:PIO4
Aug 1 14:18:05 Heimanas kernel: ata1.00: exception Emask 0x50 SAct 0x641000c2 SErr 0x4090800 action 0xe frozen
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link
Aug 1 14:18:15 Heimanas kernel: ata1: COMRESET failed (errno=-16)
Aug 1 14:18:15 Heimanas kernel: ata1: hard resetting link

I have attached the diagnostics file below; hopefully that helps in finding the issue. Thanks in advance for the help.

heimanas-diagnostics-20220801-1418.zip
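For anyone reading a syslog like the one above, a rough way to see which ATA port is affected is to tally the "failed command" lines per port. The sketch below is only an illustration: the function name `count_failed_commands` and the regular expression are assumptions written against the log format quoted in this post, and the sample is a subset of those lines.

```python
import re
from collections import Counter

# Matches kernel lines such as
# "... kernel: ata1.00: failed command: READ FPDMA QUEUED"
# (pattern is an assumption based on the lines quoted in this thread).
FAILED_CMD = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$")


def count_failed_commands(lines):
    """Count failed ATA commands, keyed by (port, command)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the post above.
sample = [
    "Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED",
    "Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link",
]

for (port, command), n in sorted(count_failed_commands(sample).items()):
    print(port, command, n)
```

If every failure lands on a single port (here `ata1`), that narrows the search to one disk and its cabling, although the port number still has to be matched to a physical disk using the diagnostics.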
JorgeB Posted August 1, 2022 (Solution)

It looks like a power/connection problem with the parity disk. Check or replace the cables and try again, then look at the syslog to confirm the ATA errors are gone, or post new diags.
kjarri Posted August 1, 2022 (Author)

Hey Jorge,

I checked the cables and they all seemed secure. To test the cables, I tried switching them around in the system: the parity disk's power cable was swapped with the one from another drive, and its SATA cable was swapped with the one from a different drive. I still seem to be getting the same errors, though. Here is the new diagnostics file:

heimanas-diagnostics-20220801-1515.zip
trurl Posted August 1, 2022

54 minutes ago, JorgeB said: "confirm the ATA errors are gone"

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: cmd 61/c0:f0:e0:9e:1d/02:00:be:00:00/40 tag 30 ncq dma 360448 out
Aug 1 14:18:05 Heimanas kernel: res 40/00:00:a8:df:4a/00:00:be:01:00/40 Emask 0x50 (ATA bus error)
Aug 1 14:18:05 Heimanas kernel: ata1.00: status: { DRDY }
Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link

4 minutes ago, kjarri said: "getting the same errors"

Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722896
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722904
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722912

All I see in these latest diagnostics are the parity errors, and there is no reason to expect them to go away until they are corrected.
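To get an overview of where the parity errors fall, the "P incorrect" lines can be reduced to sector numbers and grouped into contiguous runs. This is a minimal sketch, not an Unraid tool: the helper names are hypothetical, and the stride of 8 sectors is an assumption read off the spacing of the sample lines quoted in this post.

```python
import re

# Pattern based on the "P incorrect" lines quoted in this thread.
SECTOR = re.compile(r"P incorrect, sector=(\d+)")


def incorrect_sectors(lines):
    """Extract the sector numbers from parity-error log lines."""
    return [int(m.group(1)) for line in lines if (m := SECTOR.search(line))]


def group_runs(sectors, stride=8):
    """Group sectors into (first, last) runs separated by exactly `stride`.

    stride=8 is an assumption taken from the spacing seen in the sample.
    """
    runs = []
    for sector in sorted(set(sectors)):
        if runs and sector - runs[-1][1] == stride:
            runs[-1][1] = sector  # extend the current run
        else:
            runs.append([sector, sector])  # start a new run
    return [tuple(run) for run in runs]


sample = [
    "Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576",
    "Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888",
    "Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722896",
    "Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722904",
    "Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722912",
]

print(group_runs(incorrect_sectors(sample)))
# → [(17722576, 17722576), (17722888, 17722912)]
```

Grouping only summarizes the log; it does not say which disk holds the bad data.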
trurl Posted August 1, 2022

1 hour ago, kjarri said: "an external USB hard drive enclosure"

USB is NOT recommended for array and pool disks, for many reasons.
trurl Posted August 1, 2022

I wouldn't be at all surprised if you get those same errors again on the multiple disks you have connected via USB. The USB connection is also the reason you can't get SMART for any of them.
kjarri Posted August 1, 2022 (Author)

3 minutes ago, trurl said:
"Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug 1 14:18:05 Heimanas kernel: ata1.00: cmd 61/c0:f0:e0:9e:1d/02:00:be:00:00/40 tag 30 ncq dma 360448 out
Aug 1 14:18:05 Heimanas kernel: res 40/00:00:a8:df:4a/00:00:be:01:00/40 Emask 0x50 (ATA bus error)
Aug 1 14:18:05 Heimanas kernel: ata1.00: status: { DRDY }
Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722896
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722904
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722912
All I see in these latest are the parity errors, no reason to expect them to go away until corrected."

Ahh okay, I'll let the non-correcting parity check run for a little while to see if the ATA error comes up again. However, is it not concerning that the incorrect sectors now are not the same sectors that were faulting and corrected before the reboot? Or that, for example, sector 0 was incorrect both before and after the reboot, even though that sector should have been corrected?

5 minutes ago, trurl said: "USB NOT recommended for array and pool disks for many reasons"

Yes, I chose that solution before knowing more about the downsides of this approach. I am working on increasing the capacity of the system so I can decommission those drives and the USB enclosure completely.
trurl Posted August 1, 2022

2 minutes ago, kjarri said: "is it not concerning that the incorrect sectors now are not the same sectors that were faulting and corrected before the reboot?"

Have you run memtest lately?
kjarri Posted August 1, 2022 (Author)

7 minutes ago, trurl said: "Memtest lately?"

No, I have not run memtest on this system at all. However, the ATA error has not come up in the last 30 minutes, so hopefully that issue is solved. Should I run memtest right away or do a parity check first?
JorgeB Posted August 1, 2022

I would run memtest for a couple of hours at least.
kjarri Posted August 1, 2022 (Author)

1 hour ago, JorgeB said: "I would run memtest for a couple of hours at least."

Okay, I have run memtest for around 1 hour now and the first pass completed without any errors. I'll keep going for maybe one more pass, but it seems like the memory is not an issue at this point. If no errors are detected by memtest, should I try a correcting parity check and then another parity check to see if the errors are gone?
JorgeB Posted August 1, 2022

2 minutes ago, kjarri said: "If no errors are detected by memtest, should I try a correcting parity check and then another parity check to see if the errors are gone?"

Yes.
trurl Posted August 1, 2022

Do it without rebooting, or at least get diagnostics after each check so they can be compared.
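One way to compare the syslogs from two checks is to look at which sectors are reported in both. The following is a hedged sketch only: `sectors_in` and `compare_checks` are hypothetical helpers, the first log reuses two lines quoted earlier in the thread, and the second log is invented purely to illustrate the comparison.

```python
import re

# Pattern based on the "P incorrect" lines quoted in this thread.
SECTOR = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")


def sectors_in(log_text):
    """Return the set of sectors reported as incorrect in one syslog."""
    return {int(n) for n in SECTOR.findall(log_text)}


def compare_checks(first_log, second_log):
    """Split reported sectors into repeated, first-only and second-only."""
    first, second = sectors_in(first_log), sectors_in(second_log)
    return {
        "repeated": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Two lines taken from the thread.
first_check = """\
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576
Aug 1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
"""

# Hypothetical second check, made up for illustration (timestamp and the
# second sector number are not from the thread's diagnostics).
second_check = """\
Aug 2 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
Aug 2 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=99999992
"""

print(compare_checks(first_check, second_check))
```

Sectors that show up under "repeated" after a correcting check are the case kjarri asked about earlier in the thread, where a sector that should have been corrected is reported again.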
kjarri Posted August 1, 2022 (Author)

2 minutes ago, JorgeB said: "Yes."

Thanks, I'll do that.

1 minute ago, trurl said: "Without reboot, or at least get diagnostics after each check so they can be compared"

I'll make sure not to reboot the system, and I'll grab the diagnostics if I get any errors in the second check. Each check takes at least 24 hours to run, so there will probably not be any news for a while.
kjarri Posted August 3, 2022 (Author)

After running two parity checks, the second one came up with no errors. So the issue was likely a faulty cable connection, and a correcting parity check was then needed to fix the errors it left behind. Here is the diagnostics report if you want to verify, or in case you find any other issue I'm missing.

heimanas-diagnostics-20220803-1722.zip