Parity check giving multiple errors


Go to solution Solved by JorgeB,

Hello, 

 

I need some help with my server. Two weeks ago I got some parity errors after an unclean shutdown.

After reading many posts here, I assumed that was to be expected, so I ran a correcting parity check and then a second check, which came back with no errors. I took that to mean everything was okay.

 

Now, during my monthly parity check, I am getting errors again: 11,119 sync errors corrected so far.

I suspect something is wrong that needs investigating, and I was wondering if you could help me.

 

My suspicion is that one of my four 2TB drives in an external USB hard drive enclosure is failing.

I am unable to run SMART tests on them, but they are very old and have seen a lot of use.

I have a replacement drive ready, but first I need to know which drive is the faulty one.

 

I have also found some errors in the logs that might point to the problem, but I can't work out what they are telling me:

Aug 1 14:13:35 Heimanas kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)

Aug 1 14:13:35 Heimanas kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)

Aug 1 14:13:35 Heimanas kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20200925/psargs-330)

Aug 1 14:13:35 Heimanas kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20200925/psparse-529)

Aug 1 14:18:05 Heimanas kernel: ata1.00: limiting speed to UDMA/33:PIO4

Aug 1 14:18:05 Heimanas kernel: ata1.00: exception Emask 0x50 SAct 0x641000c2 SErr 0x4090800 action 0xe frozen

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED

Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link

Aug 1 14:18:15 Heimanas kernel: ata1: COMRESET failed (errno=-16)

Aug 1 14:18:15 Heimanas kernel: ata1: hard resetting link
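
For anyone reading along, one quick way to summarise an excerpt like the one above is to tally the failed commands per ATA device. This is only a rough sketch: the function name `tally_failed_commands` is made up for illustration, and the regular expression assumes the exact line format shown in the log above, so it may need adjusting for other syslog layouts.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: ata1.00: failed command: READ FPDMA QUEUED"
# The pattern is an assumption based on the excerpt in this post.
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")


def tally_failed_commands(lines):
    """Count failed commands per (device, command) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED",
    "Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: READ FPDMA QUEUED",
    "Aug 1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Aug 1 14:18:05 Heimanas kernel: ata1: hard resetting link",
]

for (device, command), count in sorted(tally_failed_commands(sample).items()):
    print(f"{device} {command}: {count}")
```

If every failure lands on the same device (here `ata1.00`), that at least narrows down which port to look at first.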

 

I have attached the diagnostics file below; hopefully it helps track down the issue.

 

Thanks in advance for the help.

heimanas-diagnostics-20220801-1418.zip

54 minutes ago, JorgeB said:

confirm the ATA errors are gone

Aug  1 14:18:05 Heimanas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Aug  1 14:18:05 Heimanas kernel: ata1.00: cmd 61/c0:f0:e0:9e:1d/02:00:be:00:00/40 tag 30 ncq dma 360448 out
Aug  1 14:18:05 Heimanas kernel:         res 40/00:00:a8:df:4a/00:00:be:01:00/40 Emask 0x50 (ATA bus error)
Aug  1 14:18:05 Heimanas kernel: ata1.00: status: { DRDY }
Aug  1 14:18:05 Heimanas kernel: ata1: hard resetting link
4 minutes ago, kjarri said:

getting the same errors

Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722896
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722904
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722912

All I see in these latest are the parity errors, no reason to expect them to go away until corrected.

3 minutes ago, trurl said:

All I see in these latest are the parity errors, no reason to expect them to go away until corrected.

Ahh, okay.

I'll let the non-correcting parity check run for a little while to see if the ATA error comes up again.

 

However, is it not concerning that the sectors reported as incorrect now are not the same sectors that were reported and corrected before the reboot?

 

Or, for example, that sector 0 was incorrect both before and after the reboot, even though it should have been corrected?

 

5 minutes ago, trurl said:

USB NOT recommended for array and pool disks for many reasons

 

Yes, I went with that setup before I knew about its downsides. I am working on increasing the capacity of the system so I can decommission those drives and the USB enclosure completely.

7 minutes ago, trurl said:

Memtest lately?

No, I have not run memtest on this system at all.

 

However, the ATA error has not come up in the last 30 minutes, so hopefully that issue is solved.

 

Should I run memtest right away, or do a parity check first?

1 hour ago, JorgeB said:

I would run memtest for a couple of hours at least.

Okay, I have run memtest for around an hour now, and the first pass completed without any errors.

 

I'll keep it going for maybe one more pass, but it seems like the memory is not the issue at this point.

 

If memtest detects no errors, should I run a correcting parity check and then another parity check to see if the errors are gone?

2 minutes ago, JorgeB said:

Yes.

Thanks, I'll do that. 

1 minute ago, trurl said:

Without reboot, or at least get diagnostics after each check so they can be compared 

I'll make sure not to reboot the system, and I'll grab the diagnostics if I get any errors in the second check.

Each check takes at least 24 hours to run, so there probably won't be any news for a while.
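
To compare the two sets of diagnostics, something along these lines could pull the sector numbers out of the "P incorrect" lines and show which ones appear in both checks. It is a minimal sketch, not a tested tool: the function names are made up, the line format is taken from the excerpt quoted earlier in this thread, and the second sample log is invented purely to exercise the code.

```python
import re

# Matches lines such as:
#   "... kernel: md: recovery thread: P incorrect, sector=17722576"
# The format is taken from the excerpt quoted earlier in this thread.
P_INCORRECT = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")


def parity_error_sectors(syslog_text):
    """Return the set of sector numbers reported as 'P incorrect'."""
    return {int(m.group(1)) for m in P_INCORRECT.finditer(syslog_text)}


def compare_checks(first_log, second_log):
    """Split sectors into those seen in both checks and those seen in only one."""
    first = parity_error_sectors(first_log)
    second = parity_error_sectors(second_log)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Two lines from the excerpt quoted earlier in this thread.
first_check = """\
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722576
Aug  1 15:14:42 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
"""

# HYPOTHETICAL second log, invented for illustration only.
second_check = """\
Aug  3 10:00:00 Heimanas kernel: md: recovery thread: P incorrect, sector=17722888
Aug  3 10:00:00 Heimanas kernel: md: recovery thread: P incorrect, sector=99999999
"""

for label, sectors in compare_checks(first_check, second_check).items():
    print(f"{label}: {sectors}")
```

In practice the two inputs would be the syslog text from the diagnostics saved after each check, which makes it easy to see whether the same sectors keep coming back or whether new ones appear each time.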
