
Parity/Read-Check Errors



Posted

Hello everyone.

My Unraid server has started to report parity errors. The first errors appeared on 2020-02-02 (1514 errors), then there were none until 2020-02-23 (4 errors). Since 2020-03-22 I have been getting errors frequently, as seen in the screenshots. There might have been an unclean shutdown before 2020-03-22, but since then I have restarted the server a few times without any unclean shutdowns. On 2020-05-03 I got 626,219 errors, which seems like quite a lot, so I ran the check again and got 3,643 errors (2020-05-04).

 

I have attached the diagnostics. Thanks in advance.

Screenshot_2020-05-05 Tower Main.png

tower-diagnostics-20200505-2319.zip

Posted

Do a couple of consecutive parity checks without rebooting and post new diags, but first you need to fix this error spamming the log (and then reboot):

 

Apr 28 04:43:38 Tower nginx: 2020/04/28 04:43:38 [error] 3684#3684: *1298377 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/token HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/token", host: "tower", referrer: "http://tower/dockerterminal/HomeAssistantCore/"
Apr 28 04:44:28 Tower nginx: 2020/04/28 04:44:28 [error] 3684#3684: *1298470 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/ws HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/ws", host: "tower"

 

Posted
16 hours ago, johnnie.black said:

Do a couple of consecutive parity checks without rebooting and post new diags, but first you need to fix this error spamming the log (and then reboot):

 


Apr 28 04:43:38 Tower nginx: 2020/04/28 04:43:38 [error] 3684#3684: *1298377 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/token HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/token", host: "tower", referrer: "http://tower/dockerterminal/HomeAssistantCore/"
Apr 28 04:44:28 Tower nginx: 2020/04/28 04:44:28 [error] 3684#3684: *1298470 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/ws HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/ws", host: "tower"

 

This was only spamming the log on Apr 28, so the diagnostics should already contain two consecutive parity checks after the spam, on May 3 and 4.
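One way to check whether the spam really is limited to a single day is to count the matching lines per date. The following is only a minimal sketch: the function name `spam_per_day` and the search string are illustrative choices, and the sample lines are shortened copies of the entries quoted in this thread. To run it on a real log, pass in the lines of the syslog file from the diagnostics archive (its exact file name inside the zip is an assumption you would need to check).

```python
import re
from collections import Counter

# Syslog timestamp prefix, e.g. "Apr 28 04:43:38 " or "May  3 07:08:39 ".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")

def spam_per_day(lines, needle="HomeAssistantCore.sock failed"):
    """Count log lines containing `needle`, grouped by (month, day)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

# Shortened copies of lines quoted in this thread (not a full log).
sample = [
    "Apr 28 04:43:38 Tower nginx: connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused)",
    "Apr 28 04:44:28 Tower nginx: connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused)",
    "May  3 07:08:39 Tower kernel: ata2.00: status: { DRDY }",
]

for (month, day), count in spam_per_day(sample).items():
    print(month, day, count)
```

If the counts show the message on other days as well, the log may have filled up again after Apr 28, which would explain missing parity check entries.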

 

But if that isn't correct, should I start by rebooting and doing two parity checks, or should I run a memtest?

Posted

Because of the spam the log goes from:

 

May  3 07:08:39 Tower kernel: ata2.00: status: { DRDY }
May  3 07:08:39 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
May  3 07:08:39 Tower kernel: ata2.00: cmd 60/40:88:c8:b6:5e/00:00:3d:00:00/40 tag 17 ncq dma 32768 in
May  3 07:08:39 Tower kernel:         res 40/00:58:c8:9c:5e/00

to

May  4 04:40:14 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="1416" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
May  4 05:00:01 Tower crond[1613]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

 

So it's missing most of the first check and doesn't show any corrections. It does show a lot of errors on ATA2 (disk1), so also replace the cables on that disk and then do the 2 consecutive checks.
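To see which ATA ports are producing errors like the ones above, the failed commands in the syslog can be tallied per port. This is a rough sketch rather than an official tool: the function name `failed_commands_by_port` is made up for illustration, the regular expression only covers the `failed command:` line format shown in this thread, and mapping a port such as `ata2` to a specific disk still has to be done from the rest of the diagnostics.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata2.00: failed command: READ FPDMA QUEUED"
FAILED = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$")

def failed_commands_by_port(lines):
    """Count 'failed command' kernel messages per (port, command)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Lines copied from the log excerpt in this thread.
sample = [
    "May  3 07:08:39 Tower kernel: ata2.00: status: { DRDY }",
    "May  3 07:08:39 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED",
    "May  3 07:08:39 Tower kernel: ata2.00: cmd 60/40:88:c8:b6:5e/00:00:3d:00:00/40 tag 17 ncq dma 32768 in",
]

for (port, command), count in failed_commands_by_port(sample).items():
    print(port, command, count)
```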

Posted
On 5/7/2020 at 7:53 AM, johnnie.black said:

And it's missing most of the first check and doesn't show any corrections. It does show a lot of errors on ATA2 (disk1), so also replace the cables on that disk and then do the 2 consecutive checks.

Hi again. I replaced the cables for disk 1 and ran two checks, getting 0 errors both times, but I got UDMA CRC error count SMART errors on three drives (parity, disk1 and disk3). I have gotten these errors before, but not for a few months. Any ideas what could cause this?

Posted
8 minutes ago, calmDown said:

UDMA CRC error count SMART errors on three drives (parity, disk1 and disk3). I have gotten these errors before, but not for a few months. Any ideas what could cause this?

Usually caused by bad connections or cables. It basically means the data became corrupted between the disk and the rest of the system.

 

New diagnostics might give some clue assuming you haven't rebooted.
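Since the UDMA CRC error count is a cumulative counter, what matters is whether it keeps rising after a cable change. The sketch below compares two snapshots of the attribute table printed by `smartctl -A`. It assumes the usual ten-column layout with the raw value in the last column and attribute ID 199 for the CRC counter. The helper name `crc_error_count` and the numbers in the sample rows are made up for illustration and are not taken from the diagnostics in this thread.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Expects text in the attribute-table layout of `smartctl -A`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Illustrative rows only; the counts here are invented for the example.
before = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12"
after = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15"

increase = crc_error_count(after) - crc_error_count(before)
print("CRC errors added since last snapshot:", increase)
```

A counter that stays flat after reseating or replacing a cable suggests the connection problem is gone, while one that keeps climbing points at the cable, the port, or the controller.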

Posted
46 minutes ago, trurl said:

Usually caused by bad connections or cables. It basically means the data became corrupted between the disk and the rest of the system.

 

New diagnostics might give some clue assuming you haven't rebooted.

tower-diagnostics-20200509-1304.zip

I have all my drives connected to the motherboard. Could the motherboard be the problem?

Posted
9 minutes ago, calmDown said:

I have all my drives connected to the motherboard, could the motherboard be the problem?

It could be. You're still getting ATA errors on 3 disks, so if replacing the SATA cables doesn't fix them, the motherboard is a possible cause. Try replacing the cables first, and make sure they are good-quality cables.
