calmDown Posted May 5, 2020

Hello everyone. My unRaid server has started to report parity errors. The first errors came on 2020-02-02 (1514 errors), then no errors until 2020-02-23 (4 errors). Since 2020-03-22 I have been getting errors frequently, as seen in the screenshots. There might have been an unclean shutdown before 2020-03-22, but since then I have restarted the server a few times and not had unclean shutdowns.

I got 626,219 errors on 2020-05-03, which seems like quite a lot, so I ran the check again and got 3,643 errors (2020-05-04).

I have attached the diagnostics. Thanks in advance.

tower-diagnostics-20200505-2319.zip

trurl Posted May 6, 2020

I'm on mobile now, so I can't look at the diagnostics yet. Have you run memtest?

JorgeB Posted May 6, 2020

Do a couple of consecutive parity checks without rebooting and post new diags, but first you need to fix this error spamming the log (and then reboot):

Apr 28 04:43:38 Tower nginx: 2020/04/28 04:43:38 [error] 3684#3684: *1298377 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/token HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/token", host: "tower", referrer: "http://tower/dockerterminal/HomeAssistantCore/"
Apr 28 04:44:28 Tower nginx: 2020/04/28 04:44:28 [error] 3684#3684: *1298470 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/ws HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/ws", host: "tower"

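To see how much of the syslog this kind of spam takes up, and on which days, the lines can be tallied with a short script. The following is only a minimal sketch: the function name `count_spam_by_day` and the regular expression are assumptions written for this example, and the sample lines are the two quoted above, trimmed after the error code for brevity.

```python
import re
from collections import Counter

# Assumed pattern for the nginx "connect() ... failed" lines quoted above.
# Captures the syslog day (e.g. "Apr 28") and the unix socket path.
SPAM_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+) \d{2}:\d{2}:\d{2} \S+ nginx: .*"
    r"connect\(\) to unix:(?P<sock>\S+) failed"
)

def count_spam_by_day(lines):
    """Return a Counter keyed by (day, socket path) for matching lines."""
    counts = Counter()
    for line in lines:
        match = SPAM_RE.match(line)
        if match:
            counts[(match.group("day"), match.group("sock"))] += 1
    return counts

# The two lines from the post, trimmed after "(111: Connection refused)",
# plus one unrelated line that should not be counted.
SAMPLE = [
    'Apr 28 04:43:38 Tower nginx: 2020/04/28 04:43:38 [error] 3684#3684: '
    '*1298377 connect() to unix:/var/tmp/HomeAssistantCore.sock failed '
    '(111: Connection refused)',
    'Apr 28 04:44:28 Tower nginx: 2020/04/28 04:44:28 [error] 3684#3684: '
    '*1298470 connect() to unix:/var/tmp/HomeAssistantCore.sock failed '
    '(111: Connection refused)',
    'May 4 05:00:01 Tower crond[1613]: exit status 1 from user root '
    '/usr/local/sbin/mover &> /dev/null',
]

if __name__ == "__main__":
    for (day, sock), n in count_spam_by_day(SAMPLE).items():
        print(f"{day}: {n} failed connects to {sock}")
```

On a real system the same function could be fed the lines of the syslog file from the diagnostics zip instead of `SAMPLE`.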
calmDown Posted May 6, 2020 Author

16 hours ago, johnnie.black said:
Do a couple of consecutive parity checks without rebooting and post new diags, but first you need to fix this error spamming the log (an then reboot):
Apr 28 04:43:38 Tower nginx: 2020/04/28 04:43:38 [error] 3684#3684: *1298377 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/token HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/token", host: "tower", referrer: "http://tower/dockerterminal/HomeAssistantCore/"
Apr 28 04:44:28 Tower nginx: 2020/04/28 04:44:28 [error] 3684#3684: *1298470 connect() to unix:/var/tmp/HomeAssistantCore.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.157, server: , request: "GET /dockerterminal/HomeAssistantCore/ws HTTP/1.1", upstream: "http://unix:/var/tmp/HomeAssistantCore.sock:/ws", host: "tower"

This was only spamming the logs on Apr 28, so there should be two consecutive parity checks after the spam, on May 3 and 4. If that isn't correct, should I start by rebooting and doing two parity checks, or by running a memtest?

JorgeB Posted May 7, 2020

Because of the spam the log goes from:

May 3 07:08:39 Tower kernel: ata2.00: status: { DRDY }
May 3 07:08:39 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
May 3 07:08:39 Tower kernel: ata2.00: cmd 60/40:88:c8:b6:5e/00:00:3d:00:00/40 tag 17 ncq dma 32768 in
May 3 07:08:39 Tower kernel: res 40/00:58:c8:9c:5e/00

to

May 4 04:40:14 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="1416" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
May 4 05:00:01 Tower crond[1613]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

And it's missing most of the first check, not showing any corrections. It does show a lot of errors on ATA2 (disk1), so also replace the cables on that disk and do the 2 consecutive checks.

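For anyone wanting to check their own syslog for the same kind of ATA errors, a rough tally of "failed command" lines per port can help show which link is misbehaving. This is a sketch under assumptions: the function name `failed_commands_by_port` and the regular expression are made up for this example and only match the kernel line format shown in the excerpt above. It does not map a port such as ata2 to a disk slot; that mapping came from the diagnostics.

```python
import re
from collections import Counter

# Assumed pattern for kernel lines such as:
#   "... kernel: ata2.00: failed command: READ FPDMA QUEUED"
FAILED_CMD_RE = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$"
)

def failed_commands_by_port(lines):
    """Return a Counter keyed by (ATA port, failed command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts

# Lines taken from the log excerpt in the post above.
SAMPLE = [
    "May 3 07:08:39 Tower kernel: ata2.00: status: { DRDY }",
    "May 3 07:08:39 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED",
    "May 3 07:08:39 Tower kernel: ata2.00: cmd 60/40:88:c8:b6:5e/00:00:3d:00:00/40 tag 17 ncq dma 32768 in",
]

if __name__ == "__main__":
    for (port, command), n in failed_commands_by_port(SAMPLE).items():
        print(f"{port}: {n} x {command}")
```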
calmDown Posted May 9, 2020 Author

On 5/7/2020 at 7:53 AM, johnnie.black said:
And it's missing most of the first check, not showing any corrections, it does show a lot of errors on ATA2 (disk1) so also replace cables on that disk and do the 2 consecutive checks.

Hi again. I replaced the cables for disk 1 and ran two checks, getting 0 errors both times, but I got UDMA CRC error count SMART errors on three drives (parity, disk1 and disk3). I have gotten these errors before, but not for a few months. Any ideas what could cause this?

trurl Posted May 9, 2020

8 minutes ago, calmDown said:
UDMA CRC error count smart errors on three drives (parity, disk1 and disk3). I have gotten these errors before but not for a few months. Any ideas what could cause this?

Usually caused by bad connections or cables. It basically means the data became corrupted between the disk and the rest of the system. New diagnostics might give some clue, assuming you haven't rebooted.

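Since the SMART CRC counter (attribute 199, UDMA_CRC_Error_Count) is cumulative, the useful question after swapping cables is whether it keeps rising. A minimal sketch of that check follows, assuming the usual `smartctl -A` attribute table where the raw value is the tenth column. The helper names and all sample numbers are hypothetical and invented for illustration; they are not taken from the diagnostics in this thread.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if absent.

    Assumes the `smartctl -A` table layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value not a plain integer
    return None

def crc_increases(before, after):
    """Return {disk: increase} for disks whose count rose between snapshots."""
    return {
        disk: after[disk] - before[disk]
        for disk in after
        if disk in before and after[disk] > before[disk]
    }

# Hypothetical sample: the counter value here is made up for illustration.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE_OUTPUT))
    # Hypothetical before/after snapshots around a cable swap.
    print(crc_increases({"disk1": 12, "disk3": 4}, {"disk1": 15, "disk3": 4}))
```

A count that stays flat after a cable change suggests the old errors were historical; one that keeps climbing points at the connection, cable, or port still in use.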
calmDown Posted May 9, 2020 Author

46 minutes ago, trurl said:
Usually caused by bad connections or cables. It basically means the data became corrupted between the disk and the rest of the system. New diagnostics might give some clue assuming you haven't rebooted.

tower-diagnostics-20200509-1304.zip

I have all my drives connected to the motherboard. Could the motherboard be the problem?

JorgeB Posted May 9, 2020

9 minutes ago, calmDown said:
I have all my drives connected to the motherboard, could the motherboard be the problem?

It could be. You're still getting ATA errors on 3 disks, and if replacing the SATA cables doesn't fix them, the motherboard could be at fault. Try replacing the cables first, and make sure they are good quality cables.