Shares disappearing, logs indicate read errors but drives pass SMART tests.

SanderScamper · February 10, 2022

I've been having a very difficult time with Unraid. I migrated from Windows a few months ago and the most recent issue I'm trying to solve is that after a random period, I get errors in various docker containers (like sabnzbd failing to create directories) and when I check the User shares, it says there aren't any. A reboot fixes the issue.

I've run SMART tests and the drives report fine.

My current hypothesis is that when mover is invoked, the controller/SATA interface crashes and takes part of Unraid down with it. I haven't tested this yet, but I was hoping someone could help with the diagnostics because I'm stuck. I can also provide system logs if the diagnostics are insufficient.

tartarus-diagnostics-20220210-1816.zip

itimpi · February 10, 2022

The diagnostics are of limited value as they are just after a reboot. You should set up the syslog server to see if you can capture what is happening when the problem occurs.

SanderScamper · February 10, 2022

I set it up a few days ago; sorry, I should have included it.

tartarus-diagnostics-20220103-1852.zip tartarus-diagnostics-20220203-0948.zip syslog

JorgeB · February 10, 2022

Disk7 appears to be failing; run an extended SMART test to confirm. There are also read errors on disk1 causing filesystem issues. Disk1 looks healthy, so the issue could be spin down related; run xfs_repair on it and disable spin down for a few days to test.

SanderScamper · February 10, 2022

Hi JorgeB, could failing disks be the reason for Unraid to behave the way it has been? I guess I would have expected Unraid to handle disk failure more gracefully. I've had issues running extended SMART tests: they seem to stop at 10%. Is there something I'm missing? I'll disable spin down and look into xfs_repair on disk1.

SanderScamper · February 11, 2022

I logged in this morning to see half of the shares missing. I grabbed the diagnostics in case they're helpful. Notably, appdata was missing, and that share is cache-only. My cache drive is a very new 1TB NVMe M.2.

tartarus-diagnostics-20220211-0757.zip

trurl · February 11, 2022

Why have you put your server on the internet? All these IPs and many more like that are in your syslog

https://www.abuseipdb.com/check/218.92.0.202

https://www.abuseipdb.com/check/112.85.42.81

https://www.abuseipdb.com/check/141.98.11.16

SanderScamper · February 11, 2022

I only enable the webgui reverse proxy manually when I need to reach the webgui from a desktop computer I can't install WireGuard on, usually for about 30 minutes at a time, as I did today. The rest of the time the Unraid webgui isn't reverse proxied and can only be accessed through WireGuard. Docker containers like sabnzbd are reverse proxied through Nginx Proxy Manager.

Edited February 11, 2022 by SanderScamper
clarity

JorgeB · February 11, 2022

8 hours ago, SanderScamper said:

could failing disks be the reason for Unraid to behave the way it has been?

As mentioned, disk1 errors are causing filesystem issues and making it go read-only; that will cause some of the issues you're seeing.

trurl · February 11, 2022

Disable the FTP server.

SanderScamper · February 12, 2022

Ok I'll disable the FTP server.

JorgeB: How can I diagnose if this is disk failure as opposed to some sort of controller/SATA issue?

JorgeB · February 12, 2022

On 2/10/2022 at 12:12 PM, JorgeB said:

issue could be spin down related, run xfs_repair on it and disable spin down for a few days to test.

Start with this.

SanderScamper · February 12, 2022

Don't think that fixed it.

tartarus-diagnostics-20220212-1700.zip

JorgeB · February 12, 2022

There are simultaneous issues with multiple disks:

Feb 12 03:42:03 Tartarus kernel: ata5.00: exception Emask 0x10 SAct 0xc080003f SErr 0x90200 action 0xe frozen
Feb 12 03:42:03 Tartarus kernel: ata5.00: irq_stat 0x00400000, PHY RDY changed
Feb 12 03:42:03 Tartarus kernel: ata5: SError: { Persist PHYRdyChg 10B8B }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:00:68:44:e2/00:00:83:01:00/40 tag 0 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/a8:08:70:44:e2/02:00:83:01:00/40 tag 1 ncq dma 348160 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/c8:10:18:47:e2/01:00:83:01:00/40 tag 2 ncq dma 233472 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/d8:18:e0:48:e2/00:00:83:01:00/40 tag 3 ncq dma 110592 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:20:b8:49:e2/00:00:83:01:00/40 tag 4 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/c0:28:70:4c:e2/01:00:83:01:00/40 tag 5 ncq dma 229376 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:b8:c0:49:e2/00:00:83:01:00/40 tag 23 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/e0:f0:88:43:e2/00:00:83:01:00/40 tag 30 ncq dma 114688 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/a8:f8:c8:49:e2/02:00:83:01:00/40 tag 31 ncq dma 348160 out
Feb 12 03:42:03 Tartarus kernel:         res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:05 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:05 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:06 Tartarus kernel: ata9.00: exception Emask 0x10 SAct 0x400001c0 SErr 0x90200 action 0xe frozen
Feb 12 03:42:06 Tartarus kernel: ata9.00: irq_stat 0x00400000, PHY RDY changed
Feb 12 03:42:06 Tartarus kernel: ata9: SError: { Persist PHYRdyChg 10B8B }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:30:90:8e:5e/01:00:f5:02:00/40 tag 6 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel:         res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:38:20:7a:01/01:00:69:04:00/40 tag 7 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel:         res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:40:20:7b:01/01:00:69:04:00/40 tag 8 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel:         res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:f0:90:8d:5e/01:00:f5:02:00/40 tag 30 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel:         res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9: hard resetting link
Feb 12 03:42:06 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:07 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:08 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:08 Tartarus kernel: ata5.00: disabled

This is usually a power/connection problem.
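To check a longer syslog for the same pattern (several ATA ports failing within seconds of each other), a small script can tally the events per port. This is only a rough sketch: the regular expression is modeled on the kernel lines quoted above, and the function name `summarize_ata_events` and the counter names are made up for illustration. Other kernel versions or drivers may word these messages differently.

```python
import re
from collections import defaultdict

# Modeled on lines like:
#   "Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED"
# Assumption: classic syslog timestamp, then host, then "kernel:", then ataN[.NN]:
ATA_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def summarize_ata_events(lines):
    """Count failed commands, link resets, link-down and disable events per ATA port."""
    summary = defaultdict(lambda: {
        "failed_commands": 0,
        "hard_resets": 0,
        "link_down": 0,
        "disabled": 0,
        "first_seen": None,
    })
    for line in lines:
        match = ATA_LINE.match(line.strip())
        if not match:
            continue  # not an ataN line (for example the indented "res ..." lines)
        msg = match["msg"]
        if msg.startswith("failed command:"):
            key = "failed_commands"
        elif msg == "hard resetting link":
            key = "hard_resets"
        elif msg.startswith("SATA link down"):
            key = "link_down"
        elif msg == "disabled":
            key = "disabled"
        else:
            continue  # status/SError detail lines are ignored in this sketch
        entry = summary[match["port"]]
        entry[key] += 1
        if entry["first_seen"] is None:
            entry["first_seen"] = match["stamp"]
    return dict(summary)
```

Feeding it the lines of a saved syslog (for example `summarize_ata_events(open("syslog"))`, where the file name is whatever you saved the log as) returns one entry per port. If more than one port shows failed commands with close `first_seen` timestamps, that points at something the disks share, such as power or cabling, rather than at a single drive.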

SanderScamper · February 17, 2022

OK, system rebuilt. I've borrowed (from an excellent person and power user) an LSI 9201-8i and a 650 W PSU.

All new power connections (and a higher-wattage PSU). All new SATA connections to the new HBA.

In addition, I've bought a 10TB Red and set it as parity to try to recover the data from the previously identified drives that are failing.

Here are the new diagnostics. I see that there are drive read errors. My understanding is that the parity drive will get what data it can from those drives, then if I replace those drives, it'll rewrite that data to those drives? I understand that there will be data loss due to already having read errors.

tartarus-diagnostics-20220217-1611.zip

JorgeB · February 17, 2022

Disk6 is failing. Since parity isn't valid, you can't do a standard rebuild; you can try to manually copy everything you can from that disk, or use ddrescue, for example.

trurl · February 17, 2022

You still have these:

Feb 17 16:04:15 Tartarus vsftpd[4079]: connect from 218.92.0.202 (218.92.0.202)
Feb 17 16:05:53 Tartarus vsftpd[5009]: connect from 103.180.135.244 (103.180.135.244)
Feb 17 16:06:27 Tartarus vsftpd[5288]: connect from 112.85.42.74 (112.85.42.74)

https://www.abuseipdb.com/check/218.92.0.202

https://www.abuseipdb.com/check/103.180.135.244

https://www.abuseipdb.com/check/112.85.42.74

You need to secure your server from outside access
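To see how many outside addresses are reaching the FTP service, the `vsftpd` connect lines can be counted per source IP. The sketch below is illustrative only: the pattern is based on the three syslog lines quoted above, and the function names `count_ftp_sources` and `external_sources` are invented for this example. Other vsftpd log formats are not handled.

```python
import ipaddress
import re
from collections import Counter

# Modeled on lines like:
#   "Feb 17 16:04:15 Tartarus vsftpd[4079]: connect from 218.92.0.202 (218.92.0.202)"
VSFTPD_CONNECT = re.compile(
    r"vsftpd\[\d+\]: connect from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})\b"
)


def count_ftp_sources(lines):
    """Return a Counter mapping each source IPv4 address to its connection count."""
    counts = Counter()
    for line in lines:
        match = VSFTPD_CONNECT.search(line)
        if match:
            counts[match["ip"]] += 1
    return counts


def external_sources(counts):
    """Keep only addresses outside private (LAN) ranges."""
    return {
        ip: n
        for ip, n in counts.items()
        if not ipaddress.ip_address(ip).is_private
    }
```

Any address that survives `external_sources` came from outside the local network, which means the FTP port is reachable from the internet, whether through a router port forward, a DMZ setting, or something similar.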

SanderScamper · February 18, 2022

What do you recommend I do, trurl? I have a reverse proxy set up, but it's only routing to sabnzbd, sonarr, etc. I don't have the webgui remotely accessible. Are these showing up because I don't have the reverse proxy set as bridge?
