SanderScamper, posted February 10, 2022:
I've been having a very difficult time with Unraid. I migrated from Windows a few months ago, and the most recent issue I'm trying to solve is this: after a random period, I get errors in various Docker containers (for example, sabnzbd failing to create directories), and when I check User Shares, it says there aren't any. A reboot fixes the issue. I've run SMART tests and the drives report fine. My current hypothesis is that when mover is invoked, the controller/SATA interface crashes and takes part of Unraid down with it. I haven't tested this yet, but I was hoping someone could help with the diagnostics because I'm stuck. I can also provide system logs if the diagnostics are insufficient.
Attached: tartarus-diagnostics-20220210-1816.zip
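One way to test the mover hypothesis is to check whether the kernel ATA errors in the syslog cluster just after mover's scheduled start. The Python sketch below assumes syslog lines in the format of the kernel messages quoted later in this thread; the function names, the start time you pass in and the 30-minute window are placeholders (use your own mover schedule), and the year has to be supplied because syslog timestamps omit it.

```python
import re
from datetime import datetime, timedelta

# Matches kernel ATA lines such as:
#   "Feb 12 03:42:03 Tartarus kernel: ata5: hard resetting link"
ATA_LINE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (ata\d+)")


def ata_error_times(lines, year=2022):
    """Return (timestamp, port) for every kernel ATA line in a syslog."""
    events = []
    for line in lines:
        m = ATA_LINE.match(line)
        if m:
            # Syslog timestamps carry no year, so it is passed in explicitly.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def near_schedule(events, hour, minute, window_min=30):
    """Keep events within window_min minutes after hour:minute on the same day."""
    hits = []
    for ts, port in events:
        start = ts.replace(hour=hour, minute=minute, second=0)
        if start <= ts <= start + timedelta(minutes=window_min):
            hits.append((ts, port))
    return hits
```

For example, `near_schedule(ata_error_times(open("syslog.txt")), 3, 40)` lists ATA events in the half hour after a hypothetical 03:40 mover run (`syslog.txt` is a placeholder path). Consistent hits across several nights would support the hypothesis; errors at unrelated times would argue against it.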
itimpi, posted February 10, 2022:
The diagnostics are of limited value because they were captured just after a reboot. You should set up the syslog server so you can capture what happens when the problem occurs.
SanderScamper (author), posted February 10, 2022:
I set it up a few days ago; sorry, I should have included it.
Attached: tartarus-diagnostics-20220103-1852.zip, tartarus-diagnostics-20220203-0948.zip, syslog
JorgeB, posted February 10, 2022:
Disk7 appears to be failing; run an extended SMART test to confirm. There are also read errors on disk1 causing filesystem issues, but disk1 itself looks healthy, so the issue could be spin-down related: run xfs_repair on it and disable spin down for a few days to test.
SanderScamper (author), posted February 10, 2022:
Hi JorgeB, could failing disks be the reason Unraid has been behaving this way? I would have expected Unraid to handle disk failure more gracefully. I've had issues running extended SMART tests; they seem to stop at 10%. Is there something I'm missing? I'll disable spin down and look into running xfs_repair on disk1.
SanderScamper (author), posted February 11, 2022:
I logged in this morning and half of the shares were missing. I grabbed the diagnostics in case they help. Notably, appdata was missing, and that share lives only on the cache drive. My cache drive is a very new 1TB NVMe M.2.
Attached: tartarus-diagnostics-20220211-0757.zip
trurl, posted February 11, 2022:
Why have you put your server on the internet? These IPs, and many more like them, are in your syslog:
https://www.abuseipdb.com/check/218.92.0.202
https://www.abuseipdb.com/check/112.85.42.81
https://www.abuseipdb.com/check/141.98.11.16
SanderScamper (author), posted February 11, 2022:
I only enable the web GUI reverse proxy manually when I want to reach the web GUI from a computer I can't install WireGuard on, usually for remote access like today, and only for about 30 minutes at a time. The rest of the time the Unraid web GUI isn't reverse proxied and can only be accessed through WireGuard. Docker containers like sabnzbd are reverse proxied through Nginx Proxy Manager.
Edited February 11, 2022 by SanderScamper (clarity)
JorgeB, posted February 11, 2022:
8 hours ago, SanderScamper said: "could failing disks be the reason for Unraid to behave the way it has been?"
As mentioned, the disk1 errors are causing filesystem issues and making it go read-only; that will cause some of the issues you're seeing.
trurl, posted February 11, 2022:
Disable the FTP server.
SanderScamper (author), posted February 12, 2022:
OK, I'll disable the FTP server. JorgeB: how can I tell whether this is disk failure as opposed to some sort of controller/SATA issue?
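A common rule of thumb for separating a failing disk from a cable or controller problem is to look at which SMART counters move: reallocated, pending and offline-uncorrectable sectors (IDs 5, 197, 198) point at the disk's media, while UDMA CRC errors (ID 199) point at the link between disk and controller. The sketch below parses the attribute table printed by `smartctl -A /dev/sdX` and applies that heuristic; the function names and the simple classification are illustrative assumptions, not an Unraid feature, and a power problem may not move any of these counters at all.

```python
# Standard SMART attribute IDs. Treating them as "media" vs "link" signals
# is a rule of thumb, not a definitive diagnosis.
MEDIA_ATTRS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector",
               198: "Offline_Uncorrectable"}
LINK_ATTRS = {199: "UDMA_CRC_Error_Count"}


def parse_smart_attributes(text):
    """Parse the table printed by `smartctl -A` into {attribute_id: raw_value}."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[int(parts[0])] = int(parts[9])
    return attrs


def classify(attrs):
    """Return 'disk', 'link', 'both' or 'inconclusive' from nonzero counters."""
    media = any(attrs.get(i, 0) > 0 for i in MEDIA_ATTRS)
    link = any(attrs.get(i, 0) > 0 for i in LINK_ATTRS)
    if media and link:
        return "both"
    if media:
        return "disk"
    if link:
        return "link"
    return "inconclusive"
```

Raw counts can be historical, so comparing readings taken before and after an error episode says more than a single snapshot.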
JorgeB, posted February 12, 2022:
On 2/10/2022 at 12:12 PM, JorgeB said: "issue could be spin down related, run xfs_repair on it and disable spin down for a few days to test."
Start with this.
SanderScamper (author), posted February 12, 2022:
I don't think that fixed it.
Attached: tartarus-diagnostics-20220212-1700.zip
JorgeB, posted February 12, 2022:
There are simultaneous issues with multiple disks:

Feb 12 03:42:03 Tartarus kernel: ata5.00: exception Emask 0x10 SAct 0xc080003f SErr 0x90200 action 0xe frozen
Feb 12 03:42:03 Tartarus kernel: ata5.00: irq_stat 0x00400000, PHY RDY changed
Feb 12 03:42:03 Tartarus kernel: ata5: SError: { Persist PHYRdyChg 10B8B }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:00:68:44:e2/00:00:83:01:00/40 tag 0 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/a8:08:70:44:e2/02:00:83:01:00/40 tag 1 ncq dma 348160 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/c8:10:18:47:e2/01:00:83:01:00/40 tag 2 ncq dma 233472 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/d8:18:e0:48:e2/00:00:83:01:00/40 tag 3 ncq dma 110592 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:20:b8:49:e2/00:00:83:01:00/40 tag 4 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/c0:28:70:4c:e2/01:00:83:01:00/40 tag 5 ncq dma 229376 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/08:b8:c0:49:e2/00:00:83:01:00/40 tag 23 ncq dma 4096 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/e0:f0:88:43:e2/00:00:83:01:00/40 tag 30 ncq dma 114688 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Feb 12 03:42:03 Tartarus kernel: ata5.00: cmd 61/a8:f8:c8:49:e2/02:00:83:01:00/40 tag 31 ncq dma 348160 out
Feb 12 03:42:03 Tartarus kernel: res 40/00:00:c8:49:e2/00:00:83:01:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:03 Tartarus kernel: ata5.00: status: { DRDY }
Feb 12 03:42:03 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:05 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:05 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:06 Tartarus kernel: ata9.00: exception Emask 0x10 SAct 0x400001c0 SErr 0x90200 action 0xe frozen
Feb 12 03:42:06 Tartarus kernel: ata9.00: irq_stat 0x00400000, PHY RDY changed
Feb 12 03:42:06 Tartarus kernel: ata9: SError: { Persist PHYRdyChg 10B8B }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:30:90:8e:5e/01:00:f5:02:00/40 tag 6 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel: res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:38:20:7a:01/01:00:69:04:00/40 tag 7 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel: res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:40:20:7b:01/01:00:69:04:00/40 tag 8 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel: res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9.00: failed command: READ FPDMA QUEUED
Feb 12 03:42:06 Tartarus kernel: ata9.00: cmd 60/00:f0:90:8d:5e/01:00:f5:02:00/40 tag 30 ncq dma 131072 in
Feb 12 03:42:06 Tartarus kernel: res 40/00:00:90:8e:5e/00:00:f5:02:00/40 Emask 0x10 (ATA bus error)
Feb 12 03:42:06 Tartarus kernel: ata9.00: status: { DRDY }
Feb 12 03:42:06 Tartarus kernel: ata9: hard resetting link
Feb 12 03:42:06 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:07 Tartarus kernel: ata5: hard resetting link
Feb 12 03:42:08 Tartarus kernel: ata5: SATA link down (SStatus 0 SControl 310)
Feb 12 03:42:08 Tartarus kernel: ata5.00: disabled

This is usually a power/connection problem.
SanderScamper (author), posted February 17, 2022:
OK, system rebuilt. I've borrowed an LSI 9201-8i and a 650W PSU (from an excellent person and power user). All power connections are new (with the higher-wattage PSU), and all SATA connections are new and run to the new HBA. In addition, I've bought a 10TB Red and set it as parity to try to recover the data from the drives previously identified as failing. Here are the new diagnostics. I see that there are drive read errors. My understanding is that the parity drive will get what data it can from those drives, and then, if I replace those drives, it'll rewrite that data to the replacements? I understand there will be some data loss because of the existing read errors.
Attached: tartarus-diagnostics-20220217-1611.zip
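Parity is not a copy of the data. With single parity, each parity bit is the XOR of the corresponding bits on all data disks, and a missing disk is rebuilt by XOR-ing parity with every remaining disk. The simplified byte-level model below (helper names are hypothetical) shows why parity can only reproduce what the disks returned while parity was being built: if a disk handed back wrong or unreadable data during the sync, a later rebuild reproduces that, not the original contents.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_parity(disks):
    """Single (XOR) parity across all data disks."""
    return xor_blocks(*disks)


def rebuild(missing, disks, parity):
    """Recreate disk number `missing` from parity and every other disk."""
    survivors = [d for i, d in enumerate(disks) if i != missing]
    return xor_blocks(parity, *survivors)
```

With valid parity, `rebuild(1, disks, build_parity(disks))` returns disk 1 exactly; if parity was computed while disk 1 returned bad bytes, the rebuild returns those bad bytes instead.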
JorgeB, posted February 17, 2022:
Disk6 is failing. Since parity isn't valid, you can't do a standard rebuild; you can try to manually copy everything you can from that disk, or use, for example, ddrescue.
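ddrescue works by copying every area of the source disk that reads cleanly first, then going back to retry only the problem areas, and recording what it could not recover in a map file. The Python sketch below illustrates that idea in one function; it is not ddrescue, `read_block` is a hypothetical stand-in for reading one block from the failing disk, and blocks that never read are simply zero-filled. For an actual recovery, use ddrescue itself.

```python
def rescue_copy(read_block, n_blocks, block_size, retries=1):
    """Copy readable blocks first, retry failures later, zero-fill the rest.

    Returns (image_bytes, list_of_block_indices_that_never_read).
    """
    blocks = [None] * n_blocks
    pending = []
    # Pass 1: grab everything that reads cleanly and skip failures quickly.
    for i in range(n_blocks):
        try:
            blocks[i] = read_block(i)
        except OSError:
            pending.append(i)
    # Later passes: revisit only the blocks that failed before.
    for _ in range(retries):
        still_bad = []
        for i in pending:
            try:
                blocks[i] = read_block(i)
            except OSError:
                still_bad.append(i)
        pending = still_bad
    # Whatever is still unreadable is filled with zeros in the image.
    for i in pending:
        blocks[i] = b"\x00" * block_size
    return b"".join(blocks), pending
```

Reading the good areas first matters on a dying disk: the most data gets saved before the drive degrades further from repeated retries on bad sectors.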
trurl, posted February 17, 2022:
You still have these:
Feb 17 16:04:15 Tartarus vsftpd[4079]: connect from 218.92.0.202 (218.92.0.202)
Feb 17 16:05:53 Tartarus vsftpd[5009]: connect from 103.180.135.244 (103.180.135.244)
Feb 17 16:06:27 Tartarus vsftpd[5288]: connect from 112.85.42.74 (112.85.42.74)
https://www.abuseipdb.com/check/218.92.0.202
https://www.abuseipdb.com/check/103.180.135.244
https://www.abuseipdb.com/check/112.85.42.74
You need to secure your server from outside access.
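One quick way to gauge how much of this traffic reaches the server from outside the LAN is to pull the source addresses out of the vsftpd `connect from` lines and drop private and loopback ranges. Any address left over means connections from the internet were reaching the FTP service. The sketch assumes the line format quoted above and uses Python's standard `ipaddress` module; the function name is hypothetical.

```python
import ipaddress
import re
from collections import Counter

# "Feb 17 16:04:15 Tartarus vsftpd[4079]: connect from 218.92.0.202 (218.92.0.202)"
CONNECT_RE = re.compile(r"vsftpd\[\d+\]: connect from (\S+)")


def external_ftp_sources(lines):
    """Count vsftpd connections per source IP, keeping only non-private addresses."""
    counts = Counter()
    for line in lines:
        m = CONNECT_RE.search(line)
        if not m:
            continue
        try:
            ip = ipaddress.ip_address(m.group(1))
        except ValueError:
            continue  # not a literal IP address
        if not (ip.is_private or ip.is_loopback):
            counts[str(ip)] += 1
    return counts
```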
SanderScamper (author), posted February 18, 2022:
What do you recommend I do, trurl? I have a reverse proxy set up, but it only routes to sabnzbd, Sonarr, etc. I don't have the web GUI remotely accessible. Are these showing up because I don't have the reverse proxy set as bridge?
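The vsftpd lines come from the server's own FTP service, so one way to check whether it is still exposed is to test whether TCP port 21 accepts connections when tried from a machine outside your network (for example, a laptop on a phone hotspot) against your public address. A minimal Python sketch follows; `port_open` is a hypothetical helper, and a test run from inside the LAN says nothing about exposure to the internet.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_open("your.public.address", 21)`, where `your.public.address` is a placeholder, should return False once the FTP server is disabled or no longer forwarded.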