mrvnsk9 Posted January 5, 2020
Over the last few days my server has started hanging after being up for a few hours. I looked at the SMART reports in the diagnostics, and it looks like there are errors on disks 3 and 6. I'm not sure if this is the problem or if there is another cause for the lockups. I mirrored the syslog to flash before the last freeze happened. Both it and the diagnostics are attached. Any help would be greatly appreciated. Thanks in advance!
Attachments: dragon-diagnostics-20200104-1903.zip, syslog
trurl Posted January 5, 2020
The array isn't started in those, so I can't tell anything about filesystems or shares, though those wouldn't be expected to cause a crash. Does it in fact crash even if you never start the array?
mrvnsk9 Posted January 5, 2020
Apologies, I forgot my array wasn't started when I pulled the first diagnostics. Of course, that was the only time I pulled any diagnostics. It freezes when the array is started; it hasn't frozen with the array stopped. I've attached new diagnostics with the array started. Thanks!
Attachment: dragon-diagnostics-20200104-2108.zip
trurl Posted January 5, 2020
I don't notice anything in those, except it looks like you are using 11G of the 30G docker image with no dockers running. Have you had problems with the docker image filling up? Have you done a memtest?
mrvnsk9 Posted January 5, 2020
I haven't had issues with the docker image filling up. I guess the dockers hadn't finished starting when I pulled the diagnostics. I haven't done a memtest yet; I'll try that. Should I be concerned about the SMART errors on disks 3 and 6?
mrvnsk9 Posted January 5, 2020
I ran a memtest and it passed with no errors.
trurl Posted January 5, 2020
It doesn't seem like you have actually captured a syslog after a crash, since the one you attached was basically the same as the one in the diagnostics, and the array wasn't started.
mrvnsk9 Posted January 5, 2020
The log was still mirrored after the restart. If you scroll up to line 213 in the syslog file you should see a timestamp of "Jan 4 17:47:16". This is where the server became unresponsive.
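With the syslog mirrored to flash (on Unraid the mirrored file typically lives under /boot/logs/), the entries around the hang can be pulled out by timestamp. A minimal sketch; the two log entries below are made-up placeholders standing in for the real mirrored file, and only the "Jan 4 17:47:16" timestamp comes from this thread:

```shell
# Stand-in for the syslog mirrored to the flash drive; these two entries
# are illustrative placeholders, not lines from the actual log.
cat > /tmp/syslog.mirror <<'EOF'
Jan  4 17:46:59 Dragon kernel: example earlier entry
Jan  4 17:47:16 Dragon kernel: example last entry before the hang
EOF

# Show (with line numbers) the entries at the reported freeze time.
grep -n 'Jan  4 17:47' /tmp/syslog.mirror
```

The last timestamp that made it to flash marks roughly when the server stopped responding; anything after it was lost with the crash.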
JorgeB Posted January 5, 2020
Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar), or disable C-states completely. More info here: https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/
mrvnsk9 Posted January 5, 2020
The BIOS is up to date and C-states are already disabled. "Power Supply Idle Control" was not set to the suggested value, so I changed that. The odd thing is the server had been stable for a year and didn't start having issues until Jan 1. Probably a coincidence, but it's still odd to me. Is the "/usr/local/sbin/zenstates --c6-disable" line still required in the go file, or is it no longer needed? Also, I'm using "rcu_nocbs=0-7" in the syslinux configuration.
JorgeB Posted January 6, 2020
12 hours ago, mrvnsk9 said:
"Is the "/usr/local/sbin/zenstates --c6-disable" line still required in the go file or is it no longer needed?"
It should no longer be needed with the power supply idle control correctly set.
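For context, the workaround being discussed lives in the Unraid startup script on the flash drive. A sketch of what that looks like; every line is commented out so the snippet is inert outside an actual Unraid system, and the emhttp line is the stock content of a default go file:

```shell
#!/bin/bash
# Sketch of an Unraid go file (/boot/config/go), for reference only.
# Stock line that starts the Unraid management GUI:
#/usr/local/sbin/emhttp &
# Ryzen C6 workaround; per JorgeB above, this line can be removed once
# "Power Supply Idle Control" is set to "typical current idle" in the BIOS:
#/usr/local/sbin/zenstates --c6-disable
```

The BIOS setting fixes the idle-state issue at the firmware level, so the software workaround in the go file becomes redundant.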
mrvnsk9 Posted January 7, 2020
@johnnie.black After making the changes to the BIOS, the server stayed up for about 15 hours. I was using the unbalance plugin to move some files to disk6 and received the following errors before it locked up.

Jan 6 23:33:48 Dragon kernel: ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Jan 6 23:33:48 Dragon kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata1.00: cmd 61/70:00:08:9d:e0/01:00:be:00:00/40 tag 0 ncq dma 188416 out
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata1.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata1.00: cmd 61/40:08:90:7d:e0/05:00:be:00:00/40 tag 1 ncq dma 688128 out
Jan 6 23:33:48 Dragon kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata1.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata1: hard resetting link
Jan 6 23:33:48 Dragon kernel: ata3.00: exception Emask 0x0 SAct 0x780 SErr 0x0 action 0x6 frozen
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: cmd 60/40:38:78:9e:e0/05:00:be:00:00/40 tag 7 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata3.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: cmd 60/40:40:b8:a3:e0/05:00:be:00:00/40 tag 8 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata3.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: cmd 60/40:48:f8:a8:e0/05:00:be:00:00/40 tag 9 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata3.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: cmd 60/78:50:38:ae:e0/01:00:be:00:00/40 tag 10 ncq dma 192512 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata3.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata3: hard resetting link
Jan 6 23:33:48 Dragon kernel: ata4.00: exception Emask 0x0 SAct 0x3c003000 SErr 0x0 action 0x6 frozen
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/40:60:d0:82:e0/05:00:be:00:00/40 tag 12 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/40:68:10:88:e0/05:00:be:00:00/40 tag 13 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/40:d0:78:9e:e0/05:00:be:00:00/40 tag 26 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/40:d8:b8:a3:e0/05:00:be:00:00/40 tag 27 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/40:e0:f8:a8:e0/05:00:be:00:00/40 tag 28 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: cmd 60/78:e8:38:ae:e0/01:00:be:00:00/40 tag 29 ncq dma 192512 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:82:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata4.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata4: hard resetting link
Jan 6 23:33:48 Dragon kernel: ata8.00: exception Emask 0x0 SAct 0x3c00 SErr 0x0 action 0x6 frozen
Jan 6 23:33:48 Dragon kernel: ata8.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata8.00: cmd 60/40:50:78:9e:e0/05:00:be:00:00/40 tag 10 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata8.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata8.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata8.00: cmd 60/40:58:b8:a3:e0/05:00:be:00:00/40 tag 11 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata8.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata8.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata8.00: cmd 60/40:60:f8:a8:e0/05:00:be:00:00/40 tag 12 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata8.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata8.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata8.00: cmd 60/78:68:38:ae:e0/01:00:be:00:00/40 tag 13 ncq dma 192512 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata8.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata8: hard resetting link
Jan 6 23:33:48 Dragon kernel: ata2.00: exception Emask 0x0 SAct 0x1e080 SErr 0x0 action 0x6 frozen
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: cmd 60/40:38:10:88:e0/05:00:be:00:00/40 tag 7 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata2.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: cmd 60/40:68:78:9e:e0/05:00:be:00:00/40 tag 13 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata2.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: cmd 60/40:70:b8:a3:e0/05:00:be:00:00/40 tag 14 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata2.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: cmd 60/40:78:f8:a8:e0/05:00:be:00:00/40 tag 15 ncq dma 688128 in
Jan 6 23:33:48 Dragon kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata2.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: cmd 60/78:80:38:ae:e0/01:00:be:00:00/40 tag 16 ncq dma 192512 in
Jan 6 23:33:48 Dragon kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 6 23:33:48 Dragon kernel: ata2.00: status: { DRDY }
Jan 6 23:33:48 Dragon kernel: ata2: hard resetting link

Would this indicate there is an issue with disk6? I was copying the files from disk5, if that is relevant information.
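One quick way to see that this is not a single-disk problem is to tally the "failed command" entries per ATA port. A sketch using a few of the log lines above as sample input (abbreviated; the real log has more entries per port):

```shell
# Abbreviated excerpt of the posted kernel log, one line per timed-out command.
cat > /tmp/ata.log <<'EOF'
Jan 6 23:33:48 Dragon kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata3.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata4.00: failed command: READ FPDMA QUEUED
Jan 6 23:33:48 Dragon kernel: ata8.00: failed command: READ FPDMA QUEUED
EOF

# Count failed commands per ATA port; several ports timing out in the same
# second points toward a shared controller or power problem, not one disk.
grep -o 'ata[0-9]*\.00: failed command' /tmp/ata.log | sort | uniq -c
```

In the full log, ata1, ata2, ata3, ata4, and ata8 all report timeouts at 23:33:48, which is what makes a single failing disk an unlikely explanation.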
I've attached new diagnostics taken after rebooting the server. It looks like there was an error on every drive in the array. Did unbalance cause this, do I have bad cables, or is there another issue causing it? For reference, these are the PCI devices for the drives.

/sys/bus/pci/devices/0000:01:00.1/ata1/host1/target1:0:0/1:0:0:0/block/sdb
/sys/bus/pci/devices/0000:01:00.1/ata2/host2/target2:0:0/2:0:0:0/block/sdc
/sys/bus/pci/devices/0000:01:00.1/ata3/host3/target3:0:0/3:0:0:0/block/sdd
/sys/bus/pci/devices/0000:01:00.1/ata4/host4/target4:0:0/4:0:0:0/block/sde
/sys/bus/pci/devices/0000:01:00.1/ata7/host7/target7:0:0/7:0:0:0/block/sdf
/sys/bus/pci/devices/0000:01:00.1/ata8/host8/target8:0:0/8:0:0:0/block/sdg
/sys/bus/pci/devices/0000:09:00.0/ata12/host12/target12:0:0/12:0:0:0/block/sdh

Attachment: dragon-diagnostics-20200106-2358.zip
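Those sysfs paths encode the controller (PCI address), ATA port, and block device, so they can be condensed into an ata-to-device map. A sketch that rewrites the listed paths with sed; the paths are the ones posted above:

```shell
# The sysfs paths from the post, one per line.
cat > /tmp/paths.txt <<'EOF'
/sys/bus/pci/devices/0000:01:00.1/ata1/host1/target1:0:0/1:0:0:0/block/sdb
/sys/bus/pci/devices/0000:01:00.1/ata2/host2/target2:0:0/2:0:0:0/block/sdc
/sys/bus/pci/devices/0000:01:00.1/ata3/host3/target3:0:0/3:0:0:0/block/sdd
/sys/bus/pci/devices/0000:01:00.1/ata4/host4/target4:0:0/4:0:0:0/block/sde
/sys/bus/pci/devices/0000:01:00.1/ata7/host7/target7:0:0/7:0:0:0/block/sdf
/sys/bus/pci/devices/0000:01:00.1/ata8/host8/target8:0:0/8:0:0:0/block/sdg
/sys/bus/pci/devices/0000:09:00.0/ata12/host12/target12:0:0/12:0:0:0/block/sdh
EOF

# Pull out the PCI address, ATA port, and device name from each path.
sed -E 's#^/sys/bus/pci/devices/([0-9a-f:.]+)/(ata[0-9]+)/.*/block/(sd[a-z]+)$#\2 -> \3 (controller \1)#' \
    /tmp/paths.txt | tee /tmp/ata_map.txt
```

The map shows ata1 through ata8 (sdb-sdg) all hang off the controller at 0000:01:00.1, while ata12 (sdh) is on 0000:09:00.0, which is useful when deciding whether the errors cluster on one controller.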
JorgeB Posted January 7, 2020
1 hour ago, mrvnsk9 said:
"Would this indicate there is an issue with disk6?"
To me it indicates a problem with the disk controller, since there are errors on almost all disks. It could also be a power-related problem.
mrvnsk9 Posted January 7, 2020
8 hours ago, johnnie.black said:
"To me it indicates a problem with the disk controller, since there are errors on almost all disks, could also be a power related problem."
I have a StarTech controller with a Marvell 88SE9230 chipset in the system; I have to disable the IOMMU or it drops drives. I'm going to remove that controller from the array and see if that improves things (I only have one drive attached to it anyway). I should probably replace it with an LSI 9300-8i or something similar.
mrvnsk9 Posted January 8, 2020
@johnnie.black Changing the BIOS to the correct setting seems to have done the trick. I'm also going to swap out the controller card for one that's actually supported by Unraid. I'll consider this solved. Thanks for your help!