DieFalse Posted February 20, 2018
Ever since Unraid 6.4.1 I have experienced lockups/hard crashes after about 10-11 hours of runtime.
I have:
- run Memtest86+ for 72 hours with no issues (it did not lock up)
- booted into safe mode with no plugins; it locked up after the same timeframe
- checked for stable vs. next release; 6.4.1 is the only option
- reset the BIOS to defaults, with no change
When it freezes, I cannot reach it from the local console or remotely. When it crashes, the motherboard shows double 0's.
Attached are the diagnostics, and here is the output of tail /var/log/syslog -f right after a reboot, from the local console.
Feb 20 17:12:24 NAS dnsmasq[15729]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: DHCP, sockets bound exclusively to interface virbr0
Feb 20 17:12:24 NAS dnsmasq[15729]: reading /etc/resolv.conf
Feb 20 17:12:24 NAS dnsmasq[15729]: using nameserver 192.168.1.1#53
Feb 20 17:12:24 NAS dnsmasq[15729]: read /etc/hosts - 2 addresses
Feb 20 17:12:24 NAS dnsmasq[15729]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Feb 20 17:12:24 NAS kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 20 17:16:33 NAS ntpd[1918]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
nas-diagnostics-20180220-1727.zip
JorgeB Posted February 20, 2018
Are you disabling C-States? Have you seen the notes about Ryzen CPUs here:
DieFalse (Author) Posted February 20, 2018
Thank you. It's been disabled in the BIOS since I first set it up, but I haven't checked since restoring the BIOS defaults.
JorgeB Posted February 20, 2018
You should also add the line to the go file, like the link mentions.
DieFalse (Author) Posted February 21, 2018
13 minutes ago, johnnie.black said: You should also add the line to the go file, like the link mentions.
Sorry, I should have been more specific. I just rebooted to confirm: it's disabled in the BIOS, and the go file still has the correct line you referenced.
DieFalse (Author) Posted February 21, 2018
So, that time it lasted 30 minutes before halting. Any ideas?
ljm42 Posted February 21, 2018
1 hour ago, fmp4m said: I just rebooted to confirm, its disabled in bios and the go file still has the correct line you referenced in it also.
Can you upload a new diagnostic file?
DieFalse (Author) Posted February 21, 2018
3 hours ago, ljm42 said: Can you upload a new diagnostic file?
See attachment: nas-diagnostics-20180220-2254.zip
JorgeB Posted February 21, 2018
7 hours ago, fmp4m said: I just rebooted to confirm, its disabled in bios and the go file still has the correct line you referenced in it also.
The go file didn't have the change in your first diags; it does in the second ones.
DieFalse (Author) Posted February 21, 2018
6 hours ago, johnnie.black said: The go file didn't have the change on your first diags, it does have on the second ones.
That is very strange. It was there when I opened the go file in vi, and all I did was :q to exit. To be certain I am not losing my mind, here is the only non-default line in my go file:
/usr/local/sbin/zenstates --c6-disable
I am wondering if my USB drive is going bad, since you say there was a difference. Could that be?
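Rather than relying on what an editor session showed, the go file can be checked directly for the line. This is only a minimal sketch: it assumes the go file is at /boot/config/go (the thread only says it is in the config folder), and the helper name go_has_c6_disable is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given go file contains an
# uncommented "zenstates --c6-disable" line.
# The default path /boot/config/go is an assumption, not confirmed in the thread.
go_has_c6_disable() {
    go_file="${1:-/boot/config/go}"
    grep -Eq '^[[:space:]]*/usr/local/sbin/zenstates[[:space:]]+--c6-disable' "$go_file"
}

# Only report when the assumed path actually exists on this machine.
if [ -r /boot/config/go ]; then
    if go_has_c6_disable /boot/config/go; then
        echo "c6-disable line present"
    else
        echo "c6-disable line missing"
    fi
fi
```

A commented-out copy of the line does not count as present, since the pattern is anchored to the start of the line.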
JorgeB Posted February 21, 2018
1 minute ago, fmp4m said: I am wondering if my USB is going bad since you say there was a difference. Could that be?
It's possible. You can check the diags yourself; the go file is in the config folder.
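To see the difference between the two sets of diagnostics, the go files can be compared side by side. This is a rough sketch only: it assumes both zips have already been extracted and that each extracted directory holds the go file at config/go (the exact layout inside the archive is an assumption), and the helper name compare_go_files is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: diff the go file from two extracted diagnostics
# directories. The config/go location inside each directory is assumed.
compare_go_files() {
    old_go="$1/config/go"
    new_go="$2/config/go"
    status=0
    diff -u "$old_go" "$new_go" || status=$?
    case "$status" in
        0) echo "go files are identical" ;;
        1) echo "go files differ" ;;
        *) echo "could not compare go files" >&2 ;;
    esac
    return "$status"
}
```

Called as compare_go_files old-diags new-diags (placeholder directory names), it prints a unified diff followed by a one-line verdict. Lines starting with + exist only in the newer go file.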
DieFalse (Author) Posted February 21, 2018
2 minutes ago, johnnie.black said: It's possible, you can check the diags yourself, go file is in the config folder.
That is odd! I have a bad disk (disk 5) that I am sending back under warranty today. I don't want the server to crash with the disk out, so I have shut it down and will replace the USB drive when disk 5 is replaced later today. In the meantime, I can't think of anything else causing the issue.
Warrentheo Posted February 21, 2018
I have not had an AMD CPU for a while, but I hear news that they still have issues with the Real Time Clock drifting over time... I don't know if that has to do with the C-States or not, or exactly what to do to fix it...
I just saw a computer news article about benchmarks of the new AMD APUs being skewed recently, because a benchmark would take 1 minute and 7 seconds to complete but only report to Windows that it took 1 minute even... The issue seemed to be related to how the clock behaved after it woke from sleep, and they were in the middle of fixing the patch...
I have been thinking about switching back over to AMD recently. However, all the way back to the AMD Athlon I used to have, they have always had major issues with the clock drifting all over the place for some reason... That and single-threaded performance are what keep me buying Intel currently, even though AMD is cheaper for an overall faster chip...
Maybe clock drift is why your NTP client is having issues?
pwm Posted February 21, 2018
6 hours ago, Warrentheo said: I have not had an AMD cpu for a while, but I hear news that they still have issues with the Real Time Clock drifting over time... Don't know if that has to do with the C-States or not, or exactly what to do to fix it...
I'm pretty sure I saw a Linux patch that would help with the drifting. The clock drift is caused by jittering of the CPU clock, which is done to reduce the EMI noise emitted by the processor and all the data signals to RAM, the chipset, etc. The jitter very slightly changes the average clock speed, which also affects the timing.
DieFalse (Author) Posted February 21, 2018
Even with the NTP error, I don't think that is what's causing the hard lockups/crashes. I have added two other NTP servers since seeing that line and have yet to reproduce the NTP error. I am replacing a bad drive (I wonder if the stress from parity checks after all these lockups helped it die, lol) before I continue troubleshooting.
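One way to check whether the NTP error comes back is to count occurrences of the ntpd message from the first post in the syslog. This is a minimal sketch; the helper name count_time_errors is made up for illustration, and the default path is the one used with tail in the first post.

```shell
#!/bin/sh
# Hypothetical helper: print how many ntpd TIME_ERROR lines a log contains.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 in that case.
count_time_errors() {
    log_file="${1:-/var/log/syslog}"
    grep -c 'ntpd\[[0-9]*]: kernel reports TIME_ERROR' "$log_file" || true
}
```

Running count_time_errors with no argument reads /var/log/syslog. A result of 0 after several hours of uptime would be consistent with the error not recurring.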
DieFalse (Author) Posted March 21, 2018
Ok. I have replaced the bad HDD and the USB boot drive. I have also upgraded to 6.5.0, with the same issue, and now to 6.5.1-rc1, with the same issue.
gelmi Posted March 27, 2018
Hi, I observe the same symptoms (no web interface, no ping) after some hours. I updated the BIOS and disabled C-states some time ago, and --c6-disable is in the script. However, today, when I could not access the web interface, I tried unplugging the network cable and plugging it back in, and after a couple of seconds everything went back to normal. I have a tower connected to a WiFi bridge. The diagnostics are attached. I do not know if this is related to Unraid (I upgraded to 6.5 a couple of days ago and this problem started to occur) or to the WiFi bridge.
darktower-diagnostics-20180327-0826.zip