
Hard Freeze / Hard Crash - After 10-11 hours runtime


DieFalse


Posted

Ever since Unraid 6.4.1, I have experienced lockups/hard crashes after about 10-11 hours of runtime.

 

I have:

run Memtest86+ for 72 hours with no issues (it did not lock up)

booted into safe mode with no plugins; it locked up after the same timeframe

checked for stable vs. next releases; the only option offered is 6.4.1

reset the BIOS to defaults, with no change

 

When it freezes, I cannot reach it from the local console or remotely.

When it crashes, the motherboard shows double 0's.

 

Attached are the diagnostics, and here is the output of

tail /var/log/syslog -f

taken right after a reboot, from the local console.

 

Feb 20 17:12:24 NAS dnsmasq[15729]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: DHCP, sockets bound exclusively to interface virbr0
Feb 20 17:12:24 NAS dnsmasq[15729]: reading /etc/resolv.conf
Feb 20 17:12:24 NAS dnsmasq[15729]: using nameserver 192.168.1.1#53
Feb 20 17:12:24 NAS dnsmasq[15729]: read /etc/hosts - 2 addresses
Feb 20 17:12:24 NAS dnsmasq[15729]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Feb 20 17:12:24 NAS dnsmasq-dhcp[15729]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Feb 20 17:12:24 NAS kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 20 17:16:33 NAS ntpd[1918]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
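For anyone scanning a longer syslog for hints, here is a minimal Python sketch that keeps only lines matching a few suspect keywords. The keyword list and the function name `suspicious_lines` are illustrative assumptions, not an official list of crash indicators, and a log captured after a reboot may not contain anything from before the freeze.

```python
import re

# Hypothetical keyword list; adjust it to whatever you are hunting for.
SUSPECT = re.compile(
    r"\bmce\b|machine check|call trace|oops|panic|TIME_ERROR",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that contain any of the suspect keywords."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


# Two lines taken from the log excerpt in this post.
sample = [
    "Feb 20 17:12:24 NAS kernel: virbr0: port 1(virbr0-nic) entered disabled state",
    "Feb 20 17:16:33 NAS ntpd[1918]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized",
]

for line in suspicious_lines(sample):
    print(line)
```

On these two sample lines, only the ntpd TIME_ERROR entry is kept. To scan a real log, pass an open file object instead of the sample list.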

 

 

nas-diagnostics-20180220-1727.zip

Posted
13 minutes ago, johnnie.black said:

You should also add the line to the go file, like the link mentions.

 

 

 

Sorry, I should have been more specific.   

 

I just rebooted to confirm: it's disabled in the BIOS, and the go file still has the correct line you referenced in it.

 

Posted
1 hour ago, fmp4m said:

I just rebooted to confirm: it's disabled in the BIOS, and the go file still has the correct line you referenced in it.

 

Can you upload a new diagnostic file?

Posted
7 hours ago, fmp4m said:

I just rebooted to confirm: it's disabled in the BIOS, and the go file still has the correct line you referenced in it.

The go file didn't have the change in your first diags; it does in the second ones.

Posted
6 hours ago, johnnie.black said:

The go file didn't have the change in your first diags; it does in the second ones.

 

That is very strange; it was there when I opened the go file in vi. All I did was :q to exit.

To be certain I am not losing my mind, here is the only non-default line in my go file:

/usr/local/sbin/zenstates --c6-disable

I am wondering if my USB is going bad, since you say there was a difference. Could that be?
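A quick way to double-check what is actually in the go file, rather than trusting what an editor shows, is to test for the exact uncommented line. The sketch below is only an illustration: the helper name `has_active_line` is made up, and the sample go file contents are an assumption about what a near-default file looks like.

```python
ZENSTATES_LINE = "/usr/local/sbin/zenstates --c6-disable"


def has_active_line(go_text, command=ZENSTATES_LINE):
    """Return True if an uncommented line of go_text equals command."""
    for raw in go_text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if line and not line.startswith("#") and line == command:
            return True
    return False


# Assumed sample contents, not copied from the attached diagnostics.
sample_go = """#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
/usr/local/sbin/zenstates --c6-disable
"""

print(has_active_line(sample_go))              # line is present and active
print(has_active_line("# " + ZENSTATES_LINE))  # commented out, so not active
```

On the server itself you would read the real file and pass its text in, for example from /boot/config/go, assuming the flash drive is mounted at /boot.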

Posted
1 minute ago, fmp4m said:

I am wondering if my USB is going bad, since you say there was a difference. Could that be?

It's possible. You can check the diags yourself; the go file is in the config folder.
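Comparing the go file from two diagnostics archives can be scripted. This is a rough Python sketch, assuming each zip contains a member whose path ends in `config/go`; the archive layout, the helper names, and the in-memory demo archives are all assumptions for illustration, not the contents of the files attached to this thread.

```python
import difflib
import io
import zipfile


def read_go_file(zip_source):
    """Return the text of the first member ending in config/go, or None."""
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.endswith("config/go"):
                return zf.read(name).decode("utf-8", errors="replace")
    return None


def go_diff(first_zip, second_zip):
    """Return unified-diff lines between the go files of two archives."""
    old = (read_go_file(first_zip) or "").splitlines()
    new = (read_go_file(second_zip) or "").splitlines()
    return list(difflib.unified_diff(old, new, "first", "second", lineterm=""))


def make_demo_zip(go_text):
    """Build a small in-memory archive with a made-up folder name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo-diagnostics/config/go", go_text)
    buf.seek(0)
    return buf


base = "#!/bin/bash\n/usr/local/sbin/emhttp &\n"
first = make_demo_zip(base)
second = make_demo_zip(base + "/usr/local/sbin/zenstates --c6-disable\n")

for line in go_diff(first, second):
    print(line)
```

With real archives, pass the two zip file paths to `go_diff` instead of the demo buffers; an empty result means the two go files are identical.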

Posted
2 minutes ago, johnnie.black said:

It's possible. You can check the diags yourself; the go file is in the config folder.

 

That is odd! I have a bad disk (disk 5) that I am sending out under warranty today. I don't want the server to crash with it out, so I have shut the server down and will replace the USB when disk 5 is replaced later today. In the meantime, I can't think of anything else causing the issue.

Posted

I have not had an AMD CPU for a while, but I hear that they still have issues with the real-time clock drifting over time. I don't know if that has to do with the C-states or not, or exactly what to do to fix it. I recently saw a computer news article about benchmarks of the new AMD APUs being skewed because a benchmark would take 1 minute and 7 seconds to complete but report to Windows that it took exactly 1 minute. The issue seemed to be related to how the clock behaved after it woke from sleep, and they were in the middle of working on a fix.

 

I have been thinking about switching back to AMD recently; however, all the way back to the AMD Athlon I used to have, they have always had major issues with the clock drifting all over the place for some reason. That and single-threaded performance are what keep me buying Intel currently, even though AMD is cheaper for an overall faster chip.

 

Maybe clock drift is why your NTP client is having issues?

Posted
6 hours ago, Warrentheo said:

I have not had an AMD CPU for a while, but I hear that they still have issues with the real-time clock drifting over time. I don't know if that has to do with the C-states or not, or exactly what to do to fix it.

I'm pretty sure I saw a Linux patch that would help with the drifting. The clock drift is caused by jittering the CPU clock to reduce the EMI noise emitted by the processor and all data signals to RAM, chipset, etc. The jitter very slightly changes the average clock speed, which also affects the timing.
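To put the benchmark example mentioned earlier in the thread into numbers, here is a small worked calculation in Python. The helper `drift_ppm` is a made-up name, and the comparison against a 500 ppm limit assumes the commonly cited maximum frequency correction of ntpd; treat it as a rough sanity check, not a diagnosis.

```python
def drift_ppm(true_seconds, reported_seconds):
    """Clock error in parts per million; negative means the clock ran slow."""
    return (reported_seconds - true_seconds) / true_seconds * 1_000_000


# Benchmark case from the thread: 67 s of real time reported as 60 s.
benchmark = drift_ppm(67, 60)
print(round(benchmark))

# Assumed ntpd correction limit, in ppm.
NTPD_MAX_PPM = 500
print(abs(benchmark) > NTPD_MAX_PPM)
```

An error of that size is roughly 10%, far beyond anything a time daemon could slew away, so it points at a clock-source bug rather than ordinary drift.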

Posted

Even with the NTP error, I don't think that is causing the hard lockups/crashes. I have added two other NTP servers since seeing that line and have yet to reproduce the NTP error. I am replacing a bad drive (I wonder if the stress from parity checks after all these lockups helped it die, lol) before I continue troubleshooting.

  • 1 month later...
Posted

OK. I have replaced the bad HDD and the USB boot drive.

I have also upgraded to 6.5.0, with the same issue, and now to 6.5.1-rc1, with the same issue.

 

Posted

Hi,

I observe the same symptoms (no web interface, no ping) after some hours. I updated the BIOS and disabled C-states some time ago, and --c6-disable is in the script. However, today, when I could not access the web interface, I tried unplugging the network cable and plugging it back in, and after a couple of seconds everything went back to normal. I have a tower connected to a WiFi bridge. Diagnostics are attached.

I do not know if this is related to Unraid (I upgraded to 6.5 a couple of days ago and this problem started to occur) or to the WiFi bridge.

 

darktower-diagnostics-20180327-0826.zip

Archived

This topic is now archived and is closed to further replies.
