[SOLVED] Daily Hard Lock


Version: 6.6.7 UNRAID Server Plus

 

MB: ASRock X370 Taichi

CPU: Ryzen (1st gen) 1700X

BIOS: v4.8

Mem: 16GB G. Skill TridentZ DDR4 3200

Cache: 250GB Samsung 970 EVO NVMe SSD

Array: Assorted WD & Seagate

Unassigned Device: 250GB Samsung 840 EVO SSD

 

Hello all, I am at my wit's end with this issue. I have been running my current setup for nearly 3 years without a hiccup, but in January I started experiencing odd behavior. The first sign of something amiss was that my server froze, and upon restart all of my dockers and VMs were gone. It ran fine for a few weeks until, seemingly out of the blue, it began freezing every 12 hours or so. I have a monitor and keyboard attached, but the terminal was completely unresponsive. I restarted and entered GUI mode to see if I could gather more info or get a warning before the freeze, but got nothing. The GUI freezes completely, and no keyboard or mouse input is accepted. Fix Common Problems reports nothing.

 

Since then, the following actions have been taken with no improvement in outcome:

 

1. Uninstalled all VMs, docker containers, and plugins -- system still froze

2. Reinstalled OS -- system still froze

3. Reinstalled all SATA devices -- system still froze

 

Attached to this post is a .zip containing diagnostics and an image of the syslog when the system was frozen. Please feel free to let me know if additional information is required.

 

AND THANK YOU FOR ANY HELP. I'm at the end of my rope with this.

tower-diagnostics-20190305-1304.zip

Edited by ImBadAtThis
Link to comment
I would start by removing the NerdPack, preclear, and S3 Sleep plugins.  See if removing any of these plugins helps.

 

If removing these doesn't solve the problem, reboot in safe mode.

 

If that doesn't help, you should test your memory.

Link to comment

Thanks for the response. I've actually run a memory test and everything was fine. I will try uninstalling those plugins again, and then Safe Mode.

 

If I don't have any issues in Safe Mode, what does that tell me? How should I proceed from there?

 

Additionally, I've attached an image of the most recent lockup from just a few minutes ago.

 

 

IMG_0006.jpeg

Link to comment

Nearly three years? AMD only launched Ryzen 7 on 2 March 2017. Either way, that's a very early Ryzen 1700X you're using, so how are you mitigating its tendency to lock up when idling? I don't see either the boot option tweak or the go file tweak in your diagnostics, so maybe you've disabled the C6 state in your BIOS or you've configured normal power idle instead of the default low power idle? Perhaps after a recent BIOS update you forgot to re-select this particular customisation.

Link to comment

You have virtualisation support disabled in your BIOS, which is why kvm_amd can't load and your VMs won't work. That is the default in the ASRock BIOS, which suggests that the defaults have been restored or the CMOS has been corrupted. Look for SVM (secure virtual machine, a.k.a. AMD-V) and enable it.

Mar  5 12:47:23 Tower kernel: kvm: disabled by bios
Mar  5 12:47:23 Tower root: modprobe: ERROR: could not insert 'kvm_amd': Operation not supported
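
For anyone who wants to check their own log for the same condition, here is a minimal sketch. The function name `kvm_disabled_in_log` is made up for illustration, and the default path `/var/log/syslog` is an assumption about where the live syslog sits; you can pass the syslog file extracted from a diagnostics zip instead.

```shell
#!/bin/bash
# Hypothetical helper: prints any lines matching the two messages quoted above
# and exits 0 if at least one was found, non-zero otherwise.
kvm_disabled_in_log() {
    # Assumed default location of the live syslog; override with an argument.
    log_file="${1:-/var/log/syslog}"
    grep -E "kvm: disabled by bios|could not insert 'kvm_amd'" "$log_file"
}
```

Something like `kvm_disabled_in_log /var/log/syslog && echo "SVM is probably disabled in the BIOS"` would then flag the problem. A clean result only means these two messages are absent, not that virtualisation is working.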

 

Edited by John_M
Added error messages from syslog
Link to comment
To your first post, 2 years then. I couldn't remember when it was released, but I purchased the CPU soon after its initial release. Also, I've never experienced instability issues with this system and thus never disabled C6 or applied the boot option tweak or the go file tweak.

 

Yes, I've simply reset the CMOS and haven't yet re-enabled virtualization. I've been focused on just getting the system stable as-is.

Link to comment
26 minutes ago, ImBadAtThis said:

Yes, I've simply reset the CMOS and haven't yet re-enabled virtualization. I've been focused on just getting the system stable as-is.

You didn't mention clearing the CMOS in the list of things you'd done to try to fix your problem. In that case, disable low power idle in the BIOS and enable SVM and see how it goes. You'll want to enable XMP so that you can set the RAM speed to DDR4-2666 (assuming you have two 8 GB modules), too.

Link to comment

Mar  5 12:47:01 Tower kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
### [PREVIOUS LINE REPEATED 15 TIMES] ###

 

I suggest checking the BIOS setting (try enabled, disabled, and auto); this is about the Ryzen C6 issue. Besides, I have never tried the S3 Sleep plugin on my Ryzen.

From my point of view, your problem is related to the system locking up in a sleep state.

Edited by Benson
Link to comment
Thanks for the feedback, all! John_M, can you point me to the text for that go file tweak? I can't seem to find it. Not 6 months ago I couldn't avoid it, but now that I need it... As an update, I've modified the Power Supply Idle Control setting, as this was apparently introduced in a recent BIOS as a way to avoid disabling C6 altogether. If this doesn't work, I'm going to disable C6 in the BIOS. If that doesn't work, I'm going to use the go file tweak. If that doesn't work, I will investigate disabling RCU callbacks per spaceinvaderone's video.

 

If any of this sounds like a garbage plan, let me know.

 

Thanks

Link to comment

I'd try the Power Supply Idle Control first. If that doesn't work then try this zenstates go file tweak before you resort to disabling C-states in the BIOS. These are the instructions:

Quote

Edit your \\tower\flash\config\go script (using a good editor like Notepad++ (not Notepad)) and add the "zenstates" command right before "emhttp", like this:


/usr/local/sbin/zenstates --c6-disable
/usr/local/sbin/emhttp &
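
If you would rather make the edit from the server's own command line than over the network share, the following is a rough sketch of the same change. The function name `add_zenstates` is hypothetical, and it assumes the flash drive is mounted at /boot on the server, so that the go script is /boot/config/go. Take a copy of the file before trying it.

```shell
#!/bin/bash
# Hypothetical helper: insert the zenstates line before the first
# (non-comment) emhttp line of a go script. Does nothing if the tweak
# is already present, so it is safe to run more than once.
add_zenstates() {
    go_file="$1"
    # Skip the edit if the tweak is already in the file.
    if grep -q 'zenstates --c6-disable' "$go_file"; then
        return 0
    fi
    tmp_file="$(mktemp)" || return 1
    # Print the zenstates line once, just ahead of the emhttp line,
    # then copy every original line through unchanged.
    awk '
        !inserted && /^[^#]*emhttp/ {
            print "/usr/local/sbin/zenstates --c6-disable"
            inserted = 1
        }
        { print }
    ' "$go_file" > "$tmp_file" && cat "$tmp_file" > "$go_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

Usage would be `add_zenstates /boot/config/go` (path assumed as above). The go script only runs at boot, so the change takes effect after the next restart.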

 

 

This is the post (it's the first message - scroll down a bit to find it): 

 

 

Link to comment
