Need Assistance with Random Hard Locks and CPU Stalls



Over the past few weeks I've been experiencing random hard freezes every couple of days. It doesn't seem to matter whether the system is under high load or idle (in terms of me accessing anything). Unfortunately it has gotten worse, to the point that it happens practically every day. I primarily use the server as a NAS, and I only have a handful of dockers and VMs (which I haven't used for quite some time). Also, I only run dockers as needed and don't leave them running constantly.

My initial reaction was the docker ipvlan setting which I found on other threads, but changing it to macvlan didn't clear the issue. Next stop was the cache drive. I found some data corruption on my NVMe cache drive, which was formatted as btrfs, so today I decided to rebuild the cache drive and try XFS. I disabled Docker, cleared the drive, deleted the partition, and formatted it as XFS, which all went smoothly, but before I could even rebuild any of my dockers I ran into a couple more hard freezes. When this happens, the GUI becomes unresponsive and I have to shut down manually. Ignoring the data corruption lines in the syslog, I noticed quite a few entries of rcu_sched self-detected stall on CPU, which I'm not familiar with. Could someone please review the attached logs, dumb it down for me, and provide feedback? I only exported the sections from before I had to shut down, but I can upload the entire syslog if needed.

Thanks in advance. I'm always very appreciative of the community.

 

Edit: I forgot some extra details. XMP is disabled, and I also disabled C-states for kicks.

 

CPU: Threadripper 2990WX

Mobo: MSI MEG X399 Creation
RAM: HyperX Predator 64GB (4x16) 3000MHz CL15
Unraid version: 6.10.3

 

 

 

random1_06.22.2022.txt random2_06.24.2022.txt
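
For anyone who wants to check their own syslog for the same message, here is a minimal Python sketch that lists the lines containing "rcu_sched self-detected stall on CPU" and counts them per day. The function names (`find_stalls`, `stalls_per_day`, `scan_file`) are made up for illustration, and the per-day grouping assumes each line starts with a classic "Mon DD" syslog timestamp, which may not match every setup.

```python
from collections import Counter

# Marker text taken from the syslog entries described in the post.
STALL_MARKER = "rcu_sched self-detected stall on CPU"


def find_stalls(lines):
    """Return (line_number, text) for every line containing the stall marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if STALL_MARKER in line
    ]


def stalls_per_day(lines):
    """Count stall lines per day.

    Assumption: each line begins with a 'Mon DD' timestamp (for example
    'Jun 22'), so the first two whitespace-separated fields identify the day.
    """
    counts = Counter()
    for _, text in find_stalls(lines):
        day = " ".join(text.split()[:2])
        counts[day] += 1
    return dict(counts)


def scan_file(path):
    """Read a syslog export (hypothetical path) and return its per-day stall counts."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return stalls_per_day(handle.readlines())
```

Calling `scan_file()` on an exported syslog file returns a dictionary of stall counts per day. This only shows how often the stalls are logged and when they cluster. It does not identify the cause.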

Edited by Rock G
added details
9 hours ago, Rock G said:

My initial reaction was the docker ipvlan setting which I found on other threads, but changing it to macvlan didn't clear the issue.

That would be the other way around: macvlan is usually the problem, but it looks unrelated to your current issues.

 

9 hours ago, Rock G said:

XMP is disabled

 

9 hours ago, Rock G said:

RAM: HyperX Predator 64GB (4x16) 3000MHz CL15

 

Do you mean it's not running at 3000MT/s? Diags would show.

 

 

In any case, because btrfs is detecting data corruption, I would start by running memtest.

 

 


Thank you @JorgeB, and yes, my mistake: I meant I switched Docker from macvlan to ipvlan, mainly because I initially noticed the hard locks whenever Plex was in use (which happens to be the docker that is always in use).

 

Quote

Do you mean it's not running at 3000MT/s?

 

I had the XMP profile enabled and set to 3200MT/s, but I decided to disable it recently while trying to pinpoint the root of this issue. The stock speed is 3000MT/s. I'll monitor over the weekend to see how things go and then run memtest later if needed. I appreciate the suggestion.

 

 

 

Edited by Rock G

I understand, and you are absolutely correct: always check compatibility between the three parts. The mobo and chipset support the stock speed of my RAM; however, it's a bit outside the official support of the CPU, which I also noticed in the FAQ below.

 

However, I've been running Unraid on this CPU + RAM + mobo combo for over 3 years with no major headaches or annoyances, and I'd imagine that if it were not compatible, it would have exhibited the same issues shortly after I built it and not just now. Nevertheless, I will make note of this, and if I'm unable to pinpoint the cause and it becomes unbearable, I will swap them out. I appreciate your feedback, @JonathanM.

 

Edited by Rock G
