System Randomly Locking Up - v6.11.5 - General Support

February 19, 2023

Hi all,

For the past couple of months I've been trying to determine why my Unraid system randomly becomes unreachable. Sometimes it runs with no apparent issues for weeks; other times it lasts only a day or two before freezing again. This system ran flawlessly for years until I made two major changes:

Change 1: Upgraded the hardware. This was essentially a full transplant of the drives into a new system with a new CPU, motherboard, and RAM. I also moved the drives from SATA splitters to SAS HBAs and added an Intel Arc A770 GPU for future video encoding.

Change 2: After determining there were no issues with the hardware upgrade, I upgraded Unraid from v6.9.2 to v6.11.5 in the hope that it would play better with the A770, which is technically unsupported on Linux kernel 5.x (I've since been able to pass the GPU through to a VM with no issues).

I suspect the issues I'm having are related to Change 1, but I haven't been able to pin down the specific cause. The errors in the logs are beyond my ability to troubleshoot, and Google has not been helpful. I've run multiple memtests and reseated everything. Another, possibly unrelated, symptom is that the Win10 VM I'm running for the A770 only lasts about two hours before locking up and pinning the CPU at 100% until I force the VM to shut down. I did not run a VM on this system before Changes 1 and 2.

I've attached the syslog from today, when it most recently froze, along with the system diagnostics. Thanks for any help you can provide!

syslog.txt gemininas-diagnostics-20230219-1305.zip

Edited February 19, 2023 by Isorikk


February 20, 2023

Community Expert

Other users have had issues with Ryzen 7xxx. Try disabling C-states, and also XMP on the RAM.
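After changing the BIOS setting, one rough way to see which CPU idle states Linux still exposes is to read the kernel's cpuidle entries in sysfs. The sketch below assumes the standard `/sys/devices/system/cpu/cpu0/cpuidle/stateN/` layout with `name` and `disable` files; it only reports what the kernel lists and does not by itself prove how the firmware is handling C-states.

```python
from pathlib import Path

def cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return (name, disabled) for each idle state the kernel exposes for one CPU.

    Assumes the usual sysfs layout: <cpu_dir>/cpuidle/stateN/{name,disable}.
    Returns an empty list if the cpuidle directory does not exist.
    """
    idle_dir = Path(cpu_dir) / "cpuidle"
    states = []
    # Sort numerically so state10 comes after state9, not after state1.
    for state in sorted(idle_dir.glob("state*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, disabled))
    return states
```

On the server, `print(cpuidle_states())` lists the states for CPU 0; comparing the output before and after the BIOS change gives a quick sanity check that the setting took effect.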


February 20, 2023

Author

7 hours ago, JorgeB said:

Other users have had issues with Ryzen 7xxx. Try disabling C-states, and also XMP on the RAM.

I will try disabling C-states. As for XMP, I originally had it disabled (I saw no point in overclocking RAM on a storage server), but I was getting a different kernel panic along with data corruption. It appeared that bits were sometimes flipping in memory, and enabling XMP seems to have resolved that.

I will follow up in a few days with the results of disabling C-states.


February 23, 2023

Author

An update to my previous post:

After disabling Global C-State Control in the BIOS, I restarted the system a little later and, for some reason, it would no longer POST. It booted exactly once with C-States disabled and never again. To get it to POST, I either had to unplug the drives from the SAS HBAs or, as I ended up doing, flash a newer BIOS version, which appears to have reset the configuration.

I'm going to let it run for a few days with C-States re-enabled after the update, but I have a feeling that the issue will persist...

Edited February 23, 2023 by Isorikk


February 23, 2023

Community Expert

The system should still always boot with C-States disabled; sounds like a buggy BIOS.


February 23, 2023

Author

20 minutes ago, JorgeB said:

The system should still always boot with C-States disabled; sounds like a buggy BIOS.

Tell me about it... 🙃


February 26, 2023

Author

The system froze again with Global C-States disabled and the XMP profile turned off. The log doesn't show much; this is everything it recorded before the system went unresponsive:


Feb 25 16:32:39 GeminiNAS kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Feb 25 16:32:39 GeminiNAS kernel: br0: port 2(vnet2) entered blocking state
Feb 25 16:32:39 GeminiNAS kernel: br0: port 2(vnet2) entered disabled state
Feb 25 16:32:39 GeminiNAS kernel: device vnet2 entered promiscuous mode
Feb 25 16:32:39 GeminiNAS kernel: br0: port 2(vnet2) entered blocking state
Feb 25 16:32:39 GeminiNAS kernel: br0: port 2(vnet2) entered forwarding state
Feb 25 16:32:41 GeminiNAS avahi-daemon[7613]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fed4:45f8.
Feb 25 16:32:41 GeminiNAS avahi-daemon[7613]: New relevant interface vnet2.IPv6 for mDNS.
Feb 25 16:32:41 GeminiNAS avahi-daemon[7613]: Registering new address record for fe80::fc54:ff:fed4:45f8 on vnet2.*.
Feb 25 16:32:42 GeminiNAS acpid: input device has been disconnected, fd 6
Feb 25 16:32:42 GeminiNAS acpid: input device has been disconnected, fd 7
Feb 25 16:32:42 GeminiNAS acpid: input device has been disconnected, fd 8
Feb 25 16:32:53 GeminiNAS kernel: usb 1-9: reset full-speed USB device number 7 using xhci_hcd

I've attached today's full syslog; the system went unresponsive at approximately 4:55 pm.

syslog.txt
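To pull out just the entries logged shortly before a freeze (here, roughly 4:55 pm on Feb 25), a small filter over the syslog can help. This is a sketch that assumes classic syslog timestamps like `Feb 25 16:32:39 GeminiNAS kernel: ...`, which omit the year, so the caller supplies it; the 30-minute window is an arbitrary default.

```python
from datetime import datetime, timedelta

def lines_before(log_lines, freeze_time, window_minutes=30, year=2023):
    """Return syslog lines stamped within `window_minutes` before `freeze_time`.

    Assumes each line starts with a 15-character timestamp such as
    "Feb 25 16:32:39"; syslog omits the year, so it is passed in explicitly.
    """
    start = freeze_time - timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        stamp = line[:15]
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip continuation lines or anything without a timestamp
        if start <= ts <= freeze_time:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `with open("syslog.txt") as f: print("\n".join(lines_before(f, datetime(2023, 2, 25, 16, 55))))` prints the last half hour before the lockup. An empty or near-empty result, as in the excerpt above, is itself a hint that the hang happened without the kernel getting a chance to log anything.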


February 26, 2023

Community Expert

Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))


February 26, 2023

Author

8 hours ago, JorgeB said:

Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

I've switched the configuration to ipvlan. I'm not sure how it got set to macvlan in the first place, but good catch! So far I've encountered no issues. I'll let it run for a few more days to see whether the problem persists.
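To confirm which driver each Docker network is actually using (for instance, to check whether a custom network is still on macvlan), the Docker CLI can report it via `docker network ls --format`. The sketch below wraps that command and parses its output; it assumes the Docker CLI is on the host's PATH, and the network names you will see (such as a custom `br0` network) depend on your Unraid setup.

```python
import subprocess

# Go template passed to `docker network ls --format`; the literal tab separates the columns.
FORMAT = "{{.Name}}\t{{.Driver}}"

def parse_network_drivers(ls_output):
    """Turn `docker network ls --format FORMAT` output into {network name: driver}."""
    drivers = {}
    for line in ls_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, driver = line.split("\t", 1)
        drivers[name] = driver.strip()
    return drivers

def docker_network_drivers():
    """Run the Docker CLI on the host and return the driver used by each network."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_network_drivers(result.stdout)
```

Running `print(docker_network_drivers())` on the host before and after changing the setting should show the custom network's driver change from `macvlan` to `ipvlan`.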


February 27, 2023

Author

A new development:

The system has not frozen or gone unresponsive, but this morning I discovered that none of the Docker and plugin services could reach out to check for updates. I suspect this is directly related to switching from macvlan to ipvlan. After further investigation, I found that all of the services could reach my local network, including the gateway, but could not reach the internet. The Unraid system itself could connect just fine; only the plugin and Docker services seemed to be blocked.

A restart resolved it. I'm wondering whether the network card may be the cause...

I've attached more logs from this morning. It looks like a lot of odd activity is happening with br0, but I'm not certain whether this was a one-time bug or a symptom of a larger issue.

syslog.txt
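The "gateway reachable, internet not" symptom can be checked repeatedly with a plain TCP connection test from inside an affected container (or anywhere Python is available). This is a minimal sketch: the function names are hypothetical, and the addresses in the usage note are placeholders. A TCP test also only shows whether that specific port answers; a gateway that doesn't listen on the chosen port will look unreachable.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout subclasses OSError)
        return False

def classify(gateway_ok, internet_ok):
    """Map the two reachability results to a rough description of where traffic stops."""
    if gateway_ok and internet_ok:
        return "ok"
    if gateway_ok:
        return "LAN only: gateway reachable, internet not"
    if internet_ok:
        return "internet reachable, gateway not answering on this port"
    return "no connectivity"
```

For example, `classify(can_connect("192.168.1.1", 80), can_connect("1.1.1.1", 443))`, with your own gateway address and port substituted, would return the "LAN only" result in the situation described above. Logging that result periodically would show when the breakage starts, which can then be lined up against the br0 entries in the syslog.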


March 3, 2023

Author
Solution

I believe my problems were caused by trying to run the VM on top of whatever else was going on. Disabling VMs has eliminated all of the odd bugs. The culprit may have been the unsupported video card. I have since tried an unofficial kernel that adds drivers for the card, and it has been working well with Docker containers so far. I'm going to mark this thread as resolved, with the solution being: don't use VMs with new hardware!


Solved by Isorikk