DCG Posted May 19, 2021

I've run into this issue a couple of times now and I can't wrap my head around what could be causing it...

When I first used Unraid I ran it on an ASRock X570 Pro4, paired with an R5 3600 and 2x 16 GB ECC RAM. It worked fine, so I added an LSI SAS 9211-8i to be able to connect more HDDs in the future. The next upgrade was an Intel X550-T2 and an X570 Phantom Gaming X, to increase the speed to my desktop (I only seem to get half the speed, but that's another thing I need to figure out).

After swapping the motherboard I experienced seemingly random system hangs, during which the data on the Unraid unit was inaccessible. Sometimes I could log into the Unraid unit, other times I couldn't, and I could only "fix" it by shutting it down via the power button... Even when I could get into the webGUI, I couldn't shut it down via the shutdown or reboot command. This went on for about 3 months, about every 9 or 14 days.

Since I had upgraded to 6.9.0 in the meantime and started using the X550-T2 at the same time I swapped the motherboard, I wanted to make sure it wasn't one of the others causing the issue. I put the older X570 Pro4 back in and had it running for a month without any issues, so I swapped the Phantom X back in again, this time with a change to power supply idle control as per:

It had been running fine for about 14 days (with daily checks to see if anything weird was up) and this evening I noticed 2 cores at 100%. Thinking this was weird I tried to run a diagnostic, but it wouldn't do anything for about an hour... I tried to see if I could access the data on the HDDs and I could, both through Samba and directly through the webGUI. However, the webGUI didn't show the drives spinning up. The command line didn't respond either. One other thing I noticed was that the unassigned drive was unmounted, whilst I'm quite certain that it was mounted this morning (it's used for the VM, which is off by default).
The attached diagnostics were taken with the Unraid unit only having been up for 4 hours, so I'm not too certain they are useful, but maybe there's something obvious in there that I'm missing. I'm also running Fix Common Problems, but other than reporting that I don't have auto updates enabled for Docker and plugins, it doesn't find anything.

nas2-diagnostics-20210519-2359.zip
JorgeB Posted May 20, 2021

Enable this then post that log after a crash.
DCG Posted May 21, 2021 (Author)

I have enabled the syslog server and confirmed it is writing a file. When it crashes again, I'll post an update containing the file.
DCG Posted June 30, 2021 (Author)

It's been a while, with only one crash, just now (it ran for more than a month before I wanted to add in a different GPU and reseat the NVMe drives).

I was tinkering with a Windows VM today (I can't seem to get the GPU to consistently identify itself...) when the entire system crashed. I bound the GPU (12:00.0) to VFIO, but Windows kept throwing an error 31, which I managed to "fix" to a 43 by installing a hidden QEMU device in Device Manager. GPU-Z wouldn't show manufacturer or chip data, other than the BIOS and chip family. I had dumped my own vBIOS through a bare-metal install, but I was thinking something might have been wrong with it. When I tried to dump it through the Windows VM, I lost the RDP connection (which I thought might have been just because of the dump), but when I refreshed the VM window of Unraid, I got kicked to the login screen of Unraid... It had also almost finished a parity check (which has automatically been restarted). Fix Common Problems reported a device error and prompted me to install "mcelog".

I've attached both the longer-running syslog and the diagnostics. One thing to note is that I reset my router every morning at 3 AM, so that might show up in the logs as well. A couple of days ago I noticed the time on the server was incorrect again (I got African time, whilst I'm in GMT+1).

My best guess is that I messed something up with the GPU in the VM, causing it to take (part of) the rest of the system down with it, but I'm not certain.

Edit: Something I did manage to fix on my own was the NIC. Flow control seems to have been the bane in that regard (even though I did test that previously...).

syslog-192.168.1.170.log
nas2-diagnostics-20210630-1711.zip

Edited June 30, 2021 by DCG
JorgeB Posted June 30, 2021

After a quick look at the syslog, the only thing I see out of the ordinary is an nf_conntrack call trace. See if this applies to you:
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:
https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
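For anyone who wants to check their own saved syslog for the same kind of entry, here is a minimal sketch in Python. It is not an Unraid tool. It only assumes that the relevant lines contain the text "Call Trace" or "nf_conntrack"; the function names and the marker list are made up for this example and can be changed.

```python
import sys

# Assumed markers: substrings to look for, matched case-insensitively.
MARKERS = ("call trace", "nf_conntrack")


def find_marker_lines(lines, markers=MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a log file on disk, tolerating odd bytes in the log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_marker_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py <path-to-syslog>
    for number, text in scan_file(sys.argv[1]):
        print(f"{number}: {text}")
```

Run it against the file written by the syslog server, for example the syslog-192.168.1.170.log attached above. It only lists matching lines with their line numbers, so the surrounding log context still has to be read by hand.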