Cliff Posted December 15, 2021 (edited)

I have an Unraid server which has been running great for over a year. I am also running a Windows 11 VM, but after a while I stopped getting updates because I needed to pass through a TPM module. I then upgraded to the next branch of Unraid and everything seemed to work well. But after a while I started getting random reboots where the whole server crashed, and they happened more and more frequently.

My first suspicion was that the next branch caused some problem with the GPU my VM was using (GTX 1060 with dumped BIOS), as I sometimes noticed that the log was at 100% before crashing and was filling up with:

2021-12-11T22:59:01.085965Z qemu-system-x86_64: vfio_region_write(0000:0a:00.0:region1+0x16ff78, 0x0,1) failed: Device or resource busy

And if I look in my device info, it looks like my 1060 GPU has the address 0000:0a:00.0.

Most of the time the server starts normally after a crash/reboot, and the Windows 11 VM starts up fine with video output from the GPU. But the server then crashes after anywhere from about 8 hours down to 5 minutes, and it seems to be getting worse, as lately the server mostly crashes after a few minutes.

So I first tried disabling the VMs, but the server still crashed. I downgraded to Unraid stable, updated the BIOS and removed my 1060 GPU from the server, but it still crashes after a few minutes.

I am currently running memtest to rule out any memory problems. But what else can I do if the memory is fine? Can there be some problem with the Unraid USB stick, or something else that causes these problems?

My server specs:
Asus TUF Gaming X570-Plus
64GB RAM
AMD Ryzen 9 3900X
Nvidia 1060 6GB
Nvidia GT 710
250GB SSD cache
2x12TB HDD + 12TB parity
512GB M.2 for VM

The only strange thing I have noticed, even when the server was running fine, is that I always have to unplug all cables from the 1060 GPU when rebooting the server, otherwise I can't use it in any VMs. I am also using an Nvidia GT 710 with an HDMI dummy plug, as Unraid needs a GPU as I understand it.

unraid-diagnostics-20211212-1451.zip
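To check whether the log flood really comes from a single passed-through device, the messages can be tallied by PCI address. This is only a rough sketch, assuming every line has the same shape as the one quoted in the post above; the function name `count_vfio_write_failures` and the sample lines are made up for illustration and are not part of Unraid or QEMU.

```python
import re
from collections import Counter

# Pattern for QEMU messages shaped like the one quoted in the post:
#   vfio_region_write(<pci address>:region<N>+0x<offset>, ...) failed: <reason>
VFIO_WRITE_FAILED = re.compile(
    r"vfio_region_write\("
    r"(?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):"
    r"region(?P<region>\d+)\+0x[0-9a-fA-F]+,.*?\) failed: (?P<reason>.+)$"
)


def count_vfio_write_failures(lines):
    """Count failed vfio_region_write messages per (PCI address, reason)."""
    counts = Counter()
    for line in lines:
        match = VFIO_WRITE_FAILED.search(line)
        if match:
            key = (match.group("addr"), match.group("reason").strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: the quoted line twice, plus one unrelated line.
    quoted = (
        "2021-12-11T22:59:01.085965Z qemu-system-x86_64: "
        "vfio_region_write(0000:0a:00.0:region1+0x16ff78, 0x0,1) "
        "failed: Device or resource busy"
    )
    sample = [quoted, quoted, "some unrelated log line"]
    for (addr, reason), n in count_vfio_write_failures(sample).items():
        print(addr, reason, n)
```

The function takes any iterable of lines, so it can be fed an open file handle for whichever log contains these messages. If nearly all hits share one address, that points at the device at that address (here the 1060) as the source of the log spam, though not necessarily of the reboots.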
Cliff Posted December 15, 2021 (edited)
Memtest has now completed 2 passes with no errors. Does anyone have any tips on what I should look at next?
Edit: I also removed the UPS, and the server still crashed after 5 minutes.
Squid Posted December 15, 2021
Have you looked at this yet? https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173
Cliff Posted December 15, 2021
I disabled C-States, but the server still reboots.
Squid Posted December 15, 2021
Not necessarily your problem, but something of interest: you bought Corsair Dominator (64GB kit, 4x16) memory. Corsair, when listing compatibility for it, only shows Intel chipsets (and Asus doesn't list that memory in their QVL either). While I will usually only buy from the motherboard's QVL (as I don't trust memory manufacturers' compatibility lists), I find it curious that neither of them lists it as compatible.
Cliff Posted December 15, 2021
OK, but I have been using the same RAM since I built my Unraid server, and it only started crashing a couple of days ago. I also tried removing the GT 710 GPU, but nothing changed.
Cliff Posted December 15, 2021 (edited December 16, 2021)
I ordered new USB drives to check if that solves anything.
Edit: I just remembered that I am passing through the M.2 as a raw disk to the Windows 11 VM. So I tried booting from it directly without Unraid, and right now it looks like there are no problems. I did some testing for about an hour where I ran CPU/GPU stress tests, and there were no reboots. I will do some more testing when I get home from work.
But if it works without problems without Unraid, what could have caused all this instability? Will it all be fixed if I migrate to a new USB key, or could there still be something else that is corrupted in some way? And can I still transfer all my settings (Docker, VMs, etc.) to the new USB, or do I risk corrupting something again if I try to reuse all my settings?