Server shutting down randomly


Hi everyone, I am a new Unraid user and I thought I had everything installed smoothly, but over the last two days I have had unexpected crashes. I get no warning before they happen, so I am not sure how to reproduce them. I did grab a screenshot of the dashboard, which was frozen while the server restarted. I have also attached the diagnostics zip grabbed after the reboot.

 

The only things I had running at the time of both crashes were qBittorrent and my Windows VM.

I am running Unraid version 6.12.6.

 

The first time the crash happened, my video card stopped being passed through to my VM after the reboot, and I noticed IOMMU had been changed to disabled. I went back into my BIOS, changed it to enabled, passed the card through to my VM again, and everything ran nicely until just now. This second crash happened, but the GPU is still passed through and IOMMU is still enabled, so maybe that was just a random thing that happened during the first crash?

 

I am very new to all this, so if there is anything else that would help you help me work things out, I am happy to provide it. Just let me know how to obtain the info from my system and I can share it with you.

 

System INFO:

Model:Custom

M/B:Micro-Star International Co., Ltd. Z590 PRO WIFI (MS-7D09) Version 1.0 s/n 07D0910_L31E243506

BIOS:American Megatrends International, LLC. Version 1.90 Dated 06/06/2023

CPU:11th Gen Intel® Core™ i9-11900KF @ 3.50GHz

HVM:Enabled

IOMMU:Enabled

Cache:L1 Cache: 384 KiB, L1 Cache: 256 KiB, L2 Cache: 4 MiB, L3 Cache: 16 MiB

Memory:32 GiB DDR4 (max. installable capacity 64 GiB)

Network:eth0: 1000 Mbps, full duplex, mtu 1500

Kernel:Linux 6.1.64-Unraid x86_64

OpenSSL:1.1.1v

 

 

 

925669194_ScreenShot2024-02-23at7_13_51PM.thumb.png.f9aaa4c7595a363d0e7f0f67d5aeaab7.png

pcserver-diagnostics-20240223-1919.zip

Link to comment
4 hours ago, delgadot2040 said:

I am currently running Memtest86, but it looks like it's going to take a while to complete. I did recently add another kit of identical RAM to my system, so maybe that's the issue? Hopefully the test shows more.

If memtest fails then that is definitive. If it passes you can still have a RAM issue when the system is under load, so in such a case the solution is often to run with fewer RAM sticks installed.

Link to comment
5 hours ago, itimpi said:

If memtest fails then that is definitive. If it passes you can still have a RAM issue when the system is under load, so in such a case the solution is often to run with fewer RAM sticks installed.

OK, so I ran the test overnight and the memory passed with 0 errors. I noticed the RAM config during the test was as follows, and I was not using these settings at the time of the crash because I had XMP enabled. (That was running at 2933 speed.)

IMG_0850.HEIC

 

So perhaps the crash was caused by the RAM being pushed harder than it could handle?

When the test finished, I went into my BIOS and checked that I had the same basic RAM config and NOT the XMP settings.

IMG_0852.HEIC

I also noticed that IOMMU had been disabled again, so I enabled it.
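Since the BIOS keeps flipping this setting, it can help to confirm from the running system whether IOMMU is actually active after each reboot. Below is a minimal sketch, assuming Python 3 is available on the server (it is not part of a stock Unraid install) and that the kernel exposes IOMMU groups under `/sys/kernel/iommu_groups`. The function name `list_iommu_groups` is made up for this example.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]}.

    An empty result suggests IOMMU is disabled or unsupported.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Each group is a numbered directory holding one entry per PCI device.
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; IOMMU looks disabled or unsupported.")
    for number in sorted(found):
        print(f"group {number}: {', '.join(found[number])}")
```

If the script reports no groups right after a crash, that points at the BIOS setting having been reset again rather than at the VM configuration.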

 

With the RAM settings that passed the memtest, I started Unraid again and tried to run my Windows VM, but the HDMI output connected to my GPU did not pass a signal even though I had re-enabled IOMMU. My screen changes from the initial Unraid Linux boot text to a black screen, as if the VM was going to come through, but then it just says no signal. I tried to stop the VM but it would not stop, and I needed to use the force stop option to close it.

 

So I am able to run Unraid but not my VM. What should I do?

 

EDIT/

I was able to get back into my VM by unbinding and rebinding my video card to VFIO. The strange thing is that now the official Unraid logo does not appear before loading the VM. It just goes to a black screen, no signal, and then I'm in Windows.

 

If none of this seems like a major issue, I am fine with testing this new RAM setting on the system until a new crash happens. Once that happens, I have the syslog server going to capture the log, and I can come back here with that info. Worst case, I just remove the two new sticks of RAM I installed and go with the old setup that didn't have an issue. Let me know if that sounds like a good plan of action. Thanks, y'all.
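A log captured by the syslog server spans several boots, so the interesting part is whatever was written just before each restart. Here is a rough Python sketch of pulling those lines out. It assumes each boot begins with a kernel message containing "Linux version"; the function name and the `context` parameter are made up for this example.

```python
from collections import deque

# Assumption: the first kernel message of every boot contains this text.
BOOT_MARKER = "Linux version"


def lines_before_each_boot(lines, context=20):
    """Return one list per reboot holding the last `context` lines logged before it."""
    recent = deque(maxlen=context)
    tails = []
    for line in lines:
        if BOOT_MARKER in line:
            # A new boot starts here, so whatever is buffered is the pre-crash tail.
            if recent:
                tails.append(list(recent))
            recent.clear()
        recent.append(line.rstrip("\n"))
    return tails


if __name__ == "__main__":
    # "syslog.log" is the file name attached in this thread; adjust the path as needed.
    with open("syslog.log", errors="replace") as handle:
        for number, tail in enumerate(lines_before_each_boot(handle), start=1):
            print(f"--- before reboot {number} ---")
            print("\n".join(tail))
```

A hard crash often leaves nothing useful in the final lines, so an empty or unremarkable tail does not rule out a hardware fault.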

 

Double Edit//

 

So the system just crashed. Here are the logs from the server:

syslog.log

Edited by delgadot2040
update
Link to comment
6 minutes ago, trurl said:

Don't see it in those after reboot.

Is it possible that the 4 sticks of RAM messed with the communication to the disk you mentioned?

 

The server seems to be running OK with the original two RAM sticks. I might get a larger pair of sticks to replace these and avoid installing 4. Any advice?

Link to comment

Same ata4 resets you had earlier. Seems to be referring to disk3. Check connections.

 

Shouldn't cause crash though. Something else is probably going on.

 

Usual advice is to boot in SAFE mode with Docker and VM Manager disabled and let it run like that for a while to see if it still crashes.
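To see how often a given port is resetting, the syslog can be tallied per ATA port. This is only a sketch, assuming Python 3 is available and that the reset and exception messages use the typical libata wording shown in the comments. That wording is an assumption and is not copied from the attached log.

```python
import re
from collections import Counter

# Assumed message shapes (typical libata wording, not taken from the attached log):
#   "... kernel: ata4: hard resetting link"
#   "... kernel: ata4.00: exception Emask ..."
ATA_EVENT = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: (?:(?:hard|soft) resetting link|exception Emask)"
)


def count_ata_events(lines):
    """Count reset/exception messages per ATA port in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # "syslog.log" is the file name attached in this thread; adjust the path as needed.
    with open("syslog.log", errors="replace") as handle:
        for port, total in count_ata_events(handle).most_common():
            print(f"{port}: {total} reset/exception messages")
```

A count that keeps climbing on a single port after reseating the cables suggests the problem is with that drive, cable, or power connection rather than the controller as a whole.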

Link to comment
14 minutes ago, trurl said:

Same ata4 resets you had earlier. Seems to be referring to disk3. Check connections.

 

Shouldn't cause crash though. Something else is probably going on.

 

Usual advice is to boot in SAFE mode with Docker and VM Manager disabled and let it run like that for a while to see if it still crashes.

Could this have something to do with that drive being a shucked WD white label with the 3.3 V pin? Disk 2 was the same, but I put tape over the third pin after it was not recognized on initial install, and the tape solved it for that drive. When I put in disk 3, also a shucked white label, it was recognized immediately without my having to cover that pin, so I never did. Perhaps if I remove the drive and add tape on the pin like the other one, it will function as intended.

Link to comment

So, an update:

 

After putting tape on the third pin of the power connector on disk3, things have stayed up for 19 hours now. I have been running my VM for gaming, running torrents and Plex, and doing a parity check. No crashes, knock on wood.

 

I'm going to assume the problem was the WD white label with that power pin. Just wanted to come back and let y'all know in case anyone else runs into this in the future.
