Unraid Becomes Unresponsive after a few days


Solved by trurl


A few months ago, my server also randomly became unresponsive after I upgraded the HW.

I spent a lot of time starting and stopping Docker containers, plugins, and VMs, with no luck.

Finally, I ran a memory check and saw memory errors during a test that lasted several hours. After replacing the memory, the issue was gone.


Can I stress test my CPU through Unraid? I've noticed that when files are being written to the disks or larger tasks are running, the CPU shows high utilization on all cores, even those passed to VMs. I want to check that the CPU is stable, in addition to running an extended memory test. If there isn't a stress test on Unraid, is it a stupid idea to put Windows on an external hard drive and run Prime95 to make sure I don't have issues with the CPU?
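For illustration only, here is a minimal sketch of what a synthetic all-core load looks like in Python. It assumes Python 3 is installed on the machine (that is an assumption, not something stated in this thread), and the function names `burn` and `run_stress` are made up for this example. It is far gentler than Prime95 and is not a substitute for it; it only shows the idea of keeping every logical CPU busy for a fixed time.

```python
import multiprocessing as mp
import os
import time
from typing import List, Optional


def burn(seconds: float) -> int:
    """Keep one core busy with integer arithmetic for `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    loops = 0
    x = 1
    while time.monotonic() < deadline:
        # Arbitrary arithmetic whose only purpose is to keep the core busy.
        x = (x * 1103515245 + 12345) % 2147483648
        loops += 1
    return loops


def run_stress(seconds: float, workers: Optional[int] = None) -> List[int]:
    """Run `burn` in parallel, one process per logical CPU unless `workers` is given."""
    workers = workers or os.cpu_count() or 1
    with mp.Pool(processes=workers) as pool:
        return pool.map(burn, [seconds] * workers)


# Hypothetical usage: load every logical CPU for ten minutes.
# Call it from under an `if __name__ == "__main__":` guard in a script:
#     run_stress(600)
```

Watching temperatures and clock speeds while something like this runs gives a rough first impression, but a pass here says little about stability under a heavier workload.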

On 1/26/2024 at 10:05 PM, T_Matz said:

I'll run a memory check again. I ran the memory check in the BIOS and it came up with nothing.

How long did you run the mem test in BIOS?

 

You can use the memory test provided in Unraid's boot menu. This test usually runs for a few hours.



I ran it for 2 hours. I'm going to run a test tomorrow for at least 24 hours. I've been able to recreate the crash a couple of times now. It happens during any massive file transfer, and recently when I had PhotoPrism indexing my photos: every time, almost all of my CPU cores are maxed out and then the server crashes. I want to run Prime95 as well to test the CPU and its BIOS settings. I'm going to boot from a Windows thumb drive and run it there to see how the CPU holds up.

  • 2 weeks later...

I downloaded MemTest86 v10.6 and ran my memory through numerous passes without a single error. I tested all of the hard drives and NVMe drives, and they all passed.

 

Unraid crashed again last night. I had Sonarr downloading a show with about 60 episodes when the system went down. I can basically reproduce this crash whenever I want now. If the system is moving a lot of data or indexing things, CPU usage is all over the place and mostly pegged at 100% utilization until it crashes. I have a spare hard drive from a previous build that I will install Windows on so I can run Prime95 on the server. Should I disconnect all the other drives that I use for Unraid?

On 1/17/2024 at 4:22 PM, JorgeB said:

One thing you can try is to boot the server in safe mode with all Docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

 

Not sure if you already did this; if not, it's worth a try.

 


Yes, I did this. I booted into safe mode and slowly brought services online. The system remains stable if I just let it run, but if I try any intensive task, it crashes. For instance, I had Sonarr queue 60 episodes for download and it crashed. I had PhotoPrism indexing photos and it crashed. I had Radarr queue movies for download and it crashed.


I don't believe so, but my plan is to plug in an unused hard drive with Windows 10 on it and run Prime95 to really test the CPU and cooling and see what's going on with the temperatures. The CPU really seems like the only culprit left; everything else has been tested and seems to be working correctly. The CPU is a 12700K and the cooler is a Noctua NH-U12S.

Should I disconnect my other hard drives used for Unraid before doing this?

 

I will keep watching the CPU temps; I've never seen them go above 70 °C, even during those max loads. The CPU graph on the Unraid dashboard just shows very unusual behavior during the tasks described above. Normally my CPU sits at around 4-8% load with a temp of 28-32 °C, but when I run those tasks, all or most of the CPU cores sit at 80-100%, even the ones I have isolated for VMs. The server becomes sluggish, anything playing on Plex at the time buffers constantly, and then the whole server locks up shortly after.
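To put numbers on that per-core behavior outside the dashboard, per-core utilization on Linux can be computed from two snapshots of `/proc/stat`. The following is a minimal sketch, assuming Python 3 is available on the server; the helper names `parse_proc_stat`, `utilization`, and `watch` are hypothetical and not part of Unraid.

```python
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {core: (idle_ticks, total_ticks)} for each per-core line of /proc/stat."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 user nice system idle iowait irq softirq steal ...".
        # Skip the aggregate "cpu" line and everything that is not a CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks[:8])  # guest time is already included in user/nice
        result[fields[0]] = (idle, total)
    return result


def utilization(before: Dict[str, Tuple[int, int]],
                after: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Percent busy per core between two snapshots."""
    usage = {}
    for core, (idle1, total1) in after.items():
        if core not in before:
            continue
        idle0, total0 = before[core]
        elapsed = total1 - total0
        usage[core] = 0.0 if elapsed <= 0 else 100.0 * (1 - (idle1 - idle0) / elapsed)
    return usage


def watch(interval: float = 5.0, samples: int = 12) -> None:
    """Print per-core utilization every `interval` seconds, `samples` times."""
    with open("/proc/stat") as f:
        previous = parse_proc_stat(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open("/proc/stat") as f:
            current = parse_proc_stat(f.read())
        print(time.strftime("%H:%M:%S"), utilization(previous, current))
        previous = current
```

Redirecting the output of `watch()` to a file on a disk that survives the lockup would leave a record of what each core was doing in the minutes before the crash.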


Quick update: I went back into the BIOS after work, switched to EZ Mode, and noticed that "AI overclock" was enabled. I disabled it and the system seemed to run stable, but only at 3600 MHz instead of the usual 4900-5200 MHz. So I decided to try a PhotoPrism index, and the system did not lock up.

 

So either you were correct that there was a thermal issue and the system froze, or the Intel turbo was just unstable. I'm still going to repaste the CPU and do Prime95 testing. I will update again later.


I believe the BIOS "AI Optimized" settings were creating an unsafe overclock. I tested the system with the setting off and have had zero issues. It seems the AI settings in the BIOS assumed my cooling was more capable than it actually is, so they were boosting the CPU frequencies very high and creating unsafe temperatures.
