Unraid becomes unresponsive intermittently. Has happened 3 times now (now with logs)



I made this post last week, but since I can't access the server at all without rebooting, the logs I had weren't helpful.

 

So I followed the recommendation and turned on the syslog mirror; the resulting log is attached to this post. I had to prune it because it was too large. Let me know if I need to add anything else back in.

 

As you can tell from the logs, I had to power-cycle the system at roughly 15:41 on July 26th. Right before that I was unable to access Unraid via remote login, ping, or a monitor hooked up directly to the machine.
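In case it's useful, here's a minimal sketch (Python) of how the lines from the window just before the power cycle can be pulled out of the mirrored syslog; the filename, crash time, and look-back window below are just placeholders for my setup:

```python
#!/usr/bin/env python3
# Minimal sketch: print the mirrored-syslog lines from the window just before
# the crash. Filename, crash time (including the year), and window size are
# placeholders, not anything Unraid-specific.
from datetime import datetime, timedelta

LOGFILE = "syslog-192.168.50.109.log"  # mirrored syslog file (placeholder name)
CRASH = datetime(2021, 7, 26, 15, 41)  # approximate power-cycle time (year is a placeholder)
WINDOW = timedelta(minutes=30)         # how far back to look

with open(LOGFILE, errors="replace") as f:
    for line in f:
        try:
            # Standard syslog lines start with e.g. "Jul 26 15:40:59" (no year),
            # so parse the first 15 characters and borrow the year from CRASH.
            stamp = datetime.strptime(line[:15], "%b %d %H:%M:%S").replace(year=CRASH.year)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if CRASH - WINDOW <= stamp <= CRASH:
            print(line, end="")
```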

 

The problem, though, is that there are no errors logged immediately prior to that, so I'm not sure this will give us any info to help diagnose it. I had a recycle bin plugin installed, which I've disabled just in case it was causing the issue. Here are all my details:

 

Version: 6.9.2

 

I'm running the following:

MSI Z490-A PRO

Intel Core i9-10850K

Supermicro AOC-S3008L-L8E 12Gb/s 8-Port SAS HBA (IT mode)

IO Crest SATA III 4 Port PCI-e 2.0 x 2 Controller Card with Low Profile Bracket (SI-PEX40062)

Cooler Master V700 - 700W Power Supply with Fully Modular Cables and 80 PLUS Gold Certification

Team T-FORCE VULCAN Z 64GB (2 x 32GB) 288-Pin DDR4 SDRAM DDR4 3200 (PC4 25600) Intel XMP 2.0 Desktop Memory, model TLZGD464G3200HC16CDC01

 

The CPU is running at base clock; the memory has its XMP profile enabled.

 

At the time it crashed, I had the following Docker containers running:

binhex-nzbget

prowlarr

binhex-qbittorrent

binhex-radarr

binhex-sonarr

machinaris

 

Additionally, I had a Windows 10 VM running.

 

At the time of the crash, no parity check was running.

 

I have the Fix Common Problems plugin installed and it reported no issues, although I'm running an extended test now to see if anything pops up.

 

Any recommendations on how to diagnose? Thanks in advance.

 

syslog-192.168.50.109 pruned.log


Your issue seems to be very similar to mine. I'm also seeing nothing in the logs before the freeze, which requires a hard reset to recover from. For me it has only happened on 6.9.x. I rolled back to 6.8.3 and the problem was gone. I thought it might have been my fault because I did not change the go file for Plex, but even with the go file changes it is still restarting. I might go back to 6.8.3 again.
 


 

6 hours ago, Tristankin said:

Your issue seems to be very similar to mine. I'm also seeing nothing in the logs before the freeze, which requires a hard reset to recover from. For me it has only happened on 6.9.x. I rolled back to 6.8.3 and the problem was gone. I thought it might have been my fault because I did not change the go file for Plex, but even with the go file changes it is still restarting. I might go back to 6.8.3 again.
 


 

Thanks for taking the time to reply. If the issue is a kernel panic, it sounds like that's something they're addressing in version 6.10? My first version was 6.9.2, so I'm not sure I can even downgrade. If it continues to be a problem without a solution, I'll look into downgrading to 6.8.x. Thanks!


Yeah, it's a bit hard to tell when the kernel panic isn't even appearing in the logs. Since the last reboot I've tried turning off all C-states in the BIOS and underclocking the RAM. I'm not sure what else to try without taking the server offline for days, turning off Docker containers and running memory tests; I have a lot of family and friends relying on the uptime. :)


I did extensive memory tests before setting up my server, so I feel pretty good that's not the cause, at least for me. I may try turning off XMP if it happens again, though. The thing is, I'm not even running that much, and since the issue is intermittent, if I turn off all containers except one and wait for it to crash, I could be waiting weeks. Frustrating situation for sure.

 

If you end up finding a culprit or a solution be sure to let me know and I'll do the same.


You should definitely try turning off XMP until you have your server in a stable state. The problem with any overclocking is that it is impossible to predict with any certainty when it might cause a failure, regardless of the tests you do in advance.

 

It seems that servers are more prone to this sort of issue than desktops, but I suspect it is just more noticeable because they tend to be left running 24x7.

 

