unRAID Server Randomly Crashing


mnz88


I have been battling off and on with my unRAID box randomly crashing every day or two for the past year or so. I'm at my wit's end, because I can't seem to figure out what triggers it. Sometimes it crashes when I start or stop the array, sometimes it crashes overnight.

At first, I thought the issue was my NIC, because the server crashing would sometimes lock up my entire network. I installed a new Intel NIC and disabled the onboard NIC in the BIOS. I've run 24 hours' worth of memtests with no errors. I even ran the system with Windows Server installed for several days and saw no issues. I've installed the "Fix Common Problems" plugin, and it doesn't come up with anything.

During this most recent crash, I managed to pull a diagnostics file from the server before it went down. I also had a syslog tail writing to the attached text file.

If somebody could help me out with this, I'd be eternally grateful... eternally. I've spent countless hours trying to troubleshoot this. Thanks!

mnz-serv-diagnostics-20170118-1950.zip

syslogtail2.zip


From just a brief look, you have 3 kernel issues in the tail alone.

Jan 18 18:17:01 mnz-serv kernel: perf interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Jan 18 18:55:07 mnz-serv kernel: kernel BUG at fs/buffer.c:3339!

Jan 18 18:55:07 mnz-serv kernel: invalid opcode: 0000 [#1] PREEMPT SMP

Jan 18 19:37:01 mnz-serv kernel: general protection fault: 0000 [#2] PREEMPT SMP

That's obviously not good! Three recommendations:

- Your BIOS is fairly recent, but keep checking for newer ones

- Make sure your BIOS is set to stock values, no overclocking or non-standard values

- Upgrade to the latest unRAID, v6.3.0-rc6, it has a somewhat newer kernel
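
For anyone following along, lines like the ones quoted above can be pulled out of a full syslog with a simple pattern scan. This is a minimal Python sketch, assuming a plain-text syslog in the `Mon DD HH:MM:SS host kernel: message` layout seen in this thread; the pattern list only covers the messages quoted here, and the function name `find_kernel_issues` is hypothetical, not part of unRAID or Fix Common Problems.

```python
import re
import sys

# Patterns taken from the kernel messages quoted in this thread.
# This is an illustrative list, not an exhaustive catalogue of kernel errors.
PATTERNS = [
    "kernel BUG at",
    "invalid opcode",
    "general protection fault",
    "perf interrupt took too long",
]
KERNEL_ISSUE = re.compile("|".join(re.escape(p) for p in PATTERNS))


def find_kernel_issues(lines):
    """Return (line_number, line) pairs for kernel messages matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Only consider messages logged by the kernel itself.
        if " kernel: " in line and KERNEL_ISSUE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical usage: python find_kernel_issues.py syslog.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for number, line in find_kernel_issues(log):
            print(f"{number}: {line}")
```

Matching on the ` kernel: ` tag first keeps the scan from flagging the same words when they show up in messages from other services.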


I just updated the BIOS about a week ago. I'm fairly certain that all BIOS settings are stock, but I'll take another look through the manual and make sure nothing looks wonky.

I just upgraded to v6.3.0-rc6, so we'll see if that helps at all.

Thanks!


Well, I'm still seeing the random crashes. I'm using the Fix Common Problems "Troubleshooting Mode", but at some point it seems to have stopped logging to the syslog file in its directory. This time I could still ping the server, but I couldn't see anything on the console, and I couldn't access the server through SSH or the web interface either. At least it didn't take down my entire network this time.

I'm attaching the last diagnostics file it captured, in case that helps anybody.

I'd really appreciate it if somebody could help me out with this. Thanks!

mnz-serv-diagnostics-20170126-1341.zip


Does anything appear on the local monitor (if you have one) when it crashes?

You might want to upload the syslog.txt that FCP generated. Diagnostics are only created every 30 minutes, but the syslog runs right up to the point where FCP stopped because of the crash.

Also, have you run memtest for at least a pass or two?

By and large, outright crashes of unRAID are purely hardware-related. Compounding that, not all BIOSes/firmware are created equal, and VMs can outright crash the system if the BIOS is buggy.
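
Since the useful part of a crash syslog is usually its final lines, here is a small Python sketch for printing the last N lines of a saved syslog.txt. The function name `last_lines` and the default of 50 lines are assumptions for illustration. It behaves like `tail -n`, not like Fix Common Problems itself.

```python
import sys
from collections import deque


def last_lines(lines, count=50):
    """Return the final `count` lines of an iterable of text lines.

    A deque with maxlen keeps only the most recent lines, so even a large
    syslog is read once without holding the whole file in memory.
    """
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


# Hypothetical usage: python last_lines.py syslog.txt [count]
if __name__ == "__main__" and len(sys.argv) > 1:
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 50
    with open(sys.argv[1], errors="replace") as log:
        for line in last_lines(log, count):
            print(line)
```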

  • 1 year later...

One of my servers has also been crashing about once every two days. Attached are my latest diagnostics; the last crash was about 2 hours before this post.

It always seems to happen late at night, so I'm not sure where the failure could be. I know there is an updated BIOS, but I haven't loaded it yet for lack of time. The server had been stable for a long time until recently. The only change is a bad stick of RAM that was filling up the BIOS logs (this is an Acer AR380 R2 2U server with IPMI). I pulled that stick, and it seemed to run better for a short while, but for the last month it has been reporting "unclean shutdown detected" and running a parity check. The server logs in IPMI show no shutdowns or reboots, so this doesn't appear to be a hardware issue, at least not one that the IPMI module monitors.

My other server is pretty stable.

The one that's crashing hosts the following VMs: a DNS ad blocker (Ubuntu and Pi-hole), a Fedora reverse proxy, and a Windows 10 VM, so it's kind of important.

Hoping these diagnostics can help.

sif-diagnostics-20180809-0752.zip


I think I've found that mine crashes when two or more VMs are running. I don't understand why, though, since I have 24 threads and 50GB of RAM.

My Fedora reverse proxy gets 8 threads and 20GB of RAM, and my Windows 10 VM gets 4 different threads and 8GB of RAM.

That leaves plenty of threads and RAM for unRAID.
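
The arithmetic behind that claim, as a small Python sketch using only the figures from this post. The VM names and the `headroom` helper are hypothetical labels. The calculation only counts what is assigned to the VMs; it does not include any per-VM overhead or other services running on the host, so treat it as a rough check.

```python
# Host and VM figures as stated in the post; names are hypothetical labels.
HOST_THREADS = 24
HOST_RAM_GB = 50

VMS = {
    "fedora-reverse-proxy": {"threads": 8, "ram_gb": 20},
    "windows-10": {"threads": 4, "ram_gb": 8},
}


def headroom(host_threads, host_ram_gb, vms):
    """Return (threads, RAM in GB) left for the host after all VMs are assigned."""
    used_threads = sum(vm["threads"] for vm in vms.values())
    used_ram = sum(vm["ram_gb"] for vm in vms.values())
    return host_threads - used_threads, host_ram_gb - used_ram


threads_left, ram_left = headroom(HOST_THREADS, HOST_RAM_GB, VMS)
print(f"Left for unRAID: {threads_left} threads, {ram_left} GB RAM")
```

With these numbers, 24 - (8 + 4) leaves 12 threads and 50 - (20 + 8) leaves 22 GB of RAM for the host.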

 

I had a full two days of uptime; then I booted my Windows 10 VM for the first time in a week and got one unRAID crash, followed by a full-on hard freeze of the server about 50% of the way through a parity check.

I even installed HandBrake and encoded a 10GB 1080p video to 720p, which taxed the CPUs hard, with no lockups, so I'm hoping it was just that RAM.

I checked my server logs from the BMC/IPMI card and saw yet another RAM module failing (that's now two 4GB modules).

I have new RAM and two faster CPUs on the way; I can't wait to see if that helps.

Edited by Can0nfan
  • 3 years later...

Hello,

I'm currently having the same issue with my unRAID server. Things were fine for more than a year. I created another topic but thought it might be better to piggyback on this thread.

Model: Custom

M/B: Gigabyte Technology Co., Ltd. AX370-Gaming K7 Version Default string - s/n: Default string

BIOS: American Megatrends Inc. Version F50d. Dated: 06/16/2020

CPU: AMD Ryzen 7 1800X Eight-Core @ 3600 MHz

HVM: Enabled

IOMMU: Enabled

Cache: 768 KiB, 4 MB, 16 MB

Memory: 64 GiB DDR4 (max. installable capacity 128 GiB)

Network: bond0: IEEE 802.3ad Dynamic link aggregation, mtu 1500
 eth0: 1000 Mbps, full duplex, mtu 1500
 eth1: 1000 Mbps, full duplex, mtu 1500

Kernel: Linux 5.10.28-Unraid x86_64

OpenSSL: 1.1.1 

kaching-diagnostics-20211015-1136.zip

syslog (2)
