mnz88 Posted January 19, 2017
I have been battling off and on with my unRAID box randomly crashing every day or two for the past year or so. I'm at my wit's end, because I can't seem to figure out what triggers it. Sometimes it crashes when I start or stop the array; sometimes it crashes overnight. At first, I thought the issue was my NIC, because the server crashing would sometimes lock up my entire network. I installed a new Intel NIC and disabled the onboard NIC in the BIOS. I've run 24 hours' worth of memtests with no errors. I even ran the system with Windows Server installed for several days (and saw no issues). I've installed the "Fix Common Problems" plugin, and it doesn't come up with anything. During this most recent crash, I managed to pull a diagnostic off of it before it went down. In addition, I had a syslog tail running to the attached text file. If somebody could help me out with this, I'd be eternally grateful... eternally. I've spent countless hours trying to troubleshoot this. Thanks!
mnz-serv-diagnostics-20170118-1950.zip syslogtail2.zip
RobJ Posted January 19, 2017
Just a brief examination, but you have 3 kernel issues just in the tail.

Jan 18 18:17:01 mnz-serv kernel: perf interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Jan 18 18:55:07 mnz-serv kernel: kernel BUG at fs/buffer.c:3339!
Jan 18 18:55:07 mnz-serv kernel: invalid opcode: 0000 [#1] PREEMPT SMP
Jan 18 19:37:01 mnz-serv kernel: general protection fault: 0000 [#2] PREEMPT SMP

That's obviously not good! 3 recommendations:
- Your BIOS is fairly recent, but keep checking for newer ones
- Make sure your BIOS is set to stock values, no overclocking or non-standard values
- Upgrade to the latest unRAID, v6.3.0-rc6, it has a somewhat newer kernel
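For anyone who wants to pull this kind of line out of a saved syslog without reading it by eye, here is a minimal Python sketch. The pattern list is only an assumption based on the three fault messages quoted above and is not exhaustive; `find_kernel_faults` and `SAMPLE` are hypothetical names made up for this illustration, and the sample lines are copied from the quoted tail.

```python
import re

# Patterns based only on the kernel messages quoted in this thread.
# This list is illustrative, not a complete catalogue of kernel faults.
KERNEL_FAULT_PATTERNS = [
    re.compile(r"kernel BUG at \S+"),
    re.compile(r"invalid opcode"),
    re.compile(r"general protection fault"),
]


def find_kernel_faults(lines):
    """Return (line_number, text) pairs for lines matching any fault pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in KERNEL_FAULT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Sample input: the four syslog lines quoted in the post above.
SAMPLE = [
    "Jan 18 18:17:01 mnz-serv kernel: perf interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000",
    "Jan 18 18:55:07 mnz-serv kernel: kernel BUG at fs/buffer.c:3339!",
    "Jan 18 18:55:07 mnz-serv kernel: invalid opcode: 0000 [#1] PREEMPT SMP",
    "Jan 18 19:37:01 mnz-serv kernel: general protection fault: 0000 [#2] PREEMPT SMP",
]

if __name__ == "__main__":
    for number, text in find_kernel_faults(SAMPLE):
        print(f"{number}: {text}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `find_kernel_faults(open("syslog.txt"))`, where the file name is whatever your saved syslog is called.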
mnz88 Posted January 19, 2017
I just updated the BIOS about a week ago. I'm fairly certain that all BIOS settings are stock, but I'll take another look through the manual and make sure nothing looks wonky. I just upgraded to v6.3.0-rc6, so we'll see if that helps at all. Thanks!
mnz88 Posted January 26, 2017
Well, I'm still seeing the random crashes. I am using the Fix Common Problems "Troubleshooting Mode", but it seems that at some point it stopped logging to the syslog file in its directory. This time I could still ping the server, but I couldn't see anything on the console, nor could I access the server through SSH or the web interface. At least it didn't take down my entire network this time. I'm attaching the last diagnostic log it took, if that helps anybody. I'd really appreciate it if somebody could help me out with this. Thanks!
mnz-serv-diagnostics-20170126-1341.zip
Squid Posted January 27, 2017
Does anything appear on the local monitor (if you have one) when it crashes? You might want to upload the syslog.txt that FCP generated: diagnostics are only created every 30 minutes, but the syslog will go right up to when FCP stopped due to the crash. Also, have you run memtest for at least a pass or two? By and large, outright crashes of unRAID are purely hardware related. Compounding that, though, not all BIOSes / firmware are created equal, and VMs do have the ability to outright crash the system if the BIOS is buggy.
Can0n Posted August 9, 2018
One of my servers has also been crashing about once every two days or so. Attached is my latest diagnostics; the last crash was about 2 hours before this post. It always seems to happen late at night, so I'm not sure where the failure could be. I know there is an updated BIOS, but I have not loaded it yet for lack of time. It had been stable for a long time until recently. The only change is a bad stick of RAM; the BIOS logs were getting full (Acer AR380 R2 2U server with IPMI). I pulled that stick and it seemed to run better for a short while, but for the last month it has been reporting "unclean shutdown detected" and running a parity check. The server logs in IPMI show no shutdowns or reboots, so this does not appear to be a hardware issue, at least not one that the IPMI module monitors. My other server is pretty stable. The one crashing hosts the following VMs: a DNS ad blocker (Ubuntu and Pi-hole), a Fedora reverse proxy, and a Windows 10 VM, so it's kind of important. Hoping these diagnostics can help.
sif-diagnostics-20180809-0752.zip
kricker Posted August 13, 2018
I am having similar issues with a server that used to be rock solid for years. Now the crashes are random, and I can't get a diagnostic log during the crash. It is quite frustrating. If I run Fix Common Problems in troubleshooting mode, it never crashes.
Can0n Posted August 18, 2018
I think I have found that mine crashes with two or more VMs running. I don't get why, though: I have 24 threads and 50GB RAM. My Fedora reverse proxy gets 8 threads and 20GB RAM, and my Windows 10 VM gets 4 different threads and 8GB RAM; that leaves plenty of threads and RAM for Unraid. I had a full two days of uptime when I booted my Windows 10 VM for the first time in a week, then had one Unraid crash followed by a full-on hard freeze of the server about 50% of the way through a parity check. I even installed HandBrake and encoded a 10GB 1080p video to 720p, which taxed the CPUs hard, with no lockups, so I was hoping it was just that RAM. I checked my server logs from the BMC/IPMI card and saw yet another RAM module failing (that's now two 4GB modules). I have new RAM and two faster CPUs on the way; can't wait to see if that helps.
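As a quick check of the "plenty left over" arithmetic in the post above, here is a small Python sketch using only the figures quoted there. The dictionary layout and the `remaining` helper are hypothetical, and the third VM mentioned earlier in the thread (the Pi-hole DNS blocker) is left out because no allocation is given for it.

```python
# Host and VM figures as quoted in the post above.
# The Pi-hole VM is omitted: its allocation is not stated anywhere in the thread.
HOST = {"threads": 24, "ram_gb": 50}
VMS = {
    "fedora-reverse-proxy": {"threads": 8, "ram_gb": 20},
    "windows-10": {"threads": 4, "ram_gb": 8},
}


def remaining(host, vms):
    """Subtract every VM's allocation from the host totals, per resource."""
    return {
        resource: host[resource] - sum(vm[resource] for vm in vms.values())
        for resource in host
    }


if __name__ == "__main__":
    print(remaining(HOST, VMS))
```

On these numbers, 12 threads and 22GB of RAM are left for the host, so the stated allocations are not oversubscribed on paper, which is consistent with the later suspicion of failing RAM modules rather than resource exhaustion.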
sublime24 Posted October 15, 2021
Hello, I'm currently having the same issue with my Unraid server. Things were fine for more than a year. I have created another topic but thought it may be better to piggyback off of this thread.

Model: Custom
M/B: Gigabyte Technology Co., Ltd. AX370-Gaming K7 Version Default string - s/n: Default string
BIOS: American Megatrends Inc. Version F50d. Dated: 06/16/2020
CPU: AMD Ryzen 7 1800X Eight-Core @ 3600 MHz
HVM: Enabled
IOMMU: Enabled
Cache: 768 KiB, 4 MB, 16 MB
Memory: 64 GiB DDR4 (max. installable capacity 128 GiB)
Network: bond0: IEEE 802.3ad Dynamic link aggregation, mtu 1500
eth0: 1000 Mbps, full duplex, mtu 1500
eth1: 1000 Mbps, full duplex, mtu 1500
Kernel: Linux 5.10.28-Unraid x86_64
OpenSSL: 1.1.1

kaching-diagnostics-20211015-1136.zip syslog (2)
ChatNoir Posted October 16, 2021
11 hours ago, sublime24 said: it maybe better to piggyback off of this thread
Probably better to stay on your own topic. This one is quite old, the software changed a lot and I doubt your hardware is similar to the OP.