Magma Posted May 4, 2018 (edited) I started using Unraid 2 weeks ago and am currently on version 6.5.1. The entire time I have had random crashes, sometimes multiple times a day. The longest uptime was just over 3 days. I'm only running Dockers for Plex, Sonarr, Radarr, sabnzbd, and nzbHydra2. I'm running it on an i7 970 with an EVGA X58 FTW3 motherboard with the latest BIOS. The temperatures were somewhat high, so yesterday I replaced the stock heat sink with something better. This greatly improved the CPU temps, but the crashing appeared to increase in frequency. Possibly just a coincidence. Today I went home during lunch and saw that it had crashed again. There is never any error message on the screen. This time, however, when I went to reboot it, it hung on loading bzroot. I left it running memtest to see if it detects any errors. I have installed the Fix Common Problems plugin and have attached the latest logs generated from troubleshooting mode. If anyone can provide any insight it would be greatly appreciated. I'm at a total loss currently. Attachments: FCPsyslog_tail.txt, tower-diagnostics-20180504-1120.zip. Edited May 9, 2018 by Magma: Added more details.
Magma Posted May 4, 2018 (Author) Got home. Memtest had completed 2 passes with 0 errors. Reset BIOS settings to default and managed to boot this time. Going to run memtest overnight later.
Magma Posted May 7, 2018 (Author) Unfortunately I'm still at a loss. The overnight memtest reported 0 errors. The server lasted all day but crashed somewhere between 2:23 and 2:53 in the morning. I have attached the syslog and diagnostics again. Honestly, I'm not sure what else to look for. One of my drives displayed read errors during a parity check. Nothing is stored on that drive at present, however, and it is typically spun down. Could that be causing the crashes? Attachments: tower-diagnostics-20180507-0223.zip, FCPsyslog_tail.txt.
JorgeB Posted May 7, 2018 Disk2 needs to be replaced, and yes, it may cause the server to crash, or more likely to become unresponsive and appear to have crashed.
Magma Posted May 9, 2018 (Author) Another update. I have removed the dying hard drive and have replaced the RAM, which has passed overnight memtests. The crashing continues. It appears to be tied to high CPU usage. It happened when sabnzbd was unpacking a 30 GB download, and it happens nightly when Plex does its scheduled maintenance. I rebooted today and started troubleshooting mode. I then started the scheduled maintenance, and after about 30 minutes it all locked up. Temperatures were only at 38 C. I have attached the latest diagnostics and syslog tail from this crash. I have a new 600W PSU with a single 50A +12V rail on the way. Any other suggestions on what the issue could be or what I could try in the meantime? Attachments: FCPsyslog_tail.txt, tower-diagnostics-20180509-1148.zip.
JorgeB Posted May 9, 2018 Try running in safe mode for a while, with dockers/VMs stopped.
Magma Posted May 9, 2018 (Author) OK, I will give that a shot. How long should I run it in safe mode? What exactly will this determine though? The server would essentially be idling with all of the drives spun down.
John_M Posted May 9, 2018 15 minutes ago, Magma said: "What exactly will this determine though? The server would essentially be idling with all of the drives spun down." You're looking for stability. If it's stable in this basic operating mode you can start re-enabling things (just one at a time, ideally) to see if any of them cause it to fall over. It's much easier to find the culprit this way.
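As a rough illustration of the one-at-a-time approach John_M describes, here is a minimal Python sketch. It is not an Unraid tool: `find_first_unstable` and the `is_stable` callback are hypothetical names, and `is_stable` stands in for "run the server with only these things enabled for a while and see whether it stays up", which in practice is a manual observation.

```python
from typing import Callable, Iterable, List, Optional


def find_first_unstable(
    services: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Re-enable services one at a time.

    Returns the first service whose addition makes the system unstable,
    or None if the system stays stable with everything enabled.
    """
    enabled: List[str] = []

    # Baseline: nothing enabled (safe mode, dockers/VMs stopped).
    # If this already fails, no docker can be the culprit.
    if not is_stable(list(enabled)):
        raise RuntimeError(
            "unstable with everything disabled; suspect hardware or the base system"
        )

    for service in services:
        enabled.append(service)
        # Pass a copy so the callback cannot alter our bookkeeping.
        if not is_stable(list(enabled)):
            return service
    return None


# Demonstration only: the container names come from the thread, but the
# check below is made up and does not reflect any real diagnosis.
containers = ["Plex", "Sonarr", "Radarr", "sabnzbd", "nzbHydra2"]


def pretend_check(enabled: List[str]) -> bool:
    # Hypothetical rule: pretend the box falls over once "sabnzbd" runs.
    return "sabnzbd" not in enabled


print(find_first_unstable(containers, pretend_check))
```

Because the crashes in this thread are intermittent (the longest uptime reported was just over 3 days), each stability check would need to run longer than that before a step can be called clean.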
JorgeB Posted May 9, 2018 21 minutes ago, Magma said: "The server would essentially be idling with all of the drives spun down." You can still use it as a basic NAS.
Magma Posted May 29, 2018 (Author) I am at a complete loss, so here's an update. I disabled all of my dockers and enabled them one at a time. Crashes were still happening, though not as often. The event that most closely coincided with the crashes was Plex's overnight maintenance, although it wasn't uncommon for there to be no crash. I have also had it crash in the middle of the day with no real active usage. My longest uptime was 2.5 days. I have replaced the power supply and RAM. It has passed overnight memtest and prime95 tests. I have tried using only 1 stick of RAM at a time and swapping them out. My only guess is that I need to replace the motherboard, which I would prefer to avoid because at that point I should probably just replace the RAM and CPU too. I have attached the most recent syslog and diagnostics from the last crash. Attachments: FCPsyslog_tail.txt, tower-diagnostics-20180528-1112.zip.
BobPhoenix Posted June 2, 2018 Two suggestions for you. The first is the most likely, and I had nasty problems with it, but I've had problems with the second before as well.
1. Turn off in the BIOS (if on the MB) or remove the Marvell 88SE9123 controller card. I had my Marvell 9230 controller passed through to a VM and got dropped drives and had to reboot the server to get them back. It didn't cause unRAID crashes for me, but if you are using array drives on yours this is likely your problem. I turned my MB 9230 off in the BIOS so I wouldn't be tempted to use it and haven't had any problems with that server since.
2. Turn off in the BIOS (if on the MB) or remove the NEC USB 3.0 controller card. Since I stopped using my Fresco USB 3.0 card in another server it has been up for 15 days, and I believe it would have been longer but I had to reboot for an unRAID upgrade. Before that, with the card installed, I got random crashes that I couldn't figure out.
I really think the first one is the most likely cause, but doing either or both of the above is where I would start to troubleshoot, since you have already tried some other hardware changes.