Teknowiz, posted September 17: For the past few days my Unraid server has kept hanging or rebooting on its own. When it hangs, I can't access it locally or through the web browser and need to force a shutdown. I tried recreating the flash drive and copying over the old config, and the issue persists. I already ran one pass of memtest and it seemed to pass. I also updated the BIOS to the latest version, but the issue is still occurring. It seems to have started after I installed a new parity drive and moved the old parity drive to a data slot. The system seems to run fine while running memtest or with the array stopped, but after starting the array it freezes within 30-60 minutes or reboots on its own. Syslog entries don't seem to persist through the reboot. Attached is the diagnostics log; I would appreciate any help narrowing down the cause of the problem. Attachment: tower-diagnostics-20230916-2302.zip
JorgeB, posted September 17: If it's also rebooting, rather than just crashing or hanging, that suggests a hardware problem. You can enable the syslog server and post that log after a crash to see if anything shows up there.
Teknowiz (author), posted September 17: It intermittently does both: sometimes it becomes unresponsive and needs a forced reboot, and other times it reboots on its own. It seems to happen when the array is brought online. I did enable the local syslog. I'm running another round of memtest86+, and so far there are no errors after 3 passes. I guess it could be a faulty CPU or a power problem, but it's odd that it only happens when the array is online.
Teknowiz (author), posted September 17 (edited September 17): Updated diagnostics files below. The system crashed a few hours after starting the array. Hopefully the syslog files will be useful. It seems another user is facing a similar issue with a similar CPU: https://forums.unraid.net/topic/145280-system-becomes-unresponsive-after-a-few-hours/ Attachment: tower-diagnostics-20230917-1415.zip
Teknowiz (author), posted September 17: And now another reboot on its own, even without the array started. Attachment: tower-diagnostics-20230917-1423.zip
JorgeB, posted September 18: Quote (21 hours ago, JorgeB said): "enable the syslog server and post that after a crash". The syslog in the regular diagnostics starts over after every reboot, so there's not much to see.
Teknowiz (author), posted September 18: I may have narrowed the issue down to the official Plex server Docker container, possibly in combination with AMD iGPU transcoding. I disabled the Plex container and the server stayed up 12+ hours, which is longer than I had managed since the issue started. I'll see if other Plex containers are more stable, or temporarily keep Plex disabled to confirm the system is back to full stability.
Teknowiz (author), posted September 19 (edited September 19): Well, it looks like the Plex container was not the source of the issue. Instead of freezing or rebooting hourly, the server now lasts 6-8 hours and then still hangs or reboots. I'm getting frustrated: the logs don't show any source of the problem, and the constant crashes are causing a lot of parity errors, but I can't complete a parity check to repair them before the next crash. Any ideas on enabling other logging or further troubleshooting? I have run memtest and it passed, tried disabling the DOCP memory profile so the RAM runs at stock speed, and disabled C-States completely in the BIOS, but the issue still persists. I even removed the new parity drive installed last week and still had the same issue.
JorgeB, posted September 19: One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
Teknowiz (author), posted September 19: Thanks, I'll try that. I'm also planning to boot into something like a Linux live CD and run some CPU diagnostics.
Teknowiz (author), posted September 20: I ran 3 more passes of memtest and all passed. I also ran a CPU stress test for 4 hours at 100% CPU load from an Ubuntu live CD, and it passed. After testing, I restarted into Unraid in safe mode with Docker and VMs disabled, and the system hung again a short time into a parity check. I'm not sure what else to test hardware-wise. Attached is another diagnostics log in case anyone can identify the source of the instability. Attachment: tower-diagnostics-20230920-1414.zip
JorgeB, posted September 20: The syslog in the diagnostics starts over after every reboot, so there's not much to see.
Teknowiz (author), posted Wednesday at 07:13 PM: I enabled remote syslog and started logging to my desktop; I'll see if it records anything on the next crash or hang. I wish Unraid stored logs on the flash drive so they would survive a reboot.
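For anyone without a syslog daemon on their desktop, a minimal Python receiver could look like the sketch below. It assumes the Unraid syslog server settings send to this desktop over UDP on port 514 (the conventional syslog port; change it if yours differs), and the output file name is a hypothetical placeholder. It is a bare-bones stand-in, not a full syslog server.

```python
import socket
from datetime import datetime


def format_record(data: bytes, sender_ip: str) -> str:
    """Turn one raw syslog datagram into a single timestamped line."""
    # Decode leniently so malformed bytes never crash the receiver,
    # and drop any trailing newline the sender included.
    message = data.decode("utf-8", errors="replace").rstrip("\r\n")
    received = datetime.now().isoformat(timespec="seconds")
    return f"{received} {sender_ip} {message}\n"


def serve(log_path: str, host: str = "0.0.0.0", port: int = 514) -> None:
    """Append every datagram received on host:port to log_path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        # Line-buffered text file: each record is handed to the OS as soon
        # as it is written, so the last lines before a hang are kept.
        with open(log_path, "a", buffering=1, encoding="utf-8") as fh:
            while True:
                data, (sender_ip, _sender_port) = sock.recvfrom(8192)
                fh.write(format_record(data, sender_ip))


if __name__ == "__main__":
    # Hypothetical file name; binding to port 514 usually needs
    # administrator/root rights on the desktop.
    serve("unraid-remote-syslog.txt")
```

Because the file lives on a different machine, whatever the server managed to send before it froze or rebooted survives, which is the point of remote logging here.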
itimpi, posted Wednesday at 07:15 PM: Quote (Teknowiz said): "I wish Unraid would store logs on flash so it would survive the reboot." It will if you enable the Mirror to flash option in the syslog server.
Teknowiz (author), posted Wednesday at 09:08 PM: The mirror option was already enabled, but it didn't seem to capture crash details in the diagnostics log, unless the log is stored somewhere else on the flash drive. The server also crashed again after I enabled remote syslog, but the remote log showed nothing out of the ordinary before the crash, just a bunch of kernel messages about USB connect/disconnect events for the mouse.
itimpi, posted Wednesday at 09:54 PM: Quote (45 minutes ago, Teknowiz said): "Mirror option was already enabled but didn't seem to capture crash details on diagnostics log unless it gets stored somewhere else on the flash." The file is in the 'logs' folder on the flash drive.
Teknowiz (author), posted Thursday at 12:30 AM: Here's what the remote syslog server captured up to the point the system became unresponsive again. There doesn't seem to be much to go on other than Samba errors. Attachment: syslog.txt
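When a captured syslog is dominated by USB mouse connect/disconnect messages, a small filter can make the lines just before a hang easier to read. The Python sketch below hides kernel USB connect/disconnect lines, keeps only the last lines of the file, and marks lines containing keywords that are often worth a second look. Both pattern lists are hypothetical starting points, not an authoritative list of crash signatures, and an empty result proves nothing: a hard hang can leave no trace in the log at all.

```python
import re
import sys
from collections import deque

# Hypothetical noise pattern: kernel USB connect/disconnect messages such as
# "usb 1-2: new low-speed USB device number 3" or "usb 1-2: USB disconnect".
NOISE = re.compile(r"usb \S+: (new .*USB device|USB disconnect)", re.IGNORECASE)

# Hypothetical keywords that may deserve a closer look after a crash.
SUSPICIOUS = re.compile(
    r"\bmce\b|machine check|hardware error|call trace|BUG:|segfault|"
    r"I/O error|soft lockup|hung_task|panic|smbd",
    re.IGNORECASE,
)


def review(lines, tail=200):
    """Return (line, flagged) pairs for the last `tail` non-noise lines."""
    kept = deque(maxlen=tail)  # older lines fall off automatically
    for line in lines:
        line = line.rstrip("\n")
        if NOISE.search(line):
            continue
        kept.append((line, bool(SUSPICIOUS.search(line))))
    return list(kept)


if __name__ == "__main__":
    # Usage (hypothetical script name): python review_syslog.py syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line, flagged in review(fh):
            print(("!! " if flagged else "   ") + line)
```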
Teknowiz (author), posted Thursday at 01:10 AM: The flash syslog does seem to have logged data, but nothing sticks out to me as the source of the instability. Attachment: syslog
JorgeB, posted Thursday at 08:14 AM: Unfortunately there's nothing relevant logged, and this usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
Teknowiz (author), posted Thursday at 11:31 AM: I already tried running the bare minimum in safe mode with no Docker or VM services enabled, and it still crashes. The odd thing is that there are no crashes or errors while running memtest or during the CPU stress test in Ubuntu. I would expect a hardware issue to cause crashes in any boot environment. More logs are attached from further crashes overnight, and this time there seem to be some errors that I don't recognize. I wonder if my HBA card or a corrupted file system could be causing the issue. Attachment: syslog.txt