
System keeps hanging or rebooting


Teknowiz

For the past few days my Unraid server has kept hanging or rebooting on its own. When it hangs, I can't access it locally or through the web browser and have to force a shutdown. I tried recreating the flash drive and copying over the old config, but the issue persists. I already ran one pass of memtest, which seemed to pass, and updated the BIOS to the latest version, but the issue is still occurring. It seems to have started after I installed a new parity drive and moved the old parity drive to a data slot. The system seems to run fine while running memtest or with the array not started, but after starting the array it freezes within 30-60 minutes or reboots on its own. Syslog entries don't seem to persist through the reboot. Attached is the diagnostics log; I would appreciate any help narrowing down the cause of the problem.

 

Edited by Teknowiz
Removed diag since it didn't help.

It intermittently does both: sometimes it becomes unresponsive and needs a forced reboot, and other times it reboots on its own. It seems to happen when the array is brought online. I did enable the local syslog. I'm running another round of memtest86+ and so far, after 3 passes, there are no errors. I guess it could be a faulty CPU or power supply, but it's odd that it only happens when the array is online.


I may have narrowed the issue down to the official Plex server Docker container, or possibly to that container combined with AMD iGPU transcoding. I disabled the Plex container and the server stayed up for 12+ hours, which is longer than I had managed at any point since the issue started. I will see whether other Plex containers are more stable, or temporarily keep Plex disabled to see whether the system returns to full stability.


Well, it looks like the Plex container was not the source of the issue. Instead of freezing or rebooting hourly, the system now lasts 6-8 hours and then still hangs or reboots. I'm getting frustrated with this issue: the logs don't show any source of the problem, and the constant crashes are causing a lot of parity errors, but I can't complete a parity check to repair them before another crash.

 

Any ideas on enabling other logging or on further troubleshooting? I have tried memtest and it passed, tried disabling the DOCP memory profile so the RAM speed drops back to stock, and disabled C-states completely in the BIOS, but the issue still persists. I even removed the new parity drive installed last week and still had the same issue.


Did 3 more passes of memtest and all passed. Ran a CPU stress test for 4 hours from an Ubuntu live CD at 100% CPU load and it passed as well.

After testing, I restarted the system into Unraid in safe mode with Docker and VMs all disabled, and it hung again a short time into a parity check.

I'm not sure what else to test hardware-wise anymore. Attached is another diagnostics log in case anyone is able to identify the source of the instability.

 

Edited by Teknowiz
Removed diag since it didn't help.
45 minutes ago, Teknowiz said:

The mirror option was already enabled, but the diagnostics log didn't seem to capture any crash details, unless they get stored somewhere else on the flash. It also crashed again after I enabled remote syslog, but the remote log entries showed nothing out of the ordinary prior to the crash, just a bunch of kernel messages about USB connect/disconnect events for the mouse.

 

 


The file is in the ‘logs’ folder on the flash drive.
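
One way to check that mirrored file for clues is to pull out the last lines written before each boot. Below is a minimal Python sketch, not an Unraid tool. It assumes the mirrored syslog is plain text and that every boot starts with the kernel's "Linux version" banner line; the function name `lines_before_boots` and its parameters are made up for illustration.

```python
import sys
from collections import deque


def lines_before_boots(lines, marker="Linux version", context=20):
    """Return one list per boot marker with up to `context` lines logged just before it.

    `marker` is an assumption: the kernel prints a "Linux version ..." banner
    early in every boot, so whatever precedes it is the tail of the previous session.
    """
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    groups = []
    for line in lines:
        if marker in line:
            # A marker at the very top of the file has nothing before it.
            if recent:
                groups.append(list(recent))
            recent.clear()
        recent.append(line.rstrip("\n"))
    return groups


if __name__ == "__main__" and len(sys.argv) > 1:
    # The syslog path is passed on the command line rather than hard-coded.
    with open(sys.argv[1], errors="replace") as fh:
        for i, group in enumerate(lines_before_boots(fh), 1):
            print(f"--- last lines before boot #{i} ---")
            print("\n".join(group))
```

It could be run as `python3 last_before_boot.py <path to the mirrored syslog>`, where the script name is hypothetical and the path depends on where the flash drive is mounted. If every group ends with routine messages, that is consistent with a hard hang or power-level reset rather than a logged kernel error.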


Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.


I already tried running with the bare minimum in safe mode, with no Docker or VM service enabled, and it still crashes. The odd thing is that there are no crashes or errors while running memtest, or in Ubuntu while doing a CPU stress test. I would think a hardware issue would cause crashes in any boot environment. More logs are attached from more crashes overnight, but this time there seem to be some errors that I don't recognize. I wonder if it's my HBA card or a corrupted file system that could be causing the issue.

 

 

 

  • 2 weeks later...

The system is still unstable, with no useful info in the logs even when they are stored on the flash. I recreated the install from scratch on a fresh USB drive without even restoring the config, and it still crashes. Tried a new power supply, same result. Disconnected the LSI HBA card and all hard drives, and it still crashes. Ran a CPU stress test from an Ubuntu live CD for 24+ hours at 100% CPU load without a single crash. Ran Memtest86+ for 3+ days without any errors being detected. I still have no clue what the issue could be, as all feasible hardware issues seem to be ruled out at this point. I'm contemplating taking out the 2 NVMe drives next to see if that helps.

 

Any suggestions on what else to try? My only other alternative at this point is to get a new Intel CPU and board.

  • Solution

Well, it seems I finally managed to stabilize the system by changing the following settings using the Tips and Tweaks plugin:

 

Disable NIC Flow Control? Yes

Disable NIC Offload? Yes

Normal CPU Scaling Governor: Performance

 

Previously I had the NIC-related settings at their defaults and the Normal CPU Scaling Governor set to On Demand, but with those settings the system now becomes unresponsive at random. I'm not sure which of the 3 settings actually made the difference, since disabling C-states in the BIOS, disabling only the NIC-related settings, or setting the Docker LAN network type to ipvlan didn't seem to fix the problem on their own. It's also odd that the system ran for a long time with the old settings without issues and only became unstable with the latest build of Unraid.

 

At least the system seems to be functional again without needing new hardware.
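
For anyone wanting to confirm that the governor change actually took effect, the active governor can be read from the standard Linux cpufreq sysfs files. This is a minimal Python sketch, assuming the kernel exposes `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` on the machine; the helper names `read_governors` and `summarize` are made up for illustration and are not part of Unraid or the Tips and Tweaks plugin.

```python
from collections import Counter
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current scaling governor.

    Assumes the standard cpufreq sysfs layout; returns an empty dict when the
    files are not present (for example, when no cpufreq driver is loaded).
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(sysfs_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so the CPU name is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def summarize(governors):
    """Count how many CPUs are using each governor."""
    return Counter(governors.values())


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq information found.")
    for name, count in summarize(found).items():
        print(f"{count} CPU(s) using governor: {name}")
```

If the setting applied, every CPU should report `performance`. Note that the kernel spells the on-demand governor as `ondemand`.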
