
Constant issues, starting recently, with a server I've run for 8+ years


Solved by JorgeB.


Posted

Hello all, I set up my first Unraid server back in college, probably 8 or 9 years ago now, and for the most part I haven't had any issues. In the last year or so I upgraded the server from an Intel J1900 motherboard/CPU combo to a Ryzen 5 5600G on a Gigabyte motherboard, and I replaced the multiple PCIe-to-SATA cards with a single 9200-8i HBA. At the same time I built a second Unraid server with identical hardware, except that it uses a 9300-8i instead of the 9200-8i. Server 2 has all-SSD storage; server 1 has 8 spinning disks, plus a 3-SSD cache pool, plus a single SSD cache drive for the main array. Both servers also have an Intel X540-T1 10G network card.

Motherboards purchased: https://www.newegg.com/gigabyte-b550m-ds3h-ac/p/N82E16813145250?Item=N82E16813145250
Processors purchased: https://www.newegg.com/amd-ryzen-5-5600g-ryzen-5-5000-g-series/p/N82E16819113683?Item=N82E16819113683

I started having issues where the servers would go unresponsive for no apparent reason: they were still running, but they weren't reachable by SSH or the web interface, and I had to cut power and restart them to bring them back online. I let the parity check run each time this happened, and while it was very annoying, there was seemingly no loss of data, so I continued to investigate the root cause. Some forum posts led me to think it was an issue with C-states on the Ryzen processors, so I updated the BIOS and disabled C-states. For the SSD box this seemed to fix the issue, but on my primary storage box (server 1) the issues have persisted. Over the next couple of weeks I replaced the power supply and the RAM, and swapped the 9200-8i for a spare 9300-8i I had lying around, but nothing seemed to fix the issues. I also set up a syslog server and a Prometheus exporter with a Grafana UI to monitor the box, but the syslog server never recorded any errors, and the Grafana charts just dropped off a cliff with nothing beforehand indicating a problem. Finally I decided to upgrade Unraid OS, and I saw in the changelog for 9.12.9 that they recommended changing from macvlan to ipvlan to resolve some issues. After the upgrade I tried that, and it seems to have mostly fixed the problem of the server going unresponsive without warning.

The server went unresponsive again after 2 days when the log directory filled up and the cleanup process failed, so I had to do another hard reboot, and after that it behaved for a week or two. Then I had a power outage of about 30 seconds, and the box decided to shut itself down. (I've tried changing the settings multiple times, but server 1 always shuts down instantly on a power outage; server 2 doesn't have this issue, and each server is on its own UPS.) During shutdown it encountered another error and froze, necessitating another hard restart. When I started it up again it had disabled disk 1. It seems like this means disk 1 is somehow out of sync with the rest of the array and I'll need to rebuild the array. Server 1 is mostly cold storage: it holds bulk files, backups, movies, that kind of thing, and nothing that I know of was writing to the array when it shut down. The failed shutdown attempt produced a log dump, which I've attached here. We have some bad weather in my area, so I've decided to keep the box shut down until the power is stable, at which point I plan to rebuild the array, but I'm really getting annoyed at how frequently this box has issues.


Could someone smarter than me look at the logs and see if there's any indication of what's going wrong? I'm half tempted to start over with a fresh install or move to a different NAS OS at this point. I do have backups of all the data: when the servers first started having issues, I bought an 18TB Seagate drive and copied the whole server over to it, or at least everything I care about.

tower-diagnostics-20240412-1806.zip

Posted (edited)
17 hours ago, dwilson2547 said:

they were still running but they weren't reachable by ssh or the web interface and i had to cut power and restart them to come back online.

Did you force the power off, or shut down with the power button? You should attach a monitor and keyboard to verify whether it's only a network issue.

Edited by Vr2Io
Posted
1 hour ago, Vr2Io said:

Force power off or shutdown by power button? You should attach a monitor and keyboard to verify whether it's only a network issue.

I should've mentioned it in the post, but I have been doing that. The first time it happened, I tried to connect a monitor and keyboard, but the monitor didn't detect an input source and the keyboard didn't even light up. After rebooting, the monitor and keyboard worked as expected, so I left them plugged in. The next time it happened I went to check on it, and again the keyboard lights were off and the monitor didn't register an input source. Same behavior on both boxes.

Posted (edited)
2 hours ago, dwilson2547 said:

Same behavior on both boxes 


Both boxes show the same behaviour? That's strange.


In general, the display will blank due to power saving, but once you press any key the monitor output will resume.


As for the immediate shutdown on a power outage (with a different UPS), I don't think that is related: unstable power won't cause a crash; the system will just reboot or stay powered off.

Edited by Vr2Io
  • 2 weeks later...
Posted (edited)
On 4/13/2024 at 3:57 AM, JorgeB said:

This seems to have been the solution. I had C-states disabled globally and XMP disabled, but I was missing the Power Supply Idle Control setting, which needed to be set to Typical Current Idle. I've been running for a while without issue now.

Edited by dwilson2547
