dwilson2547, posted April 12, 2024:

Hello all,

I started with my first Unraid server back in college, probably 8 or 9 years ago now, and for the most part I haven't had any issues. In the last year or so I upgraded the server from an Intel J1900 motherboard/CPU combo to a Ryzen 5 5600G on a Gigabyte motherboard. I also replaced the multiple PCIe-to-SATA cards with a single 9200-8i HBA. Around the same time I built a second Unraid server with identical hardware, except that it uses a 9300-8i instead of the 9200. Server 2 has all-SSD storage. Server 1 has 8 spinning disks, a 3-SSD cache pool, and a single SSD cache drive for the main array. Both servers also have an Intel 10G network card (X540-T1).

Motherboards purchased: https://www.newegg.com/gigabyte-b550m-ds3h-ac/p/N82E16813145250?Item=N82E16813145250
Processors purchased: https://www.newegg.com/amd-ryzen-5-5600g-ryzen-5-5000-g-series/p/N82E16819113683?Item=N82E16819113683

I started having issues where the servers would go unresponsive for no apparent reason. They were still running, but they weren't reachable by SSH or the web interface, and I had to cut power and restart them to bring them back online. I let the parity check run each time this happened. While it was very annoying, there seemed to be no data loss, so I kept investigating the root cause. Some forum posts led me to think the problem was C-states on the Ryzen processors, so I upgraded the BIOS and disabled C-states. That seemed to fix the SSD box, but on my primary storage box (server 1) the issues persisted. Over the next couple of weeks I replaced the power supply and RAM and swapped the 9200-8i for a spare 9300-8i I had lying around, but nothing seemed to fix it. I also set up a syslog server and a Prometheus exporter with a Grafana UI to monitor the box. The syslog server never recorded any errors, and the Grafana charts just dropped off a cliff with nothing beforehand indicating a problem.
Finally I decided to upgrade Unraid OS, and I saw in the changelog for 6.12.9 that they recommended switching from macvlan to ipvlan to resolve some issues. I tried that after the upgrade, and it seems to have mostly fixed the problem of the server going unresponsive without warning. After 2 days the server did go unresponsive once more, when the log directory filled up and the cleanup process failed. I had to do another hard reboot, and after that it behaved for a week or two.

I just had a power outage of about 30 seconds, and the box decided to shut itself down. I've tried changing the settings multiple times, but server 1 always shuts down instantly on a power outage. Server 2 doesn't have this issue, and each server is on its own UPS. During shutdown, server 1 encountered another error and froze, which meant another hard restart. I just started it up again and it disabled disk 1. It seems this means disk 1 is out of sync with the rest of the array somehow and I'll need to rebuild the array. Server 1 is mostly cold storage: it holds bulk files, backups, movies, that kind of thing. Nothing that I know of was writing to the array when it shut down.

The failed shutdown attempt produced a log dump, which I've attached here. We have some bad weather in my area, so I've decided to keep the box shut down until I can be sure the power is stable. At that point I plan to rebuild the array. But I'm really getting annoyed at how frequently this box has issues. Could someone smarter than me look at the logs and see if there's any indication of what's going wrong? I'm half tempted to start over with a fresh install or move to a different NAS OS at this point. I do have backups of all the data. When the servers first started having issues, I bought an 18 TB Seagate drive and copied the whole server over to it, or at least everything I care about.

Attachment: tower-diagnostics-20240412-1806.zip
Zonediver, posted April 12, 2024:

AMD is not recommended... BIOS, RAM, motherboard or other incompatibilities... it can be anything; take your pick.
dwilson2547 (author), posted April 13, 2024:

Zonediver said: "AMD is not recommended... BIOS, RAM, motherboard or other incompatibilities... it can be anything; take your pick."

Is this documented anywhere? The hardware requirements page seems to indicate that AMD is fully supported: https://docs.unraid.net/unraid-os/getting-started/
JorgeB, posted April 13, 2024 (marked as the solution):

Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Zonediver, posted April 13, 2024:

dwilson2547 said: "Is this documented anywhere? The hardware requirements page seems to indicate that AMD is fully supported: https://docs.unraid.net/unraid-os/getting-started/"

It is "supported" but not recommended.
Vr2Io, posted April 13, 2024:

dwilson2547 said: "They were still running, but they weren't reachable by SSH or the web interface, and I had to cut power and restart them to bring them back online."

Forced power-off, or shutdown via the power button? You should attach a monitor and keyboard to verify whether it is only a network issue.
dwilson2547 (author), posted April 13, 2024:

Vr2Io said: "Forced power-off, or shutdown via the power button? You should attach a monitor and keyboard to verify whether it is only a network issue."

I should've put this in the post, but I have been doing that. The first time it happened I tried connecting a monitor and keyboard, but the monitor didn't recognize an input source and the keyboard didn't even light up. After rebooting, the monitor and keyboard worked as expected, so I left them plugged in. The next time it happened I went to check on it, and again the keyboard lights were off and the monitor didn't register an input source. Same behavior on both boxes.
Vr2Io, posted April 13, 2024:

dwilson2547 said: "Same behavior on both boxes."

Both boxes showed the same behavior? That's strange. Normally the display blanks because of power-saving suspend, but the output resumes as soon as you press any key.

As for the immediate shutdown on a power outage (each box is on a different UPS), I don't think that's related. Unstable power won't cause a crash; the system will just reboot or stay powered off.
dwilson2547 (author), posted April 28, 2024:

On 4/13/2024 at 3:57 AM, JorgeB said: "Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173"

This seems to have been the solution. I had C-states disabled globally and XMP disabled, but I was missing the Power Supply Idle Control setting, which needed to be set to "Typical Current Idle". I've been running for a while without issue now.