Mandersoon Posted July 29, 2020
Hey there! My machine had been running pretty stable since my last post, with just a hard freeze or two since (which I can stomach if it's once every other month or so). Since yesterday, though, I've been getting hard freezes within a couple of hours of turning the machine on, and I can't work out why.
I have attached both diagnostics and the syslog I pulled from my flash drive. I can't make heads or tails of anything specific on reboot, and the syslog isn't recording any errors prior to the hang, so I'm at a loss. Any help would be appreciated!!
I'm running several docker instances but none of them auto-update, so there had been no configuration changes at all for almost a week prior. I also updated my docker containers to see if that would help and disabled some of them, but it still froze sometime last night. One of my friends mentioned that it crashed seemingly around the same time he watched a particular episode on Plex, but after rebooting, neither of us had an issue watching that episode, so it's unclear whether that's related.
General specs: 3600X, 64GB of RAM (was running at 2666 XMP; I also tried turning off XMP and running at 2133, but same thing), ROG Strix B450-F Gaming
Attachments: flashsyslog.log, beeg-box-diagnostics-20200728-1826.zip
Mandersoon (Author) Posted August 3, 2020
As a slight update, it seems to happen only occasionally (I got really unlucky at the time of posting, when it was freezing soon after boot). The server had been mostly fine over the week, but it's happened twice today. It "recovered" in that it rebooted this time, but I have no clue how to troubleshoot this or isolate the issue, given that the syslog says basically nothing.
JorgeB Posted August 3, 2020
Make sure it's using the correct "power supply idle control setting", more info here.
Mandersoon (Author) Posted August 3, 2020
Yeah, I read through all of that when I was initially running into issues. As far as I can recall, my motherboard doesn't have that setting visible anywhere in the BIOS, though I can double-check later tonight. I just find it odd that it went from pretty stable to practically unusable over the course of two days with no configuration changes.
JonathanM Posted August 4, 2020
9 hours ago, Mandersoon said: "I just find it odd that it went from pretty stable to practically unusable over the course of two days with no configuration changes."
That statement makes me think hardware failure. Are you sure your cooling is ok? Heatsink came loose, fan not working, ambient temp in the room changed?
Mandersoon (Author) Posted August 4, 2020
Yeah, it definitely smells like something hardware-related. As far as I can tell, nothing temperature-wise has changed substantially. The ambient temp near the server is a bit high, but not enough for it to be anywhere close to thermal shutdown. When the 1700X was in that machine, it was peaking at about 80C after a sustained Plex transcode load. I can't check thermals while booted into Unraid, but the couple of times I've looked at the BIOS immediately after a reboot, it's been in the upper 30s/low 40s (which I expect given ambient temps). I had the machine open at the time, and it looked like all fans were operating and the heatsink was latched on properly.
I was talking to a friend earlier who had the same motherboard I have (ROG Strix B450-F Gaming), and he mentioned that he swapped it out because he was getting reboots every couple of weeks (which I was also having), so there may be some incompatibility there, but so far that's unsubstantiated. A new BIOS update was published last month, so I updated to the latest version today (3103 at time of writing), started the server back up, and am letting the parity check run its course.
I'm thinking I might run a memtest again if it crashes or reboots again, to see if it's the RAM. If the motherboard turns out to be at fault, though, I'm worried that memtest might pick up errors that are caused by the mobo instead of the RAM.
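On the "can't check thermals while booted" point: Linux exposes sensor readings under `/sys/class/hwmon`, with each `temp*_input` file holding a value in millidegrees Celsius, so temperatures can in principle be read from the running system. Below is a minimal sketch, assuming Python 3 is available on the box and that the kernel has a temperature driver loaded (on Ryzen that is usually `k10temp`; whether this Unraid build loads it is an assumption). The function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/label": degrees_celsius} for each temp*_input under root."""
    temps = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        # The "name" file holds the driver/chip name, e.g. "k10temp".
        chip = name_file.read_text().strip() if name_file.exists() else "unknown"
        for sensor in sorted(chip_dir.glob("temp*_input")):
            label_file = sensor.with_name(sensor.name.replace("_input", "_label"))
            try:
                # Labels are optional; fall back to the file name.
                label = label_file.read_text().strip() if label_file.exists() else sensor.name
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail to read; skip them
            temps[f"{chip_dir.name}:{chip}/{label}"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

Running this every minute or so and appending the output to a file on the flash drive would leave a record of the last temperatures seen before a freeze, which would help confirm or rule out the cooling theory.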
Mandersoon (Author) Posted August 8, 2020
😭😭😭 Well, I thought it was going well since the update on Monday, but it just had another random reboot this morning, with no errors in the logs yet again. I'm running a memtest now to see if anything is funky there, but I guess I should start looking at ordering a different motherboard. I hate ordering hardware just in the hope that it'll fix things, but I don't really have a choice at this point.
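For comparing what was logged right before each reboot, here is a rough sketch that pulls the last few lines preceding every boot out of a saved syslog such as the `flashsyslog.log` attached above. It assumes each boot shows up in the log as a kernel line containing `Linux version` (the standard kernel boot banner); the function name `lines_before_boots` and the default of 10 context lines are arbitrary choices for illustration.

```python
import sys


def lines_before_boots(lines, context=10, marker="Linux version"):
    """For each boot banner after the first line, return the `context` lines logged just before it."""
    tails = []
    for i, line in enumerate(lines):
        # A banner on the very first line has nothing before it, so skip it.
        if marker in line and i > 0:
            tails.append(lines[max(0, i - context):i])
    return tails


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py flashsyslog.log
    with open(sys.argv[1], errors="replace") as fh:
        log = fh.read().splitlines()
    for n, tail in enumerate(lines_before_boots(log), start=1):
        print(f"--- last lines before boot #{n} ---")
        print("\n".join(tail))
```

Even when none of those lines is an error, the timestamps show how long the box had been up and what was running (a container, the mover, a parity check) each time it went down, which might reveal a pattern.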