sl0pz Posted February 4
Had a power outage last week and the server wouldn't come back up. I tried removing different parts of the server to get it to POST, and it finally POSTed after I removed the USB flash drive, then verified and fixed the problems on it.
Today we had another outage. The server came back up pretty quickly, but then went back down, and it hasn't been able to stay up with any regularity since. I've enabled writing logs to the flash drive for the next time it goes down, but I also got a 'machine check events' notification telling me to post the logs here. Can anyone take a look and see what's up? I'm guessing the flash drive is on its way out, but maybe something else is too?
Bullet format
- Wouldn't reboot (mid-December; it finally came up but I don't know why)
- Power outage
- Checked components, fixed flash
- Power outage
- Rebooted, won't stay up
- Logs uploaded
tower-diagnostics-20240204-1535.zip
sl0pz Posted February 4 (edited)
Here is another log, taken after it shut down and I had to restart. I've removed the Dynamix System Stats plugin.
syslog-192.168.0.200.log
sl0pz Posted February 4
Diagnostics from after the crash, from Tools > Diagnostics:
tower-diagnostics-20240204-2147.zip
JorgeB Posted February 4
7 hours ago, sl0pz said: but then went back down.
A server shutting down on its own is almost always a hardware issue, or bad power, and unsurprisingly, there's nothing relevant logged in the syslog.
sl0pz Posted February 5
For clarity, the server still has HDD lights, Ethernet lights, and fans running when it's down. The box is seemingly still powered. It takes a hard shutdown and boot to make Unraid accessible again.
sl0pz Posted February 5
Some chunky errors in this one. Hopefully they'll give me a good starting point!
syslog-192.168.0.200.log
JorgeB Posted February 5
There are errors with the flash drive. It may not be the only issue, but it's an issue.
sl0pz Posted February 6
I have a UPS and a new flash drive on order. I'll swap over to the new drive, and the UPS will do a clean shutdown if there are power issues.
sl0pz Posted February 12
So I added a UPS and switched to a new flash drive, and the server went down again an hour later. Got some new RAM; everything started up and stayed up pretty well for 12 hours, and now it's down again. What's the next thing I could check or replace? Are there different logs to look at?
trurl Posted February 12
2 minutes ago, sl0pz said: Got some new ram
Did you test it?
sl0pz Posted February 12 (edited)
I just popped it in, booted it up, and followed with a parity check. What do you think would be the most effective test?
sl0pz Posted February 13
Couldn't POST with the new memory, the old memory, or single DIMMs. I've ordered a new power supply, since maybe the old one got damaged in the power outages. I'm not really sure what else it could be.
sl0pz Posted February 15
syslog-192.168.0.200.log
OK, so with the new PSU I'm getting more uptime. It did crash last night, it seems, but only once, so maybe something has helped. I'm still having trouble getting video output during POST so that I can reach memtest, however. From looking over this log, I'm guessing I should still try to find a way to get to memtest? Or is it pointing at something else now?
sl0pz Posted February 16 (edited)
While I can get to the server from a networked computer, the video output on the motherboard isn't working, with or without the GPU plugged in. I can't access the BIOS or get any video output (memtest or Unraid) via the HDMI ports...
edit: got video output through the GPU and am running memtest
sl0pz Posted February 16
OK, a 4-pass memtest had zero errors (this is the new memory from last week).
I'm not sure why I can't get video output from the motherboard HDMI port when the GPU isn't attached; I believe I have before.
I got into the BIOS with no problem now and enabled IOMMU (changed from Auto to Enabled), as I saw another person say this worked for them.
Removed the GPU stats plugin to see if that gets rid of the log errors.
Will post back if/when it crashes overnight!
sl0pz Posted February 16
It doesn't seem like it crashed last night, but it is still giving me a 'machine check events' error. I guess we'll continue to wait and see if it comes up again?
syslog-192.168.0.200.log
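For anyone wanting to pull the relevant entries out of a saved syslog like the one attached above, here is a minimal Python sketch. It only assumes the log is plain text with one entry per line; the function name `matching_lines` and the default search phrase are illustrative choices, not anything Unraid provides.

```python
def matching_lines(lines, keyword="machine check"):
    """Return the lines that contain `keyword`, ignoring case.

    `lines` can be any iterable of strings, e.g. an open text file.
    The default keyword is an assumption based on the notification text
    quoted in this thread; adjust it to whatever your log actually shows.
    """
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]
```

One way to use it would be to open the saved log with `open(path, errors="replace")` and print each returned line, then look at what was logged just before each hit.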
sl0pz Posted February 18
OK, so it crashed again last night. I'm not sure when: the last part of the log was at 4pm and I found it dead at 11:25. There are no entries in the log between those two times.
The nvidia log entries are gone after removing the GPU stats plugin as well.
So I'm just looking for more tests to run to see why it's turning off.
syslog-192.168.0.200.log
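A silent stretch like the one described above can be located automatically once the server is back up and logging again. The sketch below is one possible approach, assuming the log uses the traditional syslog timestamp at the start of each line (for example `Feb 17 16:00:01`) and English month abbreviations. The function name `find_gaps`, the one-hour threshold, and the `year` parameter are all illustrative assumptions; the year is needed only because this timestamp style does not record one.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(hours=1), year=2024):
    """Return (last_seen, next_seen) pairs where the log went quiet.

    A pair is reported when two consecutive timestamped lines are at
    least `min_gap` apart. Lines without a leading timestamp are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = line[:15]  # "Mon DD HH:MM:SS" is 15 characters wide
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Each returned pair gives the last entry before the silence and the first entry after it, which narrows down when the server actually stopped. A gap only shows up if the same file keeps receiving entries after the reboot, as a remote syslog file normally would.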
JorgeB Posted February 18
Is it still shutting down by itself, or is it crashing/hanging?
sl0pz Posted February 18 Author Share Posted February 18 The previous crash was the latest one, I changed the bios settings to typical idle current instead of auto as I found perhaps it was dropping out from failing to come back from a low power state. Current uptime since then is 23 hours Quote Link to comment
sl0pz Posted February 21
About 4 days of uptime at this point.
For anyone reading this thread and thinking this sounds like their server, my solutions were:
- cleaning up the energy source (UPS + new power supply)
- new flash drive
- upgrading memory (could have been unnecessary)
My guess, because it was the last thing I tried fixing, is that the home power going on and off messed with some part of the 8-year-old power supply, which was then not giving clean power to the system.