SmallwoodDR82 Posted August 12, 2023 I have been running this exact USB key on an older Supermicro board for years with zero issues. Very stable. I have since moved all the drives and the key over to a newer system. (I did run the new system on a 30-day trial of unRAID first, and it was stable.) After moving everything over and adding 2 NVMe drives for a VM pool, I'm now having a stability issue. Over the last 3 days I can't get the server to stay up for 24 straight hours. While it's up, it's solid and I see no errors in the logs. I might pull the NVMe drives out for testing, but it's odd, as I see no real errors and the VMs seem to run just fine. When it locks up I lose the GUI and VGA. IPMI still works, so I know the board isn't "down". I can reset via IPMI and then I'm back up and running for another 24ish hours. Any thoughts? Thank you in advance! smc-unraid-diagnostics-20230811-2355.zip
JorgeB Posted August 12, 2023 Enable the syslog server and post that after a crash.
SmallwoodDR82 Posted August 12, 2023 6 hours ago, JorgeB said: "Enable the syslog server and post that after a crash." I had mirror to flash enabled. Is that different from the local syslog server? That mirrored flash should be in the attachment.
itimpi Posted August 12, 2023 1 minute ago, SmallwoodDR82 said: "That mirrored flash should be in the attachment" Not unless you added it! The standard diagnostics only include the RAM copy of the syslog, not the one mirrored to flash.
SmallwoodDR82 Posted August 12, 2023 1 minute ago, itimpi said: "Not unless you added it! The standard diagnostics only include the RAM copy of the syslog, not the one mirrored to flash." My fault, everyone. I thought it was added to the diags zip. See the attached mirrored syslog. I was changing some switches around on Aug 10, so those link-down entries can be ignored. The crash was around Aug 11 23:15, I believe. syslog
JorgeB Posted August 13, 2023 There's nothing relevant logged, which suggests a hardware issue. Since it started after a move, I would begin by checking that the power supply cables are all correctly plugged in and latched to the board.
SmallwoodDR82 Posted August 14, 2023 (edited) It’s a dual power supply server (Supermicro CSE-836). Updated: I’ve looked in IPMI and the syslog, and there are zero errors logged. While unRAID crashes, IPMI stays up. It crashed again today with the NVMe drives removed and the VMs off as a result, so it’s not NVMe or VM related. I’m currently running a memory test and I’m halfway through it with no errors. Edited August 14, 2023 by SmallwoodDR82
SmallwoodDR82 Posted August 16, 2023 Update: I had another crash on Monday. Ran memtest. All passed. Then I was digging around in the forums and came across this post. I moved all my containers to a separate NIC via this guide, and since then it's been stable. 48 hours so far! Fingers crossed.
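For anyone reviewing a mirrored syslog the way it was done in this thread (looking at what was logged shortly before a crash at a known time, such as roughly Aug 11 23:15), here is a minimal Python sketch. It assumes the traditional syslog line prefix `Mon DD HH:MM:SS`, an English locale for month names, and that the year is supplied by the caller because that prefix omits it. The function names and the sample lines are hypothetical, not taken from the attached log.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Return the timestamp at the start of a syslog line, or None.

    Assumes the traditional 'Mon DD HH:MM:SS' prefix (15 characters).
    The year is not part of that prefix, so the caller provides it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        # Not a timestamped line (continuation text, blank line, etc.).
        return None


def lines_before_crash(lines, crash_time, window_minutes=30):
    """Collect lines stamped within window_minutes before crash_time."""
    start = crash_time - timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        stamp = parse_syslog_time(line, crash_time.year)
        if stamp is not None and start <= stamp <= crash_time:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        "Aug 10 09:00:00 smc-unraid kernel: sample link-down entry",
        "Aug 11 22:50:10 smc-unraid kernel: sample entry A",
        "Aug 11 23:14:59 smc-unraid kernel: sample entry B",
    ]
    crash = datetime(2023, 8, 11, 23, 15)
    for entry in lines_before_crash(sample, crash):
        print(entry)
```

To run it against a real file, pass the open file object as `lines`. An empty result for the window before the crash is consistent with the "nothing relevant logged" situation described above, where the system locks up without writing anything.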