JorgeB, posted March 3, 2022: 5 minutes ago, nexusjosh said: "Though after 10 minutes, the screen goes black? I'm unsure if that is Unraid" It's Unraid.
nexusjosh (Author), posted March 3, 2022: 6 hours ago, JorgeB said: "It's Unraid." Is there any way to disable that? At least temporarily.
JorgeB, posted March 3, 2022: You can add setterm -blank 0 to /boot/config/go
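A minimal sketch of the suggestion above, assuming you would rather append the line from a shell session than edit the file by hand. The helper name add_go_line and its "only add if missing" check are illustrative additions, not part of Unraid; only the command setterm -blank 0 and the path /boot/config/go come from the post.

```shell
#!/bin/bash
# Hypothetical helper (not part of Unraid): append a line to a file only if
# that exact line is not already present, so repeated runs do not duplicate it.
add_go_line() {
    local file="$1" line="$2"
    # -x: match the whole line, -F: fixed string, -q: quiet.
    # "--" ends option parsing so a line starting with "-" is handled safely.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        # If the file is non-empty and does not end in a newline, add one
        # first so the new line is not glued onto the previous command.
        if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage on the server (path and command taken from the post above):
#   add_go_line /boot/config/go "setterm -blank 0"
```

The go file is read at boot, so the change would only take effect after the next restart. Running the helper a second time leaves the file unchanged.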
nexusjosh (Author), posted March 3, 2022: 3 hours ago, JorgeB said: "You can add setterm -blank 0 to /boot/config/go" Thanks, added. Update: I disabled bifurcation, and it still crashed. I removed the ASUS Hyper and replaced it with the original single M.2 card. Perhaps the Hyper is causing some sort of issue? We shall see. Sigh. If this DOES fix the issue, I'm going to email SuperO and see if they have any ideas as to why it doesn't work properly when one would think it should.
nexusjosh (Author), posted March 4, 2022 (edited March 4, 2022): @JorgeB So I just power cycled my server with the option you gave me, and it's hanging during boot with the attached USB error. I'm going to power cycle the server again, but... could this entire problem be that the bootable USB is failing? @.@ It's a supposedly good Samsung Fit 32GB small form factor flash drive. I'm going to reboot the server again and see if it comes up. Though I might as well run memtest for giggles.
nexusjosh (Author), posted March 4, 2022 (edited March 4, 2022): Yep, just got another reset, with the same USB error upon attempting to boot again. I'm going to run scandisk and such on the current USB. In the meantime, does anyone know what the "read/64, error -71" message is?
nexusjosh (Author), posted March 4, 2022: So I scanned the USB and found 0 errors, so I rebuilt it. We shall see if it stabilizes. It's worth noting that my server is 5 years old, so I could see some sort of corruption on the flash drive being the cause. Thoughts?
JorgeB, posted March 4, 2022: Unlikely that USB-related issues would power off the server.
nexusjosh (Author), posted March 4, 2022: 10 minutes ago, JorgeB said: "Unlikely that USB-related issues would power off the server." Well, I hope that it is! I've had my server for about 5 years and haven't had any major issues until now. I've now restored the hardware back to the state it was in when it was last stable. I'm unsure what else I can do, other than hope it WAS the USB. If it isn't... maybe the motherboard is going bad, or the CPU. Sigh. It's been an hour and it hasn't crashed. Lately it's been crashing just about every hour, if not sooner. I'm screen recording the IPMI again, so we'll see what happens.
nexusjosh (Author), posted March 5, 2022: So I'm currently running Memtest, but it's going to take a couple of days given my large amount of RAM. I'm at a loss as to what it could be at this point, unless it's, as I've said in a prior post, a CPU or motherboard issue, which would be a big shame. I'd have to build a new server, which wouldn't be the end of the world, but meh. Sigh. Does anyone know if there are any stress test apps on Community Apps, perhaps one to stress individual components, to MAYBE determine the issue? I once had access to Ultra-X when I was a Service Manager at Fry's Electronics, and their kit was unique. I've looked on and off and haven't found an alternative for it, which is a big shame.
JorgeB, posted March 5, 2022: First thing I would try would be a different PSU.
nexusjosh (Author), posted March 5, 2022 (edited March 5, 2022): 4 minutes ago, JorgeB said: "First thing I would try would be a different PSU." A fair idea, though I JUST replaced the PSU about 6 months ago with the overkill EVGA Supernova 1000 T2, 80+ Titanium. I'll plug it into my PSU tester once the memtest is complete... Perhaps the power cables need re-seating. Ugh. I re-seated the PCIe cards about 50 times, but didn't think to re-seat the power cables.
nexusjosh (Author), posted March 6, 2022: I let the RamTest get to step 2, then halted it. No errors found. Plugged the PSU into a PSU tester, and it tested out fine. I've re-seated the power cables, so... we'll see if it power cycles.
Squid, posted March 6, 2022: 4 minutes ago, nexusjosh said: "RamTest" Were you running the Memtest from Unraid's boot menu? If so, it will not find any ECC issues (and most servers with ECC will completely halt the system if they run into an uncorrectable ECC error). Set up a new boot stick and download from https://www.memtest86.com/ On 3/4/2022 at 3:11 AM, nexusjosh said: "USB" These problems won't affect the "stability" of the server. However, based on your screenshots, your flash is dropping offline and reconnecting. Try a different port. If it continues, then replace the flash drive and transfer your registration.
nexusjosh (Author), posted March 7, 2022 (edited March 7, 2022): 4 hours ago, Squid said: "Were you running the Memtest from Unraid's boot menu? If so, it will not find any ECC issues (and most servers with ECC will completely halt the system if they run into an uncorrectable ECC error). Set up a new boot stick and download from https://www.memtest86.com/ These problems won't affect the 'stability' of the server. However, based on your screenshots, your flash is dropping offline and reconnecting. Try a different port. If it continues, then replace the flash drive and transfer your registration." Thanks for your feedback! I did use a Memtest USB I had, not the one bundled with Unraid. The one bundled with Unraid actually wouldn't boot for whatever reason. I tested the flash drive, and it all tested out fine. I re-seated the power cables after testing the PSU, and the server has been purring along for nearly 5 hours now. Soo... maybe the power needed to be re-seated?! X.x Ugh. I don't want to get my hopes up yet, but maybe that was the fix, even though the issues started after I added the new ASUS card. Maybe the ASUS card will now function fine.
nexusjosh (Author), posted March 7, 2022: So, I was hoping the issue was resolved. It was not. After about 8 hours of uptime it randomly power cycled once more, and I still have no idea why. I did catch it on a recording this time. No errors or anything, just... Boop. Hard power cycle. In the morning, I suppose I'll perhaps re-apply thermal paste? Any other ideas on possible solutions? I might change out the PSU, but I don't have a spare one on hand. Attachment: Crash.mp4
nexusjosh (Author), posted March 7, 2022: I just had an inspiration... Perhaps it could be the UPS? @.@ I'm going to swap out the CyberPower UPS I have connected to it for a new APC UPS I got a couple of months ago and see if that makes a difference.
ChatNoir, posted March 7, 2022: 1 hour ago, nexusjosh said: "I just had an inspiration... Perhaps it could be the UPS? @.@" Or if you rarely have any power grid issues, try without the UPS for a few days.
nexusjosh (Author), posted March 7, 2022: Just now, ChatNoir said: "Or if you rarely have any power grid issues, try without the UPS for a few days." Sadly, I do have power issues. Though if it powers off once again on this new UPS, I'll try that, and hope that a brown-out or outage doesn't murder my server. I'll just move the plug from the battery side to the surge protection side of the UPS lol.
nexusjosh (Author), posted March 8, 2022: Sooo, plugged it into a new UPS, aaand same issue. Power cycles after a while. Going to plug it straight into the wall, just as a "What the hell, let's give it a try," and re-apply thermal paste. Anyone else have any other ideas?
-MacGyver-, posted March 8, 2022: Maybe set up another USB key with a different distro, like Slackware (since that is what Unraid is based on), and see what it does for a few hours/days. Use a live install rather than an installer install to the USB drive. If it works, then I'm not sure what that means. It could be tied to one of the kernel parameters that Lime is using to compile this kernel being incompatible with your hardware. Getting ahead of things though with this thinking...
nexusjosh (Author), posted March 8, 2022: 6 hours ago, klepel said: "Maybe set up another USB key with a different distro, like Slackware (since that is what Unraid is based on), and see what it does for a few hours/days. Use a live install rather than an installer install to the USB drive. If it works, then I'm not sure what that means. It could be tied to one of the kernel parameters that Lime is using to compile this kernel being incompatible with your hardware. Getting ahead of things though with this thinking..." So, when I was running in safe mode a few days ago, it ran fine for 2 days with no crashes. But it begins to crash when it's not in safe mode. So you may be on to something there. It's just so disheartening that there are no error messages or -anything-, just BAM, hard reset, with zero clues as to the issue whatsoever. I've opened a ticket with SuperO, hoping they will be able to help.
nexusjosh (Author), posted March 8, 2022 (edited March 8, 2022): SuperO doesn't know. But the more I think about it, the more I'm sure there is something going on with Unraid, because when I booted it into safe mode, it ran fine, with no issues!!! I'm going to try removing all of my plugins, unmounting my VM, and deleting the Docker container I have. Update: I've removed all of the plugins. We shall see if it hard resets again in an hour or so.
nexusjosh (Author), posted March 9, 2022: Update: 7 hours with no issues. It must have been one of the many plugins I had installed. Le SIGH! After a day or two, I'll begin reinstalling plugins one at a time, and if it crashes, I'll know which plugin is causing the instability.
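One way to keep track of the one-at-a-time reinstall described above is a small timestamped log, so that after an unexpected reset the last entry names the most recently added plugin. This is only a sketch: the helper names note_plugin and last_plugin and any log path you pass in are hypothetical, and the log would need to live somewhere that survives a reboot (for example on the flash drive) to be of any use.

```shell
#!/bin/bash
# Hypothetical helpers for tracking a one-at-a-time plugin reinstall.

# Record that a plugin was just reinstalled, with a timestamp.
note_plugin() {
    local log="$1" plugin="$2"
    printf '%s reinstalled: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$plugin" >> "$log"
    # Ask the kernel to flush pending writes, so the entry is more likely
    # to be on disk if the machine hard resets shortly afterwards.
    sync
}

# Print the name from the most recent entry (the first suspect after a reset).
last_plugin() {
    tail -n 1 "$1" | sed 's/^.* reinstalled: //'
}

# Example usage (the log path and plugin name are placeholders):
#   note_plugin /path/to/persistent/plugin-reinstall.log "some-plugin"
#   last_plugin /path/to/persistent/plugin-reinstall.log
```

If the server resets, the plugin printed by last_plugin is the first one to remove again; if it stays up for a day or two, record and install the next one.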
hamish_18, posted March 9, 2022: Thanks for the troubleshooting @nexusjosh. Definitely would be interested if you found the conflicting plugin.