Jryski, posted December 7, 2023:

This server has been stable for 2-3 years, and I have made no changes to hardware or configuration since this began. The server becomes unresponsive to all remote connectivity. I'm not sure about local access; I was running headless, so I'm waiting for another failure to verify whether anything is accessible locally. Docker containers, the UI, and SSH all go completely dead. A restart brings it back up. I'm including the diagnostics from just after the last 2 failures and recoveries. I don't see anything in the logs, but I'm hoping there's a clue in the files.

ASRock X570 Taichi
Ryzen 9 5950X
128 GiB DDR4 G.SKILL memory
Unraid 6.12.4

Attachments: unimatrixzero-diagnostics-20231207-1722.zip, unimatrixzero-diagnostics-20231205-1604.zip
Scheev, posted December 7, 2023:

What is your current USB flash drive? I was having really similar issues, where my syslog was showing corrupted data. It was the same behavior you're describing: after a few days things would gradually stop working, requiring a reboot.
JorgeB, posted December 8, 2023:

Enable the syslog server and post that after a crash.
Jryski, posted December 8, 2023:

While scrubbing the syslog for personal details, I found this:

    [Hardware Error]: Corrected error, no action required.
    [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[-|CE|MiscV|-|PCC|-|CECC|-|-|-]: 0x8b48c03108508948
    [Hardware Error]: IPID: 0x0000000000000000
    [Hardware Error]: Bank 21 is reserved.
    [Hardware Error]: cache level: RESV, tx: GEN

I'm exploring the possibility that this is the cause, and will be changing the BIOS power supply idle setting to "Typical Current Idle", as I've read that the default can cause random stability issues. If this turns out to be the case I will post for posterity.
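For anyone who wants to check their own syslog for the same kind of entry, here is a minimal Python sketch that picks out the `MCxx_STATUS` lines and counts them per CPU and bank. The function name `summarize_mce` and the regular expression are assumptions made for illustration, based only on the lines quoted above; this is not an Unraid tool, and kernel output on other systems may be formatted differently.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted above:
# [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[-|CE|...]: 0x8b48c03108508948
STATUS_RE = re.compile(
    r"\[Hardware Error\]: CPU:(\d+) \(([0-9a-fA-F:]+)\) "
    r"MC(\d+)_STATUS\[([^\]]*)\]: (0x[0-9a-fA-F]+)"
)


def summarize_mce(lines):
    """Count status lines per (cpu, bank) and record whether the CE flag was set."""
    counts = Counter()
    corrected = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not a status line
        cpu = int(match.group(1))
        bank = int(match.group(3))
        flags = match.group(4).split("|")
        counts[(cpu, bank)] += 1
        # Assumption: "CE" in the flag list marks a corrected error,
        # which agrees with the "Corrected error" line in the quote.
        corrected[(cpu, bank)] = "CE" in flags
    return counts, corrected


# The lines below are copied from the post above.
sample = [
    "[Hardware Error]: Corrected error, no action required.",
    "[Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[-|CE|MiscV|-|PCC|-|CECC|-|-|-]: 0x8b48c03108508948",
    "[Hardware Error]: IPID: 0x0000000000000000",
    "[Hardware Error]: Bank 21 is reserved.",
    "[Hardware Error]: cache level: RESV, tx: GEN",
]

counts, corrected = summarize_mce(sample)
for (cpu, bank), n in counts.items():
    print(f"CPU {cpu}, bank {bank}: {n} event(s), corrected={corrected[(cpu, bank)]}")
```

To run it against a saved log, pass an open file instead of `sample`, for example `summarize_mce(open("syslog.txt"))`, where `syslog.txt` is a placeholder filename. A count that keeps growing on the same CPU and bank would be worth mentioning when posting diagnostics.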
Jryski, posted December 8, 2023:

15 hours ago, Scheev said:
    What is your current USB? I was having really similar issues where my syslog was showing corrupted data. Same behavior as you're describing, after a few days things would just gradually stop working, requiring a reboot.

SAMSUNG MUF-32AB/AM FIT Plus 32GB
Scheev, posted December 8, 2023:

1 hour ago, Jryski said:
    SAMSUNG MUF-32AB/AM FIT Plus 32GB

To my understanding and research, USB 3+ keys are not ideal for use with Unraid. I would recommend moving to a USB 2.0 key.
itimpi, posted December 8, 2023:

17 minutes ago, Scheev said:
    To my understanding and research, USB 3+ keys are not ideal for use with Unraid. I would recommend moving to a USB 2.0 key.

This is not always that easy any more, as USB2 drives are getting harder to find. If a USB3.x drive needs to be used, then it is definitely worth seeing if a USB2 port on the server can be used. Even if the server has no external USB2 port, I think most motherboards still tend to have a USB2 header that can be used with the appropriate adapter/cable.
Jryski, posted December 21, 2023:

Well, after running memtest and getting passes, rebuilding the docker image, checking the SMART data on all drives, changing all BIOS settings to the recommended settings, and removing the XMP profile on the RAM, it's still freezing. I've checked all the logs and all I see is mover running, mover completing, and then nothing else written to the logs; the server simply stops responding. I've had this happen both under load and while idle. The flash drive is only 3 months old; I had one fail on me a few months ago. I'm at a loss here.
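When a freeze leaves nothing in the log, the last lines written before the silence are often the only lead. As a rough sketch, the Python below scans a saved syslog for long gaps between consecutive timestamps and returns the line on each side of the gap. It assumes the classic `Mon DD HH:MM:SS` prefix with English month names and no year (so `year` has to be supplied), and the function name `find_gaps`, the 30-minute threshold, and the sample lines are all invented for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (line_before, line_after, gap) for each silence of at least min_gap."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # too short to carry a timestamp
        try:
            # Assumed prefix: month name, day, time, e.g. "Dec 21 03:40:01"
            stamp = datetime.strptime(
                f"{year} {' '.join(parts[:3])}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


# Made-up sample lines, not taken from the attached diagnostics.
sample = [
    "Dec 21 03:40:01 server root: mover: started",
    "Dec 21 03:41:12 server root: mover: finished",
    "Dec 21 09:15:30 server kernel: first line after the restart",
]

for before, after, gap in find_gaps(sample, 2023):
    print(f"Silent for {gap}")
    print(f"  last line before: {before}")
    print(f"  first line after: {after}")
```

This only narrows down when the server went quiet; it cannot say why. Pairing it with a remote syslog server, as suggested earlier in the thread, gives it the best chance of seeing the final lines before a crash.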
JorgeB, posted December 22, 2023:

One thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
Jryski, posted December 22, 2023:

I finally feel like I've found something in the logs, attaching for anyone who might understand what I'm seeing.

Attachment: Syslog Errors.txt
JorgeB, posted December 22, 2023:

There are a lot of call traces. I can't see what's causing them, but they look more hardware related to me. You can still try what I mentioned above.
Jryski, posted December 22, 2023:

49 minutes ago, JorgeB said:
    There are a lot of call traces, can't see what's causing them though, but they look more hardware related to me, you can still try what I mentioned above.

Thanks, I will try that and report back.
Jryski, posted December 26, 2023 (marked as solution):

I made the move to IPVLAN, and I also made these changes:

Host access to custom networks: Enabled
Preserve user defined networks: Yes

All issues have ceased. This is not an ideal solution, but it's working for me for now. I'm now at 48 hours stable and error free for the first time in months. I'm not sure why they call this issue resolved in this version, as it's clearly still an issue, but I'm stable for now. Thanks for the help along the way.