14935 Posted September 26, 2023
Hello Gurus,

I recently swapped the motherboard in what had been a very stable system (I was trying to get a few more SATA ports). Since the swap, my server becomes unresponsive after a couple of days of uptime. I lose contact with the web interface, can't ping it, and can't enter commands locally. The first time, I hoped it was a fluke. After the second time, I exchanged the MB. It has happened twice since then. I suspect that an unclean shutdown took one of my parity drives offline. The system has not been stable long enough for me to get it back in the array. I have an almost identical system using the same MB (both with the latest BIOS) that has not caused any problems.

System includes:
MSI Pro B550-VC/BIOS I.40 July 3 2023
16GB G.Skill Ripjaws DDR4 3200
Dell HBA H310
Seasonic Focus PX-750
Ryzen 5 3400G

I have run Memtest without issues, swapped memory, run an extended SMART test on my ailing parity drive (no problems reported), and checked that XMP is disabled for memory in the BIOS. I mirrored my syslog to the flash drive, but nothing is jumping out at me. Do you have any advice before I revert to the old MB?

Attachments: unraid2-diagnostics-20230926-0901.zip, unraid2-smart-20230926-0901.zip, syslog
14935 Posted September 26, 2023
Forgot to mention: I am not running VMs or Dockers, just using this as a simple NAS. Thanks.
14935 Posted September 26, 2023
Thanks trurl, I looked for a place to disable C-states once before and didn't find it. I will check with MSI support and see if they can point me in the right direction.
14935 Posted September 26, 2023
Global C-States are now disabled. MSI support was very helpful. On my MB the setting is here:
Advanced > OverClocking > Advanced CPU Configuration > AMD CBS > Global C-State Control

I am concerned that if this isn't the fix and I have another unclean shutdown, I might lose another disk, so I am not inclined to try to fix my invalid parity disk right away. My array is currently stopped. Would it be best to just let things sit like this for a few days, start the array in maintenance mode, or do something else?
14935 Posted September 29, 2023
Disabling C-States did not fix things. I also tried changing Power Supply Idle Control from Auto to Typical Current Idle, and I am still getting lockups. I don't see anything in the syslog after I booted the server last night. I started the array and it began a parity check, which ran overnight; the server locked up this morning.

Attachment: syslog
JorgeB Posted September 29, 2023
Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
14935 Posted September 29, 2023
Thanks JorgeB. I am not running any Dockers or VMs, just using it as a file store. I have an almost identical system (same MB, but with a Ryzen 4600G instead of the 3400G in this one) that has been running fine. Do you think it might be worth swapping the CPU? That is the only difference I can think of between the two systems.
JorgeB Posted September 29, 2023
You can try. Since the problem apparently started after swapping the board, it could just be a board issue, or the board may not get along with the Linux kernel.
14935 Posted September 29, 2023
My first guess was a bad MB, but I exchanged the first one after the second lockup, so I'm on MB2 (same model, which is working fine in my other server). I will pop in a different CPU before I revert to the old MB. Thanks.
14935 Posted October 1, 2023
I swapped the 3400G for a 5600G and was able to complete a parity check. If things don't lock up by next weekend, I'll call it fixed. Thanks trurl and JorgeB.
Solution
14935 Posted October 5, 2023
I have been able to transfer 15TB of data without a lock-up. Swapping the CPU seems to have been the solution. Thanks again for the help!