Various Unraid issues after replacing CPU + motherboard


Solved by BRiT


I recently replaced my Unraid server's CPU and motherboard, and since the swap I've had many system stability issues. I think some of my drives are failing, but I can't tell which ones because the SMART reports all say the drives are fine. I'm currently running an extended SMART test on each drive, but this may take anywhere from a few hours to a few days. I've also had data loss in some of my applications, such as Sonarr and Radarr, while others seem unaffected.

 

Additionally, when I connect to the server via SFTP and try to download a file, the client reports "unable to download X bytes, retrying..." I've also gotten "Bad Parameter" and "Server Error" when trying to start or remove Docker containers. Finally, when I tried to remove old Docker containers through the CLI, I got an error that essentially said "read-only filesystem".
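One way to confirm the "read-only filesystem" symptom is to check the mount flags of the path Docker is writing to. Below is a minimal Python sketch, assuming Python 3 on a Unix-like system. Treating `/var/lib/docker` as Docker's storage location on Unraid is an assumption; substitute whatever path the error message mentions.

```python
import os


def is_read_only(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only.

    Raises FileNotFoundError if `path` does not exist.
    """
    # os.statvfs reports the mount flags of the filesystem holding `path`;
    # os.ST_RDONLY is set when that filesystem is mounted read-only (Unix only).
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


# Example (assumed path, adjust as needed):
#   is_read_only("/var/lib/docker")
```

A result of `True` would be consistent with the filesystem having been remounted read-only, which can happen after the kernel detects filesystem errors.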

 

I've attached my diagnostics zip file. Let me know if any additional information is needed. Any support is greatly appreciated, as I've been trying to solve these problems for a few weeks now!

nrgserver-diagnostics-20221222-1442.zip

BRiT said:

Since you adjusted your mb/cpu/ram, have you validated your system is stable enough with a few cycles of MemTest?

How exactly do I run a memtest on Unraid? The RAM currently installed is brand new. At first the system wouldn't boot, but once I reseated each stick it seemed to work normally.

 

I've also attached the most recent diagnostics zip file with the completed extended SMART test results. If anyone could point me toward which disk looks like a failure risk, that would be super helpful! One of the disks is brand new, and I have another new one on the way to replace a potentially failing one.

 

UPDATE: I formatted my third disk (which took 6+ hours), and after restarting the system it now shows:

Unmountable: Unsupported or no file system

nrgserver-diagnostics-20221222-1819.zip

nrgbistro said:

How exactly do I run a memtest on unraid? The RAM currently installed is brand new. At first the system wouldn't boot but once I reseated each stick it seemed to work normally

 

It's an option in the boot menu, before Unraid starts loading.

 

Incorrect BIOS settings can cause issues even if the physical memory is perfectly fine. Make sure the settings are correct and that none of the XMP or memory overclocking options are enabled.


I ended up creating a memtest USB drive and running it on my system, and it found errors. My BIOS settings were incorrect for my RAM (the voltage was too low), and I thought resetting the BIOS would fix the errors, but when I repeated the memtest with all sticks installed I immediately saw more errors (2000+ in 30 seconds).

 

Now I'm running the memtest on each stick individually. So far the first stick has thrown errors, the second stick had 0 errors, and the third is still running but looks good so far. Does this mean I can rule out CPU, L1, and L2 cache errors? And conversely, can I conclude that the first stick was causing the errors, assuming the other three end up with none?
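The elimination reasoning in the question above can be written down as a rough rule of thumb. This is a minimal Python sketch, not a diagnostic tool: the function name `interpret_memtest` and its verdict strings are hypothetical, and it assumes each stick was tested alone under the same BIOS settings. A clean run only means no errors were observed during that run, so longer runs give more confidence.

```python
def interpret_memtest(per_stick: dict[str, int], combined_errors: int) -> str:
    """Hypothetical helper: rough verdict from MemTest error counts.

    per_stick: error count for each stick tested on its own.
    combined_errors: error count with all sticks installed together.
    """
    # Errors that appear when a stick is tested alone point at that stick.
    failing = sorted(stick for stick, errors in per_stick.items() if errors > 0)
    if failing:
        return "replace: " + ", ".join(failing)
    # Every stick passes alone but the full set fails: suspect the BIOS
    # memory settings (voltage, XMP/overclocking) or the slots/controller.
    if combined_errors > 0:
        return "check BIOS memory settings and slots"
    return "no memory errors found"
```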

 

I plan on returning the faulty stick to Amazon and getting a new one. Once it's installed, is there anything else you would recommend I check regarding system stability? Thank you for the suggestions so far, BTW! I never would have thought to check my memory...

 

UPDATE: The remaining sticks produced 0 errors, both individually and together. Unraid now seems much more stable, and I haven't encountered anything suspicious so far. I'll report back if I find any problems in the near future.


I'm still having issues with the server, such as Docker containers randomly stopping, server execution errors, and the reverse proxy only partially working. I'm going to return this RAM, get a different brand, and update my motherboard BIOS. If that doesn't fix the problem, I plan to recover anything important on the server (I currently only have a backup of my Docker appdata folder) and then do a complete reset.

5 months later...

sage2050 said:

I memtested mine and didn't get any errors

How long did you run the memtest for? I originally thought a few hours would be enough and didn't find anything, but eventually I left it running for about 24 hours and found many errors. It's also possible that the RAM is causing issues in some other way. Did you try configuring or resetting the RAM-related settings in your BIOS?
