Memory Errors

Hi Guys,

 

I am a new user on the UNRAID platform, and for the past couple of weeks I have been receiving hardware errors. Can anyone help me troubleshoot? I have installed 256 GB of DDR3 ECC RAM but I am getting errors in my log. I have no idea how to solve this issue.

 

 

Thanks

Tom

RAM error.JPG

Those are probably due to a bad DIMM. You might want to remove one stick at a time and see which one makes the error go away.

As testdasi said, it is likely a bad DIMM. I just had to deal with this myself and it isn't fun.

 

The best thing to do would be to run memtest86 with all memory installed until you start seeing memory errors. Once you see these errors, take a picture or write down where they occur (e.g. test 2, test 3, etc.) and how many there are. That way you only have to run up to those tests and don't have to go through the entire test suite to find the bad stick.

 

Once you know which test produces the failures, remove all but the minimum amount of memory required by your system and then run memtest86 past the test number where you were previously getting errors. Then swap one stick in at a time, retest up to just past that test number, and repeat until you find the bad stick.

 

After you've eliminated the last bad stick, run through the full gamut of memory tests to ensure everything checks out.

Been down this road recently.

OS would catch faults in 2 of the 12 sticks in the server, but memtest86 didn't.

Got replacement sticks, and no errors since.

Also, it looks like you have a pair of bad ones.

  • Author

Hi Guys,

 

Thanks for the replies. I already ran memtest to 50%, but it takes too much time. Would it be an idea to remove 2 random DIMMs and test until I do not have any errors left?

 

What I also do not understand is how those DIMMs can be broken. ECC DIMMs are error-corrected, right?

  • Community Expert
10 minutes ago, TJOPTJOP said:

ECC DIMMs are error-corrected, right?

ECC DIMMs can still malfunction, but unlike with non-ECC RAM it won't corrupt your data when that happens. The board's system event log might have more info on which DIMMs are the problem; if not, remove them one by one until the errors stop.

 

Also, there is no point in running memtest unless ECC can be disabled in the BIOS.
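
For the system event log mentioned above, here is a minimal sketch of one way to look at it from the command line. It assumes the board has a BMC that speaks IPMI and that `ipmitool` is installed, neither of which is confirmed in this thread. `filter_memory_events` is a hypothetical helper name, and the keywords it searches for are a guess at what a memory-related entry might contain.

```shell
# Keep only the lines on standard input that mention memory, ECC or a DIMM
# (case-insensitive). The keyword list is an assumption, not a fixed format.
filter_memory_events() {
    grep -i -E 'memory|ecc|dimm'
}

# Typical use, assuming ipmitool is installed and a BMC is present:
#   ipmitool sel elist | filter_memory_events
```

If nothing matches, it is still worth reading the full `ipmitool sel elist` output, since vendors word these events differently.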

 

 

  • Author
8 hours ago, johnnie.black said:

ECC DIMMs can still malfunction, but unlike with non-ECC RAM it won't corrupt your data when that happens. The board's system event log might have more info on which DIMMs are the problem; if not, remove them one by one until the errors stop.

 

Also, there is no point in running memtest unless ECC can be disabled in the BIOS.

Hi, thanks for the support! I have 16 slots populated with 16 GB modules, which makes 256 GB in total. I removed the RAM from every second slot, which leaves my server with 128 GB, but after booting there were no errors in the UNRAID log. I turned the server off, swapped in the other 128 GB and booted it up; again no errors.

 

So I decided to install all the RAM modules again, turn off ECC checking in my BIOS and run a full memtest. Any suggestions on how many passes I need to confirm whether my RAM is good or bad? I think that one pass will take 12+ hours.

8 hours ago, johnnie.black said:

 

 

 

try running this:

 

grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

 

and posting the results, AFTER you confirm you're still having errors.

 

It'll basically list the correctable error count for each chip-select row and channel, in the order they're installed on the board, per memory controller, which shows which modules are bad.
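
Note that `grep "[0-9]"` matches every counter, including the ones that are still at 0. A rough sketch that prints only the non-zero counters, together with the DIMM label the kernel exposes next to each one, could look like this. `list_bad_channels` is a hypothetical helper name, and the optional directory argument exists only so the function can be tried against a copy of the tree. The labels are only as good as what the EDAC driver fills in, so they may be empty or generic on some boards.

```shell
# Print each EDAC channel counter that has logged at least one correctable
# error, with the matching chX_dimm_label file from the same csrow directory.
list_bad_channels() {
    root="${1:-/sys/devices/system/edac/mc}"
    for f in "$root"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue                      # glob matched nothing
        count=$(cat "$f")
        [ "$count" -gt 0 ] 2>/dev/null || continue   # skip zero or non-numeric
        label_file="${f%_ce_count}_dimm_label"
        label="(no label)"
        if [ -r "$label_file" ]; then
            label=$(cat "$label_file")
        fi
        printf '%s: %s correctable errors, label: %s\n' \
            "${f#"$root"/}" "$count" "$label"
    done
}

list_bad_channels
```

An empty result means no channel has counted a correctable error since the EDAC driver was loaded.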

 

  • Author
On 12/14/2019 at 4:55 AM, sota said:

try running this:

 

grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

 

and posting the results, AFTER you confirm you're still having errors.

 

It'll basically list exactly which memory chips are bad, in the order they're installed on the board, per physical processor.

 

So, I ran memtest on my Supermicro server for about 83 hours and one pass finally completed with 0 errors. After a clean reboot there are also no new errors in my UNRAID log. I have absolutely no idea why I had a lot of errors in my log at the beginning and now everything is working fine!

Just to be clear, are you now running with ECC off? That seems like a bad idea.

Doesn't that mean that instead of seeing the errors, it's just going to be failing silently and potentially corrupting the memory? (especially if you weren't seeing the errors in memtest in the first place)

 

Quote

So I decided to install all the RAM modules again, turn off ECC checking in my BIOS and run a full memtest. Any suggestions on how many passes I need to confirm whether my RAM is good or bad? I think that one pass will take 12+ hours.

 

On 12/13/2019 at 5:07 AM, TJOPTJOP said:

 

What I also do not understand is how those DIMMs can be broken. ECC DIMMs are error-corrected, right?


Also, this is exactly what the log is telling you.

A "CE memory read error" or "CE memory scrubbing error" is a correctable error (CE), meaning ECC caught and fixed it. It would be worse if you were getting uncorrectable errors (UE).

My concern is that you've merely turned off the memory scrubbing and whatnot, hiding the errors rather than fixing them.
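
To check whether anything uncorrectable has been counted alongside the correctable errors, the per-controller totals can be summed like this. It is only a sketch: `edac_totals` is a hypothetical helper name, and it assumes the EDAC driver is loaded and exposes `ce_count` and `ue_count` under each `mc*` directory.

```shell
# Sum the correctable (CE) and uncorrectable (UE) totals that EDAC keeps
# for each memory controller and print them on one line.
edac_totals() {
    root="${1:-/sys/devices/system/edac/mc}"
    ce_total=0
    ue_total=0
    for mc in "$root"/mc*; do
        [ -d "$mc" ] || continue
        if [ -r "$mc/ce_count" ]; then
            ce_total=$((ce_total + $(cat "$mc/ce_count")))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue_total=$((ue_total + $(cat "$mc/ue_count")))
        fi
    done
    echo "CE=$ce_total UE=$ue_total"
}

edac_totals
```

If the EDAC directories are missing entirely, this prints zeros, which tells you nothing about the memory, so make sure the counters exist before trusting the result.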

  • Author

Today I received CE errors again. So, as asked, I entered the following command on the command line. See pic1 for the results.

grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

 

pic1.JPG

ram error again.JPG

  • Author

Today I got errors again, but I think I found the bad module. I just removed it and will test the machine for the next two days. Hopefully that will solve my UNRAID memory problems.
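
For a multi-day test like this, one option is to save the counters at the start and compare them at the end, instead of watching the log. This is a sketch under the same assumptions as the `grep` command above. `snapshot_ce_counts` and `changed_counters` are hypothetical helper names and the file paths are placeholders. The counters live in kernel memory and start again from zero after a reboot, so both snapshots need to come from the same boot.

```shell
# Write one "path count" line per EDAC channel counter.
snapshot_ce_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for f in "$root"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue
        printf '%s %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
}

# changed_counters BEFORE AFTER
# Print the lines of AFTER that do not appear verbatim in BEFORE,
# i.e. the counters whose value changed between the two snapshots.
changed_counters() {
    grep -v -x -F -f "$1" "$2"
}

# Typical use:
#   snapshot_ce_counts > /tmp/ce_before.txt
#   ... two days later ...
#   snapshot_ce_counts > /tmp/ce_after.txt
#   changed_counters /tmp/ce_before.txt /tmp/ce_after.txt
```

No output from `changed_counters` means no counter moved between the two snapshots.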

  • 1 month later...
  • Author

I replaced all my RAM with new RAM and it looks like my problems are resolved.

Archived

This topic is now archived and is closed to further replies.
