Multiple Machine Check Events

Hi there.

Today I noticed a few "[Hardware Error]: Machine check events" messages on my Unraid server.

Quote

Oct  2 01:25:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Oct  2 01:25:53 Tower kernel: [Hardware Error]: Corrected error, no action required.
Oct  2 01:25:53 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Oct  2 01:25:53 Tower kernel: [Hardware Error]: Error Addr: 0x000000013f4f5bc0
Oct  2 01:25:53 Tower kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x36a408000a800d01
Oct  2 01:25:53 Tower kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Oct  2 01:25:53 Tower kernel: EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x52d3d6 offset:0xdc0 grain:64 syndrome:0x800)
Oct  2 01:25:53 Tower kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
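
The MC18_STATUS line above packs several flags into one 64-bit value. As a minimal sketch, assuming the bit positions used by the Linux AMD MCE decoder (Over = 62, UC = 61, MiscV = 59, AddrV = 58, PCC = 57, SyndV = 53, CECC = 46, UECC = 45), the value can be decoded by hand to check whether the logged error was one that ECC corrected. Only the flags relevant to this log are covered; the names `decode_status` and `summarize` are illustrative.

```python
# Bit positions of MCA status flags as printed by the Linux AMD MCE decoder
# in the "MCx_STATUS[...]" line. Assumption: only the flags relevant to this
# log are listed; the full register has more fields.
STATUS_BITS = {
    "Over": 62,   # overflow: another error arrived before this one was read
    "UC": 61,     # uncorrected error (decoder prints "UE"; otherwise "CE")
    "MiscV": 59,  # the MISC register holds additional information
    "AddrV": 58,  # the ADDR register holds a valid error address
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # the SYND register holds a valid ECC syndrome
    "CECC": 46,   # error was corrected by ECC
    "UECC": 45,   # error was detected but not correctable by ECC
}


def decode_status(status: int) -> dict[str, bool]:
    """Return which of the flags above are set in a 64-bit status value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}


def summarize(status: int) -> str:
    """One-line summary: corrected/uncorrected, set flags, low 16-bit error code."""
    flags = decode_status(status)
    kind = "uncorrected" if flags["UC"] else "corrected"
    set_flags = ", ".join(name for name, on in flags.items() if on)
    return f"{kind} error, flags: {set_flags}, error code 0x{status & 0xFFFF:04x}"


if __name__ == "__main__":
    # Status value copied from the MC18_STATUS line in the log above.
    print(summarize(0xdc2040000000011b))
    # prints: corrected error, flags: Over, MiscV, AddrV, SyndV, CECC, error code 0x011b
```

With the value from the log, this reports a corrected error with Over, MiscV, AddrV, SyndV and CECC set, which matches the kernel's own `[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]` summary.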

 

They all look almost the same. From what I found by googling, some say this error could be related to the CPU cache, because the cache has ECC too. I also recently upgraded to ECC DRAM, so I wonder whether it might be the RAM. The server has been running for a few days now, and most of these errors appeared last night.

Could someone please take a look at my diagnostics? I'm not an expert on this kind of error.

Many thanks

tower-diagnostics-20241002-1107.zip

  • Community Expert

It could also be a RAM error. Look in the SEL in the BIOS/IPMI; there may be more info there.

  • Author

Hmm, what is SEL? I use a consumer board from Asus. Maybe you can point me in the right direction. I also have no idea whether that board has anything like that to look into.

  • Community Expert

System Event Log. But if it's a consumer board, it likely doesn't have one. You can try running the server with only half the RAM installed; if the error still shows up, try the other half. That will basically rule out a RAM issue, so if you still get the error with either half, it's likely the CPU.

  • Author

Two things come to mind. First, I have had this kind of error (not sure if it's the same one) ever since I've had this motherboard/CPU combo, but back then I had regular, non-ECC memory. I ignored the error and moved on. I never got the chance to check the logs back then, and the server ran perfectly fine for months.

Second: if it's truly the RAM, shouldn't I have seen a lot more errors over the last two weeks? This error is really rare, and no matter how much RAM I use with Dockers, VMs and so on, the server runs perfectly fine. In my experience with regular PCs, RAM errors come up more frequently, right?

Also, the server got the errors during the night, when it basically just runs a Docker container that downloads Steam game updates. No special tasks.

Strange.

Is there maybe an option to check the RAM with Memtest? Can I trigger this error there too? Dismantling the server is a bit tricky XD

  • Community Expert

Memtest is available from the Unraid boot menu, and the Live Memory Tester plugin can be run on a live Unraid system. Note that NOT finding an error is not definitive, whereas finding one is.

RAM issues can be unpredictable in their symptoms, as these depend heavily on where in RAM the error is and how frequently it occurs. Symptoms can range from corruption of data being written to disks (which can easily be overlooked) to more severe issues such as server crashes.

  • Author

Yeah, RAM is really a pain sometimes.

Another question: if the RAM detects an error and it's ECC memory, it should repair it by itself, right? If that happens, does the error look like the one I got, or would it be clearer that the RAM detected an error? I just want to figure out how to get closer to the source of this error, because a message like "something is wrong with memory" doesn't help much on an Unraid server, where basically everything involves some kind of memory. That's not aimed at you guys, more at how the systems create logs. I wish they could be more specific.
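
One way to get closer to the source without pulling DIMMs is to tally the EDAC lines, which name the memory controller, chip-select row and channel of each corrected error (for example `csrow:1 channel:1` in the log in the first post). A minimal sketch, assuming the log lines have the same format as that excerpt and that the syslog path is passed on the command line (`/var/log/syslog` is an assumption, not confirmed here). Mapping a csrow/channel pair to a physical slot is board-specific and not covered; if the counts concentrate on one location, that is a hint, not proof, that one DIMM or slot is involved.

```python
import re
import sys
from collections import Counter

# Matches EDAC corrected-error lines in the format seen in this thread, e.g.
# "EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x52d3d6 ...)"
EDAC_CE = re.compile(r"EDAC MC(\d+): (\d+) CE .*?\(csrow:(\d+) channel:(\d+)")


def count_ce_by_location(lines):
    """Sum corrected-error counts per (memory controller, csrow, channel)."""
    counts = Counter()
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            mc, n, csrow, channel = (int(g) for g in match.groups())
            counts[(mc, csrow, channel)] += n
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (log path is an assumption): python edac_count.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for (mc, csrow, channel), n in count_ce_by_location(log).most_common():
            print(f"MC{mc} csrow {csrow} channel {channel}: {n} corrected error(s)")
```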

 

I know I got hardware errors in the past with regular, double-checked QVL memory, but I never had the chance to look at the log files because it wasn't a 24/7 server. At that time I ignored the error and moved on, and that was years ago.

I have the Live Memory Tester, but I tend to prefer "offline" methods like Memtest.

It could also be a problem that I use all 4 memory slots. Finding ECC memory for this platform is a pain: Asus advertises that the board can handle ECC and directs you to the QVL, but the list only offers regular non-ECC kits.

I think my next step is to let Memtest run overnight and see what that brings. The error may then show up again sometime in the next two weeks.

So for now, we can't determine whether it's the CPU cache or the RAM.

Edited by Kasjo

  • Author

Just letting you know. Not a final statement, but so far:

I did a lot of research over the last few days. After one Memtest pass overnight, I found this thread that might help:

FAQ for Unraid

It explains that my Ryzen model, the 3900X, doesn't officially support speeds higher than 2666 MHz with all slots populated. Then I remembered that the old RAM was also 2666 MHz... I changed the frequency, and whereas over the past two days I had been getting this error in the log every 5 minutes right after startup, my server has now been running for hours without any error.

I will keep checking over the next days and weeks and will report back here with the result. But for now, it looks awesome.
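
To compare the error rate before and after a change like this, one option is to count the "Machine check events logged" lines per hour. A minimal sketch, assuming the standard syslog timestamp prefix shown in the first post (`Oct  2 01:25:53`, no year) and a log path passed on the command line (`/var/log/syslog` is an assumption); the names `MCE_LINE` and `mce_per_hour` are illustrative.

```python
import re
import sys
from collections import Counter

# Syslog prefix such as "Oct  2 01:25:53"; the bucket key keeps only
# month, day and hour, e.g. "Oct  2 01". Assumes no year in the prefix.
MCE_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}):\d{2}:\d{2} .*Machine check events logged"
)


def mce_per_hour(lines):
    """Count 'Machine check events logged' lines per month/day/hour bucket."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (log path is an assumption): python mce_rate.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        # Counter keeps insertion order, so buckets print in log order.
        for hour, n in mce_per_hour(log).items():
            print(f"{hour}:00  {n} MCE event(s)")
```

An error every 5 minutes corresponds to about 12 events per hour, so a drop to zero per hour after the frequency change would be easy to spot in this output.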

 

Again, many thanks for your help so far. 

Edited by Kasjo
