Help understanding ECC errors in log

I've just migrated my Unraid setup to new hardware and haven't checked the logs in a few days. I noticed there are L3 cache and ECC errors.
 


Nov 25 10:35:57 Apollo kernel: traps: lsof[30540] general protection fault ip:149830c9cc6e sp:47ac5b58ccaad0d1 error:0 in libc-2.37.so[149830c84000+169000]
Nov 26 11:28:22 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc61c0
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Nov 26 14:26:57 Apollo usbhid-ups[22983]: [D5:22983] send_to_all: ADDCMD driver.reload-or-error
Nov 26 14:26:58 Apollo usbhid-ups[22983]: [D2:22983] send_to_one: sending ADDCMD driver.reload-or-error
Nov 26 14:26:58 Apollo usbhid-ups[22983]: [D5:22983] send_to_one: ADDCMD driver.reload-or-error
Nov 26 14:26:58 Apollo usbhid-ups[22983]: [D6:22983] send_to_one: write 30 bytes to socket 16 succeeded (ret=30): ADDCMD driver.reload-or-error
Nov 26 15:26:32 Apollo kernel: apcupsd[7843]: segfault at 0 ip 000000000041acf3 sp 00007ffd830b9ac0 error 4 in apcupsd[404000+2f000] likely on CPU 31 (core 29, socket 0)
Nov 26 15:53:59 Apollo root: NUTSTATS: Error Updating ups.realpower.json - UPS returned BAD or NULL value.
Nov 26 15:53:59 Apollo root: NUTSTATS: Error Updating input.voltage.json - UPS returned BAD or NULL value.
Nov 26 15:53:59 Apollo root: NUTSTATS: Error Updating input.frequency.json - UPS returned BAD or NULL value.
Nov 26 15:53:59 Apollo root: NUTSTATS: Error Updating output.voltage.json - UPS returned BAD or NULL value.
Nov 26 15:53:59 Apollo root: NUTSTATS: Error Updating output.frequency.json - UPS returned BAD or NULL value.
Nov 26 21:45:30 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4240
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Nov 27 09:13:37 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc61c0
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 27 09:13:38 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Nov 27 20:30:50 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc6b00
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x32d000040a800503
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Nov 28 14:21:15 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4240
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Nov 28 21:10:51 Apollo kernel: mce: [Hardware Error]: Machine check events logged
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: Corrected error, no action required.
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: CPU:1 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4440
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x32d000040a800503
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
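To see at a glance whether the corrected errors in the log above keep hitting the same place, the addresses and syndromes can be tallied. This is only a rough sketch: `tally_mce`, `address_span` and `SAMPLE` are names made up for illustration, and the regular expressions assume the kernel keeps printing the lines in exactly the format shown in the log.

```python
import re
from collections import Counter

# Patterns match the "[Hardware Error]" lines as they appear in the log above.
ADDR_RE = re.compile(r"Error Addr: (0x[0-9a-fA-F]+)")
SYND_RE = re.compile(r"Syndrome: (0x[0-9a-fA-F]+)")


def tally_mce(log_text):
    """Return (address counts, syndrome counts) for the MCE lines in log_text."""
    return Counter(ADDR_RE.findall(log_text)), Counter(SYND_RE.findall(log_text))


def address_span(addr_counts):
    """Distance in bytes between the lowest and highest reported address."""
    values = [int(addr, 16) for addr in addr_counts]
    return max(values) - min(values) if values else 0


# The address and syndrome lines of the six events posted above.
SAMPLE = """\
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc61c0
Nov 26 11:28:22 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4240
Nov 26 21:45:30 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc61c0
Nov 27 09:13:37 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc6b00
Nov 27 20:30:50 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x32d000040a800503
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4240
Nov 28 14:21:15 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x196800020a800503
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: Error Addr: 0x00000001d0bc4440
Nov 28 21:10:51 Apollo kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x32d000040a800503
"""

if __name__ == "__main__":
    addrs, synds = tally_mce(SAMPLE)
    for addr, count in addrs.most_common():
        print("address", addr, "seen", count, "time(s)")
    for synd, count in synds.most_common():
        print("syndrome", synd, "seen", count, "time(s)")
    print("address span in bytes:", hex(address_span(addrs)))
```

For the six events above this gives four distinct addresses, all within 0x28c0 bytes of each other, and two distinct syndromes. A tight cluster like that is consistent with a single faulty region, although on its own it doesn't say which DIMM the region lives on.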

 

The Unraid logs also seem to line up with entries from the Gigabyte management panel, which has logged this within a minute or two of each of the errors:
 

Quote

Unknown sensor of type memory logged a smi_handler : Correctable ECC / other correctable memory error(DIMM_D1) was asserted

 

Is all of this suggesting the RAM in DIMM_D1 is bad or is there something else going on? I realise EPYC is a bit funny about mounting pressure, so I was going to try reseating the CPU and the RAM. The CPU was preinstalled on the mobo when I got it, but I have an appropriate torque screwdriver now.

 

I've attached diagnostics just in case.

 

Any help would be greatly appreciated.

apollo-diagnostics-20231129-0847.zip

Edited by Benji

I would try swapping DIMM_D1 with another one and see if the errors follow the stick.

  • Author

Good idea, I'll try that now.

  • Author
On 11/29/2023 at 12:00 PM, JorgeB said:

I would try swapping DIMM_D1 with another one and see if the errors follow the stick.

 

I swapped it to C1 and I'm getting the same errors in Unraid but the motherboard panel has changed the error to C1. So is it safe to say it's a bad stick then?
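The reasoning behind the swap test can be written down as a small sketch. `interpret_swap` is a hypothetical helper, not part of any tool, and the slot names are just the labels the management panel reports; it assumes only one stick was moved between the two observations.

```python
def interpret_swap(slot_before, slot_after, moved_to):
    """Interpret a DIMM swap test.

    slot_before: slot the panel blamed before the move
    slot_after:  slot the panel blames after the move
    moved_to:    slot the suspect stick was moved into
    """
    if moved_to == slot_before:
        # The stick never changed slots, so nothing can be concluded.
        return "inconclusive"
    if slot_after == moved_to:
        # The error followed the stick to its new slot.
        return "stick"
    if slot_after == slot_before:
        # The error stayed with the slot (or the memory channel behind it).
        return "slot"
    return "inconclusive"


if __name__ == "__main__":
    # Case from this thread: blamed D1, stick moved to C1, panel now blames C1.
    print(interpret_swap("D1", "C1", "C1"))
```

In the case described here the error moved from D1 to C1 along with the stick, which is the "stick" branch.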

8 minutes ago, Benji said:

So is it safe to say it's a bad stick then?

I would think so.
