Include EDAC-UTIL


On 10/16/2019 at 12:54 PM, flaggart said:

I would also like this.  In its absence I have been referring to this post which suggests the same information is available without the tool.

 

Quote

# grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
grep: /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count: No such file or directory

 

Not seeing anything about EDAC in dmesg either. Does Unraid still support ECC?

On 9/22/2022 at 8:30 PM, realies said:

Not seeing anything about EDAC in dmesg either. Does Unraid still support ECC?

EDAC should usually load automatically if your CPU/motherboard/memory controller supports it.

 

However, you can load it manually with:

modprobe amd64_edac

for AMD CPUs

 

or, for Intel 10th, 11th, and 12th Gen processors, with:

modprobe igen6_edac

 

If you get an error like "modprobe: ERROR: could not insert 'MODULENAME': No such device", it's most likely the wrong module: the error means that the module didn't find a compatible hardware device.
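If you're not sure which module fits your hardware, the trial and error described above can be scripted. This is only a rough sketch: the function name `load_first_edac_module` and the `MODPROBE` override variable are made up for illustration, and the candidate list is just the modules mentioned in this thread. It tries each module in order and stops at the first one that `modprobe` accepts.

```shell
#!/bin/sh
# Try each candidate EDAC module in turn and stop at the first one that loads.
# MODPROBE is a hypothetical override (handy for a dry run); it defaults to
# the real modprobe binary.
MODPROBE=${MODPROBE:-modprobe}

load_first_edac_module() {
    for mod in "$@"; do
        # modprobe fails with "No such device" when the module finds no
        # compatible hardware, so a non-zero status means "try the next one".
        if "$MODPROBE" "$mod" 2>/dev/null; then
            echo "$mod"
            return 0
        fi
    done
    echo "no candidate EDAC module matched this hardware" >&2
    return 1
}

# Example (run as root):
#   load_first_edac_module amd64_edac igen6_edac sb_edac
```

Afterwards `dmesg | grep EDAC` should show whether the loaded module actually registered a memory controller.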

 

BTW, you can list all available EDAC modules with this command:

ls -la /lib/modules/*-Unraid/kernel/drivers/edac/

 

The main issue with EDAC is that it can be really noisy at times and can report a lot of false positives, which can ultimately drive you crazy... :D

As far as I know, EDAC also reports PCIe errors, and since not all PCIe devices follow the PCIe standard completely, certain hardware combinations can produce many, many, many reports at times.


Here is an example from my Server.

 

Mainboard: Supermicro X10DRI-F

CPU: Intel Xeon E5-2630 v4

RAM: 4x 8GB ECC Memory

UNRAID: 6.12.6

 

# dmesg | grep EDAC
[  129.472264] EDAC MC: Ver: 3.0.0
[  129.482175] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[  129.482194] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[  129.482213] EDAC sbridge: Seeking for: PCI ID 8086:6f60
[  129.482226] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[  129.482234] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[  129.482244] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[  129.482251] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[  129.482261] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[  129.482269] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[  129.482279] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[  129.482286] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[  129.482292] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[  129.482296] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[  129.482302] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[  129.482307] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[  129.482312] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[  129.482317] EDAC sbridge: Seeking for: PCI ID 8086:6f79
[  129.482325] EDAC sbridge: Seeking for: PCI ID 8086:6f6a
[  129.482333] EDAC sbridge: Seeking for: PCI ID 8086:6f6b
[  129.482341] EDAC sbridge: Seeking for: PCI ID 8086:6f6c
[  129.482365] EDAC sbridge: Seeking for: PCI ID 8086:6f6d
[  129.482374] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[  129.482377] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[  129.482384] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[  129.482387] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[  129.482394] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[  129.482399] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[  129.482617] EDAC MC0: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
[  129.482623] EDAC sbridge:  Ver: 1.1.2 



# lsmod | grep edac
sb_edac                24576  0
edac_core              65536  1 sb_edac


# grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch3_ce_count:0



# lshw -class memory | grep ecc
       capabilities: ecc    
       configuration: errordetection=multi-bit-ecc


# mcelog --client
Memory errors
SOCKET 0 CHANNEL 0 DIMM 0
DMI_NAME "P1-DIMMA1" DMI_LOCATION "P0_Node0_Channel0_Dimm0"
corrected memory errors:
        0 total
        0 in 24h
uncorrected memory errors:
        0 total
        0 in 24h

SOCKET 0 CHANNEL 1 DIMM 0
DMI_NAME "P1-DIMMB1" DMI_LOCATION "P0_Node0_Channel1_Dimm0"
corrected memory errors:
        0 total
        0 in 24h
uncorrected memory errors:
        0 total
        0 in 24h

SOCKET 0 CHANNEL 2 DIMM 0
DMI_NAME "P1-DIMMC1" DMI_LOCATION "P0_Node0_Channel2_Dimm0"
corrected memory errors:
        0 total
        0 in 24h
uncorrected memory errors:
        0 total
        0 in 24h

SOCKET 0 CHANNEL 3 DIMM 0
DMI_NAME "P1-DIMMD1" DMI_LOCATION "P0_Node0_Channel3_Dimm0"
corrected memory errors:
        0 total
        0 in 24h
uncorrected memory errors:
        0 total
        0 in 24h

 

You can get the information without `edac-util`, but it would still be nice to have.
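For anyone who wants a one-line summary without `edac-util`, here is a minimal sketch that adds up the per-controller counters from sysfs. It assumes the kernel exposes `ce_count` and `ue_count` under each `mc*` directory; the function names and the `EDAC_ROOT` override are my own, not part of any tool.

```shell
#!/bin/sh
# Sum a per-controller EDAC counter across all memory controllers.
# EDAC_ROOT is a hypothetical override so the function can be pointed at a
# test directory; it defaults to the real sysfs location.
EDAC_ROOT=${EDAC_ROOT:-/sys/devices/system/edac/mc}

edac_total() {
    counter=$1      # "ce_count" (corrected) or "ue_count" (uncorrected)
    total=0
    for f in "$EDAC_ROOT"/mc*/"$counter"; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -r "$f" ] || continue
        total=$((total + $(cat "$f")))
    done
    echo "$total"
}

edac_report() {
    echo "corrected=$(edac_total ce_count) uncorrected=$(edac_total ue_count)"
}

# Example:
#   edac_report
```

If no EDAC driver is loaded there are no `mc*` directories, so both totals come out as 0. That means "nothing to count", not "no errors", so check `lsmod | grep edac` first.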

 

The EDAC kernel module does load automatically on this system.

 

I don't know whether UNRAID triggers a warning in the WebUI when ECC errors are detected, but this may be off-topic.
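In case the WebUI does not warn, a periodic check could be scripted by hand. The following is only a sketch under assumptions: it relies on the per-controller `ce_count` files in sysfs, and the function name, the `EDAC_ROOT` override and the `STATE_FILE` location are all hypothetical. It remembers the last total in a state file and succeeds only when the corrected-error count has grown since the previous run.

```shell
#!/bin/sh
# Return 0 (and print a message) when the summed corrected-error count has
# grown since the previous run. The last seen value is kept in a state file.
EDAC_ROOT=${EDAC_ROOT:-/sys/devices/system/edac/mc}
STATE_FILE=${STATE_FILE:-/tmp/edac_ce_last}   # hypothetical location

ecc_errors_increased() {
    now=0
    for f in "$EDAC_ROOT"/mc*/ce_count; do
        [ -r "$f" ] || continue          # no controller registered
        now=$((now + $(cat "$f")))
    done
    last=0
    [ -r "$STATE_FILE" ] && last=$(cat "$STATE_FILE")
    echo "$now" > "$STATE_FILE"          # remember the value for the next run
    if [ "$now" -gt "$last" ]; then
        echo "ECC corrected errors rose from $last to $now"
        return 0
    fi
    return 1
}

# Example: run from a scheduled job and act on the exit status
#   ecc_errors_increased && echo "time to look at the DIMMs"
```

The counters start again from zero after a reboot or module reload; in that case the stored value is simply overwritten and no message is printed.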

 

+1

Edited by pixeldoc81