
[6.11.0] Server keeps crashing after a few hours, faulty hardware?


lamebot
Solved by Bcy


Hello everyone :)

 

I've been using Unraid for some years now to run a personal media server and had a very good experience overall.

 

Hardware information

  • Supermicro X9DR3-F
  • 2x Intel® Xeon® CPU E5-2660 v2 @ 2.20GHz
  • Temporarily 12 GiB DDR3 Multi-bit ECC, usually more
  • Array with 8 Disks ranging from 3 TB to 10 TB + SSD Cache

 

The problem

Since upgrading from 6.9 to 6.10 and 6.11, I've been having quite a bit of trouble. It got worse with every update, and right now I'm at a point where the server crashes every few hours or even sooner. Sometimes I can even trigger a crash by starting a download with SABnzbd. The download destination is a share that doesn't use the cache.

When it crashes, there is no responsiveness at all (I can't even turn off NumLock on a keyboard, let alone type anything).

 

So I set up a syslog server to save the log. In it, I found some MCE errors, e.g.:

Quote

Oct 18 19:28:48 Fortress kernel: mce: [Hardware Error]: Machine check events logged

 

Attached you can find the syslog starting today at midnight (I changed some of the share names for privacy reasons): syslog-lamebot-181022.log

 

mcelog after the latest crash

Quote

mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: Cannot read sysfs field /sys/kernel/security/lockdown: No such file or directory
Kernel in lockdown. Cannot enable DIMM error location reporting
Hardware event. This is not a software error.
MCE 0
CPU 10 BANK 7 TSC 4fe6753a1c
MISC 2042342a86 ADDR 385aca800
TIME 1666113897 Tue Oct 18 19:24:57 2022
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8c00004000010090 MCGSTATUS 0
MCGCAP 1000c1b APICID 20 SOCKETID 1
MICROCODE 42e
CPUID Vendor Intel Family 6 Model 62 Step 4
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 7 TSC 7b5f13bed6
MISC 152683e86 ADDR 159b42500
TIME 1666113981 Tue Oct 18 19:26:21 2022
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS 8c00004000010091 MCGSTATUS 0
MCGCAP 1000c1b APICID 0 SOCKETID 0
MICROCODE 42e
CPUID Vendor Intel Family 6 Model 62 Step 4
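
For readers who want to cross-check mcelog's decoding, here is a minimal Python sketch that unpacks the two STATUS values quoted above. The bit positions follow the IA32_MCi_STATUS layout described in Intel's Software Developer's Manual; treat them as an assumption and verify them against the manual for your CPU. The function name `decode_mci_status` and the dictionary keys are made up for illustration.

```python
def decode_mci_status(status: int) -> dict:
    """Unpack a few architectural fields of an IA32_MCi_STATUS value."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    info = {
        "valid": bit(63),
        "overflow": bit(62),
        "uncorrected": bit(61),
        "enabled": bit(60),
        "misc_valid": bit(59),
        "addr_valid": bit(58),
        "context_corrupt": bit(57),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_code": mca_code,
    }
    # Memory controller errors use the pattern 0000 0000 1MMM CCCC:
    # MMM is the operation, CCCC the channel (1111 = not specified).
    if mca_code & 0xFF80 == 0x0080:
        ops = {0: "generic", 1: "read", 2: "write",
               3: "address/command", 4: "scrub"}
        info["memory_op"] = ops.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel
    return info


# The two STATUS values from the mcelog output above.
first = decode_mci_status(0x8C00004000010090)
second = decode_mci_status(0x8C00004000010091)
print(first)
print(second)
```

Both values decode to a corrected memory read error, on channel 0 and channel 1 respectively, which matches what mcelog printed. Note that BANK in this output is the machine-check register bank, so on its own it does not identify a physical DIMM slot.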

 

 

Now, the weird thing is: even after I took out the RAM module that I worked out to be at CPU 0 BANK 7, the server kept crashing, without showing anything in mcelog.

And it gets even weirder: I played around with a lot of different RAM modules, and placing them in different slots sometimes left the server unable to boot at all. Usually I get 2 or 3 beeps, but in these cases the server just hung with the fans spinning and no beeps at all.

 

If I place RAM in the slot P1 DIMMA1 (red in the image below), which should be the first slot to place RAM according to the motherboard manual, the server won't start at all. Same goes with other slots from CPU 2 (top row).

 

[Image: motherboard DIMM slot layout, slot P1 DIMMA1 marked in red]

 

 

Yesterday, I ran Memtest for 3 passes and didn't get any errors. But I read somewhere that Memtest can't detect memory errors with ECC RAM. Is that correct?

 

So either I have a bunch of broken RAM (I literally have a stack of about 20 modules which I got from an "old" server, and none of them seem to work any better), or maybe my motherboard or CPU is broken or causing problems with newer versions of Unraid? Or do you have any other ideas?

 

 

Thanks for reading and taking some time to brainstorm with me :)

 

lamebot


Link to comment
13 hours ago, lamebot said:

If I place RAM in the slot P1 DIMMA1 (red in the image below), which should be the first slot to place RAM according to the motherboard manual, the server won't start at all. Same goes with other slots from CPU 2 (top row).

That suggests a hardware issue. How many DIMMs do you have in total? If you install just two DIMMs per CPU, using the first 2 slots, does it POST?

 

 

Link to comment

Hey, thanks for the reply!

 

Recently, I was using only 3 DIMMs (2x CPU 1 and 1x CPU 2) in more or less random slots. But as I wrote before, this setup is very unstable. I have 13 other unused DIMMs of the same type. I never used all of them, because I didn't bother to remove the fans, which are blocking 3 of the slots (I also don't know if these 3 slots are working).

 

According to the motherboard manual, the preferable slots for 4 DIMMs (2 per CPU) would be P1-A1, P1-B1, P2-E1 & P2-F1. So I'm assuming these should be the "first 2 slots" for each CPU. Result: No success (no booting/no beeps).

 

So I tried some other combinations:

  • P1-A1, P1-A2, P2-E1, P2-E2: No.
  • P1-A1, P1-B1: No.
  • P1-A1: No (here I used different DIMMs)
  • P2-E1: Yes.
  • P2-E1, P2-E2: Yes.
  • P2-E1, P2-E2, P2-G1, P2-G2: Yes.

Could it just be CPU 1 that's failing, you might wonder? Unfortunately not, because:

  • P2-E1, P2-F1: No.
  • P2-E1, P2-F1, P2-H1, P2-G1: No.

Both P2-F1 and P2-H1 seem to be broken as well.
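
The boot tests above can be cross-checked with a small sketch. It assumes, purely for illustration, that every slot in a configuration that booted is fine, that a failed boot is caused by at least one bad populated slot, and that slot population rules play no role. Any of these assumptions may be wrong on real hardware. The names `tests` and `classify` are hypothetical.

```python
# Boot results copied from the post: (populated slots, did the server boot?)
tests = [
    ({"P1-A1", "P1-B1", "P2-E1", "P2-F1"}, False),
    ({"P1-A1", "P1-A2", "P2-E1", "P2-E2"}, False),
    ({"P1-A1", "P1-B1"}, False),
    ({"P1-A1"}, False),
    ({"P2-E1"}, True),
    ({"P2-E1", "P2-E2"}, True),
    ({"P2-E1", "P2-E2", "P2-G1", "P2-G2"}, True),
    ({"P2-E1", "P2-F1"}, False),
    ({"P2-E1", "P2-F1", "P2-H1", "P2-G1"}, False),
]


def classify(tests):
    """Split slots into known-good, singled-out suspects, and unresolved."""
    # Assumption: every slot in a booting configuration works.
    known_good = set()
    for slots, booted in tests:
        if booted:
            known_good |= slots

    # A failing test with exactly one slot not known to be good
    # singles that slot out as a suspect.
    suspects = set()
    for slots, booted in tests:
        if not booted:
            rest = slots - known_good
            if len(rest) == 1:
                suspects |= rest

    # Everything else that only ever appeared in failing tests.
    unresolved = set()
    for slots, booted in tests:
        if not booted:
            unresolved |= slots - known_good - suspects
    return known_good, suspects, unresolved


known_good, suspects, unresolved = classify(tests)
print("known good:", sorted(known_good))
print("suspects:  ", sorted(suspects))
print("unresolved:", sorted(unresolved))
```

Under these assumptions only P1-A1 and P2-F1 are singled out by the tests. P1-A2, P1-B1 and P2-H1 were never tried without one of those two, so the results do not say anything about them yet.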

 

Sorry if this is so cryptic. I could also take some time and label the broken slots in the picture above if that would help.

In summary, random (I think) DIMM slots are preventing the server from booting, or in other words: appear as broken.

Could a power outage have caused this? I know I had some in the past.

But why, then, did the updates destabilize it so much? Pure coincidence?

 

 

I hope I'm not causing any headaches, I know I'm starting to have one 😬

 

Edited by lamebot
Link to comment

Here's a short update and a bump: I didn't touch the server for about 48 hours and changed the DIMM slots after that. So I have 4 x 4GB now. This time, the server was stable for about 30 hours before it crashed.

After this, I wasn't able to start it at all, even though I didn't touch the hardware. Then I removed my graphics card (because why not) and it was able to boot up again. So... weird things happening.

 

Might this be a problem with the PSU? Or maybe motherboard?

Link to comment
  • 2 weeks later...

Hmm...

 

I'm still trying to figure out some more things before I buy new hardware.

The server is a little bit more stable now after I changed around DIMM slots again. I'm using 3x 4GB for both CPUs at the moment.

It's able to run for about 3 days without crashing. RAM-, CPU-, or disk-intensive tasks still make it crash, e.g. parity checks or backups, which I've disabled for the time being. Even without those, it crashes after some days.

 

I also got some new hardware errors in my syslog:

 

Quote

Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]: It has been corrected by h/w and requires no further action
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]: event severity: corrected
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:  Error 0, type: corrected
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:  fru_text: CorrectedErr
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:   section_type: memory error
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:   node:0 device:0
Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:   error_type: 2, single-bit ECC
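
Records like the one above can be grouped by the number in braces, which appears to be a per-record counter. A minimal sketch, assuming the kernel log format shown in the quote; `parse_apei` and `FIELDS` are illustrative names, not part of any tool.

```python
import re

# Matches e.g. "... kernel: {5}[Hardware Error]:   section_type: memory error"
LINE = re.compile(r"\{(?P<seq>\d+)\}\[Hardware Error\]:\s*(?P<msg>.*)$")
FIELDS = ("event severity", "section_type", "error_type")


def parse_apei(lines):
    """Group [Hardware Error] lines by record number and keep selected fields."""
    records = {}
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # not a hardware error line
        record = records.setdefault(int(m.group("seq")), {})
        msg = m.group("msg").strip()
        for name in FIELDS:
            prefix = name + ":"
            if msg.startswith(prefix):
                record[name] = msg[len(prefix):].strip()
    return records


# A few lines taken from the quote above.
sample = [
    "Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1",
    "Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]: event severity: corrected",
    "Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:   section_type: memory error",
    "Nov  5 12:02:26 Fortress kernel: {5}[Hardware Error]:   error_type: 2, single-bit ECC",
]
records = parse_apei(sample)
print(records)
```

Run over a whole syslog, this gives one small dictionary per reported event, which makes it easier to count how many corrected memory errors were logged before a crash.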

 

Sometimes, the error occurs in combination with DIMM errors, after which the server usually crashes. Today, I tried to mount a backup to recover a single file. Some days ago, I tried a parity check and got TONS of these Fallback Socket memory errors before a crash.
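
One rough way to quantify such a burst is to pull the numbers out of the mcelog lines. The sketch below assumes syslog lines of the form `... mcelog: Fallback Socket memory error count N exceeded threshold: M in 24h` and treats M as a running 24-hour total, which is a guess based on how the values grow from line to line. `summarize` and `sample` are illustrative names.

```python
import re

PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+mcelog:\s+"
    r"Fallback Socket memory error count (?P<count>\d+) "
    r"exceeded threshold: (?P<total>\d+) in 24h"
)


def summarize(lines):
    """Return {timestamp: highest running total reported in that second}."""
    totals = {}
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # Location / trigger lines and anything else
        ts = " ".join(m.group("ts").split())  # normalise "Nov  5" spacing
        total = int(m.group("total"))
        # Duplicate lines report the same total, so max() also dedupes them.
        totals[ts] = max(total, totals.get(ts, 0))
    return totals


# A few lines taken from the syslog excerpt in this post.
sample = [
    "Nov  5 10:58:25 Fortress  mcelog: Fallback Socket memory error count 27985 exceeded threshold: 112568 in 24h",
    "Nov  5 10:58:25 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []",
    "Nov  5 10:58:25 Fortress  mcelog: Fallback Socket memory error count 20565 exceeded threshold: 133134 in 24h",
    "Nov  5 10:58:25 Fortress mcelog: Fallback Socket memory error count 20565 exceeded threshold: 133134 in 24h",
    "Nov  5 10:58:44 Fortress  mcelog: Fallback Socket memory error count 19833 exceeded threshold: 1419442 in 24h",
    "Nov  5 10:58:44 Fortress  mcelog: Fallback Socket memory error count 19837 exceeded threshold: 1439280 in 24h",
]
totals = summarize(sample)
growth = max(totals.values()) - min(totals.values())
print(totals)
print("growth over the sample:", growth)
```

If the running-total reading is right, the reported total grows by more than a million corrected errors in under 20 seconds, all attributed to SOCKET:0.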

 

Quote

Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]: It has been corrected by h/w and requires no further action
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]: event severity: corrected
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]:  Error 0, type: corrected
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]:  fru_text: CorrectedErr
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]:   section_type: memory error
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]:   node:0 device:0
Nov  5 10:58:21 Fortress kernel: {14}[Hardware Error]:   error_type: 2, single-bit ECC
Nov  5 10:58:21 Fortress kernel: sd 0:0:0:0: [sda] 60063744 512-byte logical blocks: (30.8 GB/28.6 GiB)
Nov  5 10:58:25 Fortress root: Fix Common Problems Version 2022.10.17
Nov  5 10:58:25 Fortress  mcelog: Fallback Socket memory error count 27985 exceeded threshold: 112568 in 24h
Nov  5 10:58:25 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:25 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:25 Fortress  mcelog: Fallback Socket memory error count 20565 exceeded threshold: 133134 in 24h
Nov  5 10:58:25 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:25 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:25 Fortress  mcelog: Fallback Socket memory error count 20565 exceeded threshold: 153700 in 24h
Nov  5 10:58:25 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:25 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:25 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:25 Fortress mcelog: Fallback Socket memory error count 20565 exceeded threshold: 133134 in 24h
Nov  5 10:58:25 Fortress mcelog: Fallback Socket memory error count 27985 exceeded threshold: 112568 in 24h
Nov  5 10:58:25 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:25 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:26 Fortress  mcelog: Fallback Socket memory error count 26322 exceeded threshold: 180023 in 24h
Nov  5 10:58:26 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:26 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:26 Fortress  mcelog: Fallback Socket memory error count 26332 exceeded threshold: 206356 in 24h
Nov  5 10:58:26 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:26 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:26 Fortress  mcelog: Fallback Socket memory error count 26332 exceeded threshold: 232689 in 24h
Nov  5 10:58:26 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:26 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:26 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:26 Fortress mcelog: Fallback Socket memory error count 26322 exceeded threshold: 180023 in 24h
Nov  5 10:58:26 Fortress mcelog: Fallback Socket memory error count 26332 exceeded threshold: 206356 in 24h
Nov  5 10:58:26 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:26 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:27 Fortress  mcelog: Fallback Socket memory error count 24498 exceeded threshold: 257188 in 24h
Nov  5 10:58:27 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:27 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:27 Fortress  mcelog: Corrected memory errors on page 7f80c000 exceed threshold 10 in 24h: 10 in 24h
Nov  5 10:58:27 Fortress  mcelog: Location SOCKET:0 CHANNEL:2 DIMM:? []
Nov  5 10:58:27 Fortress  mcelog: Fallback Socket memory error count 24480 exceeded threshold: 281669 in 24h
Nov  5 10:58:27 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:27 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:27 Fortress  mcelog: Fallback Socket memory error count 24480 exceeded threshold: 306150 in 24h
Nov  5 10:58:27 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:27 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:27 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:27 Fortress mcelog: Fallback Socket memory error count 24498 exceeded threshold: 257188 in 24h
Nov  5 10:58:27 Fortress mcelog: Fallback Socket memory error count 24480 exceeded threshold: 281669 in 24h
Nov  5 10:58:27 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:27 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:28 Fortress  mcelog: Fallback Socket memory error count 22390 exceeded threshold: 328541 in 24h
Nov  5 10:58:28 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:28 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:28 Fortress  mcelog: Fallback Socket memory error count 22392 exceeded threshold: 350934 in 24h
Nov  5 10:58:28 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:28 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:28 Fortress  mcelog: Fallback Socket memory error count 22392 exceeded threshold: 373327 in 24h
Nov  5 10:58:28 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:28 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:28 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:28 Fortress mcelog: Fallback Socket memory error count 22390 exceeded threshold: 328541 in 24h
Nov  5 10:58:28 Fortress mcelog: Fallback Socket memory error count 22392 exceeded threshold: 350934 in 24h
Nov  5 10:58:28 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:28 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:29 Fortress  mcelog: Fallback Socket memory error count 24477 exceeded threshold: 397805 in 24h
Nov  5 10:58:29 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:29 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:29 Fortress  mcelog: Fallback Socket memory error count 24476 exceeded threshold: 422282 in 24h
Nov  5 10:58:29 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:29 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:29 Fortress  mcelog: Fallback Socket memory error count 24476 exceeded threshold: 446759 in 24h
Nov  5 10:58:29 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:29 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:29 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:29 Fortress mcelog: Fallback Socket memory error count 24477 exceeded threshold: 397805 in 24h
Nov  5 10:58:29 Fortress mcelog: Fallback Socket memory error count 24476 exceeded threshold: 422282 in 24h
Nov  5 10:58:29 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:29 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:30 Fortress  mcelog: Fallback Socket memory error count 26962 exceeded threshold: 473722 in 24h
Nov  5 10:58:30 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:30 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:30 Fortress  mcelog: Fallback Socket memory error count 26969 exceeded threshold: 500692 in 24h
Nov  5 10:58:30 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:30 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:30 Fortress  mcelog: Fallback Socket memory error count 26969 exceeded threshold: 527662 in 24h
Nov  5 10:58:30 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:30 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:30 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:30 Fortress mcelog: Fallback Socket memory error count 26969 exceeded threshold: 500692 in 24h
Nov  5 10:58:30 Fortress mcelog: Fallback Socket memory error count 26962 exceeded threshold: 473722 in 24h
Nov  5 10:58:30 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:30 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:31 Fortress  mcelog: Fallback Socket memory error count 25201 exceeded threshold: 552864 in 24h
Nov  5 10:58:31 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:31 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:31 Fortress  mcelog: Fallback Socket memory error count 25197 exceeded threshold: 578062 in 24h
Nov  5 10:58:31 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:31 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:31 Fortress  mcelog: Fallback Socket memory error count 25197 exceeded threshold: 603260 in 24h
Nov  5 10:58:31 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:31 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:31 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:31 Fortress mcelog: Fallback Socket memory error count 25197 exceeded threshold: 578062 in 24h
Nov  5 10:58:31 Fortress mcelog: Fallback Socket memory error count 25201 exceeded threshold: 552864 in 24h
Nov  5 10:58:31 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:31 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:32 Fortress  mcelog: Fallback Socket memory error count 25079 exceeded threshold: 628340 in 24h
Nov  5 10:58:32 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:32 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:32 Fortress  mcelog: Fallback Socket memory error count 25070 exceeded threshold: 653411 in 24h
Nov  5 10:58:32 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:32 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:32 Fortress  mcelog: Fallback Socket memory error count 25070 exceeded threshold: 678482 in 24h
Nov  5 10:58:32 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:32 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:32 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:32 Fortress mcelog: Fallback Socket memory error count 25079 exceeded threshold: 628340 in 24h
Nov  5 10:58:32 Fortress mcelog: Fallback Socket memory error count 25070 exceeded threshold: 653411 in 24h
Nov  5 10:58:32 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:32 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:33 Fortress  mcelog: Fallback Socket memory error count 21663 exceeded threshold: 700146 in 24h
Nov  5 10:58:33 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:33 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:33 Fortress  mcelog: Fallback Socket memory error count 21655 exceeded threshold: 721802 in 24h
Nov  5 10:58:33 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:33 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:33 Fortress  mcelog: Fallback Socket memory error count 21655 exceeded threshold: 743458 in 24h
Nov  5 10:58:33 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:33 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:33 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:33 Fortress mcelog: Fallback Socket memory error count 21655 exceeded threshold: 721802 in 24h
Nov  5 10:58:33 Fortress mcelog: Fallback Socket memory error count 21663 exceeded threshold: 700146 in 24h
Nov  5 10:58:33 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:33 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:34 Fortress  mcelog: Fallback Socket memory error count 19788 exceeded threshold: 763247 in 24h
Nov  5 10:58:34 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:34 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:34 Fortress  mcelog: Fallback Socket memory error count 19788 exceeded threshold: 783036 in 24h
Nov  5 10:58:34 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:34 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:34 Fortress  mcelog: Fallback Socket memory error count 19788 exceeded threshold: 802825 in 24h
Nov  5 10:58:34 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:34 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:34 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:34 Fortress mcelog: Fallback Socket memory error count 19788 exceeded threshold: 763247 in 24h
Nov  5 10:58:34 Fortress mcelog: Fallback Socket memory error count 19788 exceeded threshold: 783036 in 24h
Nov  5 10:58:34 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:34 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:35 Fortress  mcelog: Fallback Socket memory error count 21790 exceeded threshold: 824616 in 24h
Nov  5 10:58:35 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:35 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:35 Fortress  mcelog: Fallback Socket memory error count 21788 exceeded threshold: 846405 in 24h
Nov  5 10:58:35 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:35 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:35 Fortress  mcelog: Fallback Socket memory error count 21788 exceeded threshold: 868194 in 24h
Nov  5 10:58:35 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:35 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:35 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:35 Fortress mcelog: Fallback Socket memory error count 21790 exceeded threshold: 824616 in 24h
Nov  5 10:58:35 Fortress mcelog: Fallback Socket memory error count 21788 exceeded threshold: 846405 in 24h
Nov  5 10:58:35 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:35 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:36 Fortress  mcelog: Fallback Socket memory error count 20755 exceeded threshold: 888950 in 24h
Nov  5 10:58:36 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:36 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:36 Fortress  mcelog: Fallback Socket memory error count 20750 exceeded threshold: 909701 in 24h
Nov  5 10:58:36 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:36 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:36 Fortress  mcelog: Fallback Socket memory error count 20750 exceeded threshold: 930452 in 24h
Nov  5 10:58:36 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:36 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:36 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:36 Fortress mcelog: Fallback Socket memory error count 20750 exceeded threshold: 909701 in 24h
Nov  5 10:58:36 Fortress mcelog: Fallback Socket memory error count 20755 exceeded threshold: 888950 in 24h
Nov  5 10:58:36 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:36 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:37 Fortress  mcelog: Fallback Socket memory error count 22430 exceeded threshold: 952883 in 24h
Nov  5 10:58:37 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:37 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:37 Fortress  mcelog: Fallback Socket memory error count 22436 exceeded threshold: 975320 in 24h
Nov  5 10:58:37 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:37 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:37 Fortress  mcelog: Fallback Socket memory error count 22436 exceeded threshold: 997757 in 24h
Nov  5 10:58:37 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:37 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:37 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:37 Fortress mcelog: Fallback Socket memory error count 22430 exceeded threshold: 952883 in 24h
Nov  5 10:58:37 Fortress mcelog: Fallback Socket memory error count 22436 exceeded threshold: 975320 in 24h
Nov  5 10:58:37 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:37 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:38 Fortress  mcelog: Fallback Socket memory error count 24321 exceeded threshold: 1022079 in 24h
Nov  5 10:58:38 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:38 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:38 Fortress  mcelog: Fallback Socket memory error count 24328 exceeded threshold: 1046408 in 24h
Nov  5 10:58:38 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:38 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:38 Fortress  mcelog: Fallback Socket memory error count 24328 exceeded threshold: 1070737 in 24h
Nov  5 10:58:38 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:38 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:38 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:38 Fortress mcelog: Fallback Socket memory error count 24321 exceeded threshold: 1022079 in 24h
Nov  5 10:58:38 Fortress mcelog: Fallback Socket memory error count 24328 exceeded threshold: 1046408 in 24h
Nov  5 10:58:38 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:38 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:39 Fortress  mcelog: Fallback Socket memory error count 20674 exceeded threshold: 1091412 in 24h
Nov  5 10:58:39 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:39 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:39 Fortress  mcelog: Fallback Socket memory error count 20669 exceeded threshold: 1112082 in 24h
Nov  5 10:58:39 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:39 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:39 Fortress  mcelog: Fallback Socket memory error count 20669 exceeded threshold: 1132752 in 24h
Nov  5 10:58:39 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:39 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:39 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:39 Fortress mcelog: Fallback Socket memory error count 20669 exceeded threshold: 1112082 in 24h
Nov  5 10:58:39 Fortress mcelog: Fallback Socket memory error count 20674 exceeded threshold: 1091412 in 24h
Nov  5 10:58:39 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:39 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:40 Fortress  mcelog: Fallback Socket memory error count 22584 exceeded threshold: 1155337 in 24h
Nov  5 10:58:40 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:40 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:40 Fortress  mcelog: Fallback Socket memory error count 22581 exceeded threshold: 1177919 in 24h
Nov  5 10:58:40 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:40 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:40 Fortress  mcelog: Fallback Socket memory error count 22581 exceeded threshold: 1200501 in 24h
Nov  5 10:58:40 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:40 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:40 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:40 Fortress mcelog: Fallback Socket memory error count 22584 exceeded threshold: 1155337 in 24h
Nov  5 10:58:40 Fortress mcelog: Fallback Socket memory error count 22581 exceeded threshold: 1177919 in 24h
Nov  5 10:58:40 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:40 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:41 Fortress  mcelog: Fallback Socket memory error count 22621 exceeded threshold: 1223123 in 24h
Nov  5 10:58:41 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:41 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:41 Fortress  mcelog: Fallback Socket memory error count 22618 exceeded threshold: 1245742 in 24h
Nov  5 10:58:41 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:41 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:41 Fortress  mcelog: Fallback Socket memory error count 22618 exceeded threshold: 1268361 in 24h
Nov  5 10:58:41 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:41 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:41 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:41 Fortress mcelog: Fallback Socket memory error count 22618 exceeded threshold: 1245742 in 24h
Nov  5 10:58:41 Fortress mcelog: Fallback Socket memory error count 22621 exceeded threshold: 1223123 in 24h
Nov  5 10:58:41 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:41 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:42 Fortress  mcelog: Fallback Socket memory error count 21596 exceeded threshold: 1289958 in 24h
Nov  5 10:58:42 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:42 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:42 Fortress  mcelog: Fallback Socket memory error count 21578 exceeded threshold: 1311537 in 24h
Nov  5 10:58:42 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:42 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:42 Fortress  mcelog: Fallback Socket memory error count 21578 exceeded threshold: 1333116 in 24h
Nov  5 10:58:42 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:42 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:42 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:42 Fortress mcelog: Fallback Socket memory error count 21596 exceeded threshold: 1289958 in 24h
Nov  5 10:58:42 Fortress mcelog: Fallback Socket memory error count 21578 exceeded threshold: 1311537 in 24h
Nov  5 10:58:42 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:42 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:43 Fortress  mcelog: Fallback Socket memory error count 22171 exceeded threshold: 1355288 in 24h
Nov  5 10:58:43 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:43 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:43 Fortress  mcelog: Fallback Socket memory error count 22159 exceeded threshold: 1377448 in 24h
Nov  5 10:58:43 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:43 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:43 Fortress  mcelog: Fallback Socket memory error count 22159 exceeded threshold: 1399608 in 24h
Nov  5 10:58:43 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:43 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:43 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:43 Fortress mcelog: Fallback Socket memory error count 22171 exceeded threshold: 1355288 in 24h
Nov  5 10:58:43 Fortress mcelog: Fallback Socket memory error count 22159 exceeded threshold: 1377448 in 24h
Nov  5 10:58:43 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:43 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:44 Fortress  mcelog: Fallback Socket memory error count 19833 exceeded threshold: 1419442 in 24h
Nov  5 10:58:44 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:44 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:44 Fortress  mcelog: Fallback Socket memory error count 19837 exceeded threshold: 1439280 in 24h
Nov  5 10:58:44 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:44 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:44 Fortress  mcelog: Fallback Socket memory error count 19837 exceeded threshold: 1459118 in 24h
Nov  5 10:58:44 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:44 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:44 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:44 Fortress mcelog: Fallback Socket memory error count 19837 exceeded threshold: 1439280 in 24h
Nov  5 10:58:44 Fortress mcelog: Fallback Socket memory error count 19833 exceeded threshold: 1419442 in 24h
Nov  5 10:58:44 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:44 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress  mcelog: Fallback Socket memory error count 23230 exceeded threshold: 1482349 in 24h
Nov  5 10:58:45 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:45 Fortress  mcelog: Fallback Socket memory error count 23229 exceeded threshold: 1505579 in 24h
Nov  5 10:58:45 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:45 Fortress  mcelog: Fallback Socket memory error count 23229 exceeded threshold: 1528809 in 24h
Nov  5 10:58:45 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:45 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:45 Fortress mcelog: Fallback Socket memory error count 23230 exceeded threshold: 1482349 in 24h
Nov  5 10:58:45 Fortress mcelog: Fallback Socket memory error count 23229 exceeded threshold: 1505579 in 24h
Nov  5 10:58:45 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:45 Fortress  mcelog: Cannot collect child 24213: No child processes
Nov  5 10:58:46 Fortress  mcelog: Fallback Socket memory error count 20974 exceeded threshold: 1549784 in 24h
Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:46 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:46 Fortress  mcelog: Corrected memory errors on page 2665000 exceed threshold 30 in 24h: 30 in 24h
Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:2 DIMM:? []
Nov  5 10:58:46 Fortress  mcelog: Fallback Socket memory error count 20969 exceeded threshold: 1570754 in 24h
Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:46 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:46 Fortress  mcelog: Fallback Socket memory error count 20969 exceeded threshold: 1591724 in 24h
Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:46 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:46 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:46 Fortress mcelog: Fallback Socket memory error count 20974 exceeded threshold: 1549784 in 24h
Nov  5 10:58:46 Fortress mcelog: Fallback Socket memory error count 20969 exceeded threshold: 1570754 in 24h
Nov  5 10:58:46 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:46 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:47 Fortress  mcelog: Fallback Socket memory error count 22165 exceeded threshold: 1613890 in 24h
Nov  5 10:58:47 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:47 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:47 Fortress  mcelog: Fallback Socket memory error count 22163 exceeded threshold: 1636054 in 24h
Nov  5 10:58:47 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:47 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:47 Fortress  mcelog: Fallback Socket memory error count 22163 exceeded threshold: 1658218 in 24h
Nov  5 10:58:47 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:47 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:47 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:47 Fortress mcelog: Fallback Socket memory error count 22163 exceeded threshold: 1636054 in 24h
Nov  5 10:58:47 Fortress mcelog: Fallback Socket memory error count 22165 exceeded threshold: 1613890 in 24h
Nov  5 10:58:47 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:47 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:48 Fortress  mcelog: Fallback Socket memory error count 22534 exceeded threshold: 1680753 in 24h
Nov  5 10:58:48 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:48 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:48 Fortress  mcelog: Fallback Socket memory error count 22526 exceeded threshold: 1703280 in 24h
Nov  5 10:58:48 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:48 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:48 Fortress  mcelog: Fallback Socket memory error count 22526 exceeded threshold: 1725807 in 24h
Nov  5 10:58:48 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:48 Fortress  mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Nov  5 10:58:48 Fortress  mcelog: Too many trigger children running already
Nov  5 10:58:48 Fortress mcelog: Fallback Socket memory error count 22534 exceeded threshold: 1680753 in 24h
Nov  5 10:58:48 Fortress mcelog: Fallback Socket memory error count 22526 exceeded threshold: 1703280 in 24h
Nov  5 10:58:48 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
Nov  5 10:58:48 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []
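A log this repetitive is easier to read once the `Location` lines are tallied per socket and channel. Below is a minimal Python sketch, assuming the syslog line format shown above. The function name `count_locations` and the `SAMPLE` list are made up for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches both variants seen in the log: "Location SOCKET:..." and "Location: SOCKET:..."
LOCATION_RE = re.compile(
    r"mcelog: Location:? SOCKET:(\S+) CHANNEL:(\S+) DIMM:(\S+)"
)


def count_locations(lines):
    """Count mcelog 'Location' lines per (socket, channel, dimm) tuple."""
    counts = Counter()
    for line in lines:
        match = LOCATION_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Sample lines copied from the syslog excerpt above.
SAMPLE = [
    "Nov  5 10:58:46 Fortress  mcelog: Corrected memory errors on page 2665000 exceed threshold 30 in 24h: 30 in 24h",
    "Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:2 DIMM:? []",
    "Nov  5 10:58:46 Fortress  mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []",
    "Nov  5 10:58:46 Fortress mcelog: Location: SOCKET:0 CHANNEL:? DIMM:? []",
]

if __name__ == "__main__":
    for (socket, channel, dimm), n in count_locations(SAMPLE).most_common():
        print(f"SOCKET:{socket} CHANNEL:{channel} DIMM:{dimm} -> {n}")
```

To run it on a full log, pass the open file instead of `SAMPLE`, e.g. `count_locations(open("syslog.log"))`, where the file name is a placeholder. A `?` in the output means mcelog could not resolve that field, so only lines with a concrete channel help to localize the fault.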

 

Yesterday, I had another very strange problem: the server fans howled very loudly every few minutes. I left it running during the day and returned in the evening to a server that was still howling. It was like a wounded animal begging me to put it out of its misery 😅

It crashed later that evening, and the problem stopped today after I removed two DIMMs and went back to 3x 4 GB for both CPUs.

 

Can anyone help me narrow this problem down? Could it be a broken memory controller? How can I find out whether it's mainboard or CPU related?

Considering the error messages, I don't think it's a faulty PSU anymore (with a bad PSU, the server would probably just crash instantly, right?).

 

 

Thanks for reading and have a great weekend

Edited by lamebot
  • Solution

Hello, 

 

Which BIOS version are you using? Did you check the BMC event log?

This is probably a memory ECC error. Use MemTest86 Pro, which has the ECC injection function, to verify that all of your memory works properly (not memtest86).

 

Also, for server motherboards it is not recommended to install memory in multiples of three (3, 6 or 9 DIMMs).

 

  • 2 weeks later...

Hi

 

I was using the latest BIOS and checked the BMC event log some time ago; there were no new errors.

However, I found out what the problem was: RAM. I couldn't believe it, but it seems like most or all of my 16 DIMMs (some of which I didn't even use before) are broken or not compatible with newer versions of Unraid. Maybe the problem was there all along and system changes just made it far more visible. I don't know if that makes sense.

After I installed different RAM, my system ran without problems. Well, almost without problems.

 

The problem with the fans spinning up like crazy came back but I was able to fix it with some IPMI changes, which I found in some other threads.

To be more specific, I had to lower the fan thresholds to 150/75/0 and change the fan control to use only the CPU2 and HDD temperatures, because at least two other sensors sometimes do weird things, like dropping to 0.
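For anyone who wants to set thresholds like these from the command line rather than through a plugin: `ipmitool sensor thresh <id> lower <lnr> <lcr> <lnc>` sets the three lower thresholds in one call. The Python sketch below only builds the command and does not run it. Note the assumptions: the sensor names (`FAN1`, `FAN2`) are placeholders, and mapping 150/75/0 to lower non-critical / lower critical / lower non-recoverable is a guess at what the values above mean, not something confirmed in this thread.

```python
def fan_threshold_command(sensor, lnr, lcr, lnc):
    """Build an ipmitool argument list that sets a sensor's lower thresholds.

    ipmitool expects the order: lower non-recoverable (lnr),
    lower critical (lcr), lower non-critical (lnc).
    """
    if not (lnr <= lcr <= lnc):
        raise ValueError("expected lnr <= lcr <= lnc")
    return [
        "ipmitool", "sensor", "thresh", sensor, "lower",
        str(lnr), str(lcr), str(lnc),
    ]


if __name__ == "__main__":
    # Sensor names are assumptions; list the real ones with `ipmitool sensor`.
    for name in ("FAN1", "FAN2"):
        print(" ".join(fan_threshold_command(name, 0, 75, 150)))
```

Printing the commands first makes it easy to check them against the sensor list before running anything on the BMC.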

 

So both of you were right pointing out that it still might be RAM. Thanks for the help :)
