plindberg

Members
  • Posts

    6
  1. In case anyone else has these same issues in the future, the changes discussed above seem to have done the trick. My system has been running stable for a week now, with no signs of any problems. To sum it up, what I did was follow the suggestions by the user Marshy in the thread linked above:
     - Set the RAM speed to 2666 MHz
     - Disable global C-states
     - Set Power Supply Idle Control to Typical
     I'm guessing the latter two didn't have anything to do with the ECC errors, but a lot of people seem to be reporting better system stability with those settings.
  2. Nearly missed your edit! Judging from the posts by Marshy on page 4 of the first thread you linked, it seems that setting the memory speed to 2666 MHz rather than 2400 MHz might actually fix the issue. I'm going to try this and see if I get any more errors. These sticks are supposed to run at that speed anyway; I'm not quite sure why they defaulted to 2400. If that doesn't help I'll start digging into the second thread. Should be a good Sunday read.
  3. Thank you, going to take a look! 🙇‍♂️
  4. There's always underclocking. Jokes aside, I'm going to keep monitoring this and see how extensive the errors get. If they keep happening on a regular basis I will probably just order some new sticks. If I don't see any more I will just write it off as a random event caused by cosmic rays or whatever, and be happy that the ECC did its job.
  5. That's good to know. If any errors do occur they should be logged to the system log though, so at least there's that. I've read elsewhere that ECC errors can be triggered if the memory is not set up correctly in the BIOS. Any thoughts on that? It seems unlikely that both the OP and I got bad sticks, but I guess there might have been a bad batch.
  6. I have the same setup as you:
     - Ryzen 7 2700
     - Asrock Rack X470D4U
     - 2x Kingston 16GB ECC DDR4 (KSM26ED8/16ME)

     OS is Debian. I assembled this system today and have seen three corrected ECC errors in about 4 hours of operation. They seem to be the same errors as you've reported here. All three of mine are on different addresses, though. Currently trying to figure out if this is normal. I also noticed that my RAM is running at 2400 and not 2666. This is according to dmidecode.

     May be interesting to note that all three errors happened just as I was running a command in the terminal. One of them happened when I ran sensors-detect, another when I terminated a running stress test. I don't recall what I was doing when the third one occurred. Since then I've been running memory stress tests using stress-ng to see if I can trigger more errors (going on for about one or two hours now), but haven't seen anything yet. I also ran a few loops of memtester and everything passed.

     The fact that we seem to have the exact same issue (if it even is an issue, I'm still holding out on that) on virtually identical systems makes me believe it could be related to some configuration issue. I have left all BIOS settings at their defaults for now. Maybe the memory configuration is not optimal for these sticks? Maybe a BIOS update is in order.

     Log entries:

     [Hardware Error]: Corrected error, no action required.
     [Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[-|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0x9c2040000000011b
     [Hardware Error]: Error Addr: 0x00000000018f03c0
     [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x00000a400a400103
     [Hardware Error]: Unified Memory Controller Extended Error Code: 0
     [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
     [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
     ----------
     [Hardware Error]: Corrected error, no action required.
     [Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
     [Hardware Error]: Error Addr: 0x00000003e7eb9800
     [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x00000a400a400103
     [Hardware Error]: Unified Memory Controller Extended Error Code: 0
     [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
     [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
     ----------
     [Hardware Error]: Corrected error, no action required.
     [Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
     [Hardware Error]: Error Addr: 0x0000000000c10980
     [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x00000a400a400102
     [Hardware Error]: Unified Memory Controller Extended Error Code: 0
     [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
     [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
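
For anyone who wants to tally entries like the ones quoted above instead of reading them by eye, here is a rough Python sketch. It pulls the MC16_STATUS value and the error address out of each entry and decodes a few status flags. The bit positions are the ones I believe the Linux kernel defines in arch/x86/include/asm/mce.h, so treat them as an assumption and check them against your kernel version. The names `decode_status` and `parse_events` are made up for this example, and the sample text contains only the status and address lines copied from the log above.

```python
import re

# Assumed bit positions of MCA status flags (believed to match the Linux
# kernel's arch/x86/include/asm/mce.h; verify against your kernel source).
STATUS_BITS = {
    "Val": 63,    # register holds a valid error
    "Over": 62,   # overflow: an error arrived while one was still logged
    "UC": 61,     # uncorrected error
    "En": 60,     # error reporting enabled
    "MiscV": 59,  # MISC register valid
    "AddrV": 58,  # address register valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register valid
    "CECC": 46,   # error corrected by ECC
    "UECC": 45,   # error detected by ECC but not corrected
}

STATUS_RE = re.compile(r"MC\d+_STATUS\[[^\]]*\]:\s*(0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"Error Addr:\s*(0x[0-9a-fA-F]+)")


def decode_status(status):
    """Return the set of flag names whose bit is set in a status value."""
    return {name for name, bit in STATUS_BITS.items() if (status >> bit) & 1}


def parse_events(log_text):
    """Collect one dict per MCx_STATUS line, with the address that follows it."""
    events = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            events.append({"status": int(match.group(1), 16), "addr": None})
            continue
        match = ADDR_RE.search(line)
        if match and events:
            events[-1]["addr"] = int(match.group(1), 16)
    return events


# Status and address lines copied from the three entries quoted in the post.
SAMPLE_LOG = """\
[Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[-|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0x9c2040000000011b
[Hardware Error]: Error Addr: 0x00000000018f03c0
[Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
[Hardware Error]: Error Addr: 0x00000003e7eb9800
[Hardware Error]: CPU:0 (17:8:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
[Hardware Error]: Error Addr: 0x0000000000c10980
"""

events = parse_events(SAMPLE_LOG)
for event in events:
    flags = ", ".join(sorted(decode_status(event["status"])))
    print(f"addr={event['addr']:#018x} flags={flags}")
```

In practice you would feed it the kernel log (for example the text from `dmesg` or `journalctl -k`) rather than a pasted sample. If the bit positions above are right, the Over flag on the second and third entries suggests that at least one more error was detected before the previous one had been read out, so the real count may be higher than three.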