February 9, 2023

I am getting weekly MCE errors, and my system reboots and starts a parity check. This is less than optimal and keeps my system from spinning down the disks.

Feb 8 21:27:16 Tower mcelog: failed to prefill DIMM database from DMI data
Feb 8 21:27:16 Tower mcelog: Kernel does not support page offline interface
Feb 8 21:27:16 Tower mcelog: Running trigger `unknown-error-trigger' (reporter: unknown)
Feb 8 21:27:16 Tower mcelog: CPU 1 on socket 0 received unknown error
Feb 8 21:27:16 Tower mcelog: Location: CPU 1 on socket 0

Based on this, I am wondering whether one of the DIMMs is defective. However, I see other posts where people say this does not indicate an issue and is harmless.

Attachment: syslog.txt
February 9, 2023 (Author)

11 hours ago, trurl said: Attach diagnostics to your NEXT post in this thread.

Here are the diagnostics. Sorry, I thought I had attached them, but apparently I had only attached the syslog.

Attachment: tower-diagnostics-20230209-0942.zip
February 9, 2023 (Community Expert)

In addition to MCE, you also have btrfs csum errors on cache. You should definitely run memtest immediately.
February 9, 2023 (Author)

1 hour ago, trurl said: In addition to MCE, you also have btrfs csum errors on cache. You should definitely run memtest immediately.

Does it indicate which drive in the cache? I have two NVMe drives in a mirrored cache.
February 9, 2023 (Community Expert)

3 minutes ago, x31337x said: Does it indicate which drive in the cache? I have two NVMe drives in a mirrored cache.

Feb 8 21:28:50 Tower kernel: BTRFS warning (device nvme1n1p1): csum failed root -9 ino 257 off 907026432 csum 0xbea1b5f9 expected csum 0xbe21b5f9 mirror 2
Feb 8 21:28:50 Tower kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 570, gen 0

...and more.

1 hour ago, trurl said: You should definitely run memtest immediately.
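For anyone with a similar question, here is a minimal sketch of pulling the device name out of lines like the ones quoted above. It assumes the log format is exactly what appears in this syslog. Other kernel versions may word these messages differently. The helper names (`latest_error_counters`, `devices_with_corruption`, `csum_failures_per_device`) are hypothetical and only for illustration. Note that the device named is where btrfs *detected* the bad checksum. As discussed in this thread, the corruption may have come from RAM rather than from that drive.

```python
import re
from collections import Counter

# Per-device counters btrfs prints (format assumed from this syslog), e.g.
# "bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 570, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# Checksum-failure warnings, e.g. "BTRFS warning (device nvme1n1p1): csum failed ..."
CSUM_RE = re.compile(r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed")


def latest_error_counters(lines):
    """Return {device: {counter: value}}, keeping the last report seen per device."""
    result = {}
    for line in lines:
        m = ERRS_RE.search(line)
        if m:
            result[m.group("dev")] = {
                k: int(v) for k, v in m.groupdict().items() if k != "dev"
            }
    return result


def devices_with_corruption(lines):
    """Sorted list of devices whose last reported 'corrupt' counter is non-zero."""
    return sorted(
        dev for dev, c in latest_error_counters(lines).items() if c["corrupt"] > 0
    )


def csum_failures_per_device(lines):
    """Count 'csum failed' warnings per device name."""
    counts = Counter()
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            counts[m.group("dev")] += 1
    return counts


# The two lines quoted in the post above, used as sample input.
SAMPLE_LOG = [
    "Feb 8 21:28:50 Tower kernel: BTRFS warning (device nvme1n1p1): csum failed "
    "root -9 ino 257 off 907026432 csum 0xbea1b5f9 expected csum 0xbe21b5f9 mirror 2",
    "Feb 8 21:28:50 Tower kernel: BTRFS error (device nvme1n1p1): bdev "
    "/dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 570, gen 0",
]

if __name__ == "__main__":
    print(devices_with_corruption(SAMPLE_LOG))  # ['/dev/nvme1n1p1']
    print(dict(csum_failures_per_device(SAMPLE_LOG)))  # {'nvme1n1p1': 1}
```

To check a full log, you would pass the lines of the saved syslog file instead of `SAMPLE_LOG`.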
February 9, 2023 (Author)

1 hour ago, trurl said: In addition to MCE, you also have btrfs csum errors on cache. You should definitely run memtest immediately.

Running memtest now.
February 21, 2023 (Author)

On 2/9/2023 at 9:57 AM, trurl said: In addition to MCE, you also have btrfs csum errors on cache. You should definitely run memtest immediately.

I ran memtest, and it did not report any errors. I also replaced the memory with new modules, and I am still getting unclean shutdowns. Could it be the motherboard or the CPU? I have a spare motherboard I can swap in.
February 21, 2023 (Community Expert)

Maybe the btrfs csum errors are from the previous bad RAM. You need to clean that up. There is a whole thread about unclean shutdowns pinned near the top of this General Support subforum. There may be some ideas there.
February 22, 2023 (Author)

17 hours ago, trurl said: Maybe the btrfs csum errors are from previous bad RAM. You need to clean that up. There is a whole thread pinned near the top of this General Support subforum about unclean shutdowns. Maybe some ideas there.

OK, I will take a look at that.