August 10, 2023
Over the last few weeks I've been having increasing problems with my server. First it would lock up entirely and require a hard shutdown and reboot. This happened a few times, and initial testing led me to believe the memory was starting to fail. I replaced it, but then my Docker containers started to freeze. I mostly use Emby and was starting to see read-only permission errors, so I deleted and recreated my Docker image. The problem persists, though, and I have no idea why. The only pattern I think I'm noticing is that it freezes while a parity check is running, which I know shouldn't be a real problem, as I've run it on the monthly schedule for years without any interruption other than mild performance issues. I was in the process of preparing it for a move into a new system, but I can't seem to nail down the problem. My diagnostic logs are attached (stark-diagnostics-20230810-1645.zip), and my system is currently completing a parity check again. So far 0 errors at ~45%, with another 12 hours or so to go. Any help would be appreciated. I'm very tempted to reset the entire environment and start from scratch, but I'd really prefer to avoid that.
August 10, 2023
Aug 10 16:29:44 Stark kernel: BTRFS critical (device sdd1): corrupt leaf: block=5451645927424 slot=57 extent bytenr=5341179084800 len=16384 unknown inline ref type: 255
The first thing to do is run Memtest from the boot menu for at least a couple of passes, as corruption like this is usually caused by bad memory. If you're currently booting via UEFI, you will have to temporarily switch to legacy boot or set up a new flash drive from https://www.memtest86.com/
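For anyone who wants to check their own syslog for the same kind of message, here is a minimal Python sketch. It only assumes the message shape quoted above; the function name `find_corrupt_leaves` and the regex are illustrative and not part of any Unraid or btrfs tool, and other BTRFS messages are worded differently and will not match.

```python
import re

# Matches only the "corrupt leaf" message shape quoted in the post above.
# This pattern is an assumption based on that single sample line.
BTRFS_CORRUPT_LEAF = re.compile(
    r"BTRFS critical \(device (?P<device>[^)]+)\): corrupt leaf: "
    r"block=(?P<block>\d+) slot=(?P<slot>\d+)"
)


def find_corrupt_leaves(lines):
    """Return a (device, block, slot) tuple for each matching log line."""
    hits = []
    for line in lines:
        match = BTRFS_CORRUPT_LEAF.search(line)
        if match:
            hits.append(
                (match["device"], int(match["block"]), int(match["slot"]))
            )
    return hits


# The log line from the post, used as sample input.
sample = [
    "Aug 10 16:29:44 Stark kernel: BTRFS critical (device sdd1): "
    "corrupt leaf: block=5451645927424 slot=57 "
    "extent bytenr=5341179084800 len=16384 unknown inline ref type: 255",
]

print(find_corrupt_leaves(sample))
```

To scan a real log, pass an open file object instead of `sample`. A hit tells you which device reported the corrupt metadata, not what caused it, so the memory test is still the next step.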
August 10, 2023 (Author)
Thanks. It makes me feel a bit better to find out I was on the right path. At this point it seems the board or CPU is failing. The memory that is installed is less than a week old.
August 10, 2023
33 minutes ago, Nomar1245 said: "The memory that is installed is less than a week old."
New doesn't mean good. The first thing you need to do with new memory is a memtest.
August 10, 2023 (Author)
While I understand what you're saying, the odds against the exact same problem happening both before and after replacing the RAM have to be astronomical.
August 10, 2023
2 minutes ago, Nomar1245 said: "While I understand what you're saying, the odds against the exact same problem happening both before and after replacing the RAM have to be astronomical."
Memtest also checks the memory controller and the data path, so it would still be a good test. If it fails, you have a smoking gun to investigate. Even if all your RAM sticks turn out to be good, memtest could still fail: the memory timing or voltage in the BIOS could be wrong, or just not stable at the current values. If memtest passes 24 hours with no errors, then you have another data point to help you diagnose things.
August 10, 2023 (Author)
Well, there we are on the same page. I've been running the test for a bit more than an hour now.