BTRFS Issue

Smallz · August 27, 2018

My server has been humming along happily without issue for over a year, but now all of a sudden I'm getting messages that my docker.img is corrupt (BTRFS critical (device loop2): corrupt leaf, bad key order: block=1004027904, root=1, slot=76). My entire system locks up and I just see that message over and over on my console and in the syslog. At first I thought it was a bad SSD, but I am still seeing the issue after creating a new docker.img on one of my mechanical disks. The issue crops back up after a few days, forcing me to wipe my docker and start over. Any ideas?

What I have tried so far:
Created a new docker.img
Tried a new disk
Ran a Memtest
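
For anyone who wants to track how often this error recurs and on which loop device, here is a minimal Python sketch that pulls the fields out of a message shaped like the one quoted above. The pattern is modelled on that single message only, so other BTRFS messages may not match, and the helper names (parse_corrupt_leaf, count_by_device) are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this thread (an assumption:
# other BTRFS kernel messages use different wording and will not match).
BTRFS_CORRUPT_LEAF = re.compile(
    r"BTRFS critical \(device (?P<device>[^)]+)\): corrupt leaf, "
    r"(?P<reason>[^:]+): block=(?P<block>\d+), root=(?P<root>\d+), "
    r"slot=(?P<slot>\d+)"
)


def parse_corrupt_leaf(line):
    """Return the fields of a 'corrupt leaf' message as a dict, or None."""
    match = BTRFS_CORRUPT_LEAF.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    for key in ("block", "root", "slot"):
        fields[key] = int(fields[key])
    return fields


def count_by_device(lines):
    """Count matching 'corrupt leaf' messages per device name."""
    counts = Counter()
    for line in lines:
        hit = parse_corrupt_leaf(line)
        if hit is not None:
            counts[hit["device"]] += 1
    return counts
```

To use it, pass the lines of a saved syslog file to count_by_device, for example count_by_device(open(path)) where path points at wherever the log was saved. If the same block number keeps showing up across freshly created images, that would be worth mentioning when posting diagnostics.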

trurl · August 27, 2018

Go to Tools - Diagnostics and post the complete zip, preferably captured while the issue is happening.

Smallz · August 30, 2018

Failures seem to be happening more frequently, and I am no longer seeing FS corruption errors in the console or logs; instead it just locks up with no error. I do see the following error in my out-of-band chassis management: "OS Stop Run-Time Critical Stop - Assertion". Just now my Plex stopped functioning, though I can see the docker still running and can see log messages, so I pulled a docker. I would really appreciate it if you took a look. Just to recap, from what I can tell it's not the drive, not the memory, and not a temperature issue (no temperature sensor alerts in my chassis management). I'm thinking the Plex and docker.img corruption is just a symptom of unclean shutdowns caused by the lockups.

evav-diagnostics-20180829-2012.zip

trurl · August 30, 2018

I notice you have a 32G docker image. Did you make it that large to see if it would fix your problem? Are you sure you haven't just been filling it up?

I also noticed an FCP warning that some of appdata is on the array. You should set appdata to cache-prefer, stop the docker service, and run mover manually to get that cleared up.

Smallz · August 30, 2018

First off, thanks for taking a look.
The docker image is that large because of an issue I had a while back where one of my dockers was filling up the img due to a misconfiguration. That issue was resolved; the docker image is 40G with 8.7G in use. I moved appdata to the array because I originally thought the SSD was malfunctioning, so I can probably go ahead and move it back to the cache drive. When my system locked up again last night, I started only my bare-minimum dockers (Plex, NZBGet, Sonarr, Radarr), and so far it hasn't locked up again. Is it possible for a bad docker to lock up the entire host system? I would probably have to wait at least another 24 hours to see whether it locks up with the dockers I have running. Are there any debug levels I can increase? I'm thinking maybe I should syslog everything to a remote host with some debugging on for a few days and see if I can catch an error. As I said previously, the last few times it locked up I didn't see any messages in the console.
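
The remote syslog idea above could be tried with something as small as the following Python sketch, run on a second machine so the last messages before a lockup survive the crash. It assumes the server can forward its log over UDP in the classic "<PRI>message" syslog format. The port (5514, chosen so root is not needed), the output file name, and the function names are all assumptions made for illustration, not anything specific to this setup.

```python
import re
import socketserver

LOG_PATH = "remote-syslog.log"  # hypothetical output file on the receiving host
PRI_RE = re.compile(r"^<(\d{1,3})>")
SEVERITIES = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")


def split_pri(message):
    """Split a leading <PRI> value into (facility, severity name, rest).

    PRI is facility * 8 + severity. Returns (None, None, message) when
    there is no valid PRI prefix.
    """
    match = PRI_RE.match(message)
    if match is None:
        return None, None, message
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None, None, message
    return pri >> 3, SEVERITIES[pri & 7], message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to LOG_PATH with the sender's address."""

    def handle(self):
        data = self.request[0]  # for UDP servers, request is (data, socket)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        _facility, severity, rest = split_pri(text)
        with open(LOG_PATH, "a", encoding="utf-8") as log_file:
            log_file.write(
                f"{self.client_address[0]} {severity or '-'} {rest}\n"
            )


def run_receiver(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (blocks forever)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling run_receiver() on the remote host starts the listener. Each line is opened, written, and closed separately, which is slow but means nothing is held in a buffer if the receiver itself is interrupted. UDP is lossy, so a missing final message does not prove nothing was logged.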
