
Docker stopped working.



Diagnostics attached. I woke up today to Docker containers not working. If I tried to stop/restart any container, I got "Execution Error".  I tried stopping Docker, deleting the vdisk, and starting Docker back up. Now, the Docker page says Docker failed to start.

 

No idea what's going on. The server was working great last night. Never turned it off. Now it's borked.

 

sanctuary-diagnostics-20220316-1131.zip


Btrfs detected data corruption in both devices

 

Mar 16 11:12:23 Sanctuary kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 65, gen 0
Mar 16 11:12:23 Sanctuary kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 46, gen 0

 

This is usually a RAM issue; start by running memtest. Since the filesystem was also affected, if a problem is found, the best bet after fixing it is to back up and re-format the pool.
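
 

If you want to check the counters yourself, btrfs tracks them per device. A quick example from the console (the /mnt/cache mount point is an assumption, substitute your pool's):

# Show per-device error counters (wr/rd/flush/corrupt/gen) for the pool
btrfs device stats /mnt/cache

# Re-verify all checksums on the pool; -B waits and prints a summary
btrfs scrub start -B /mnt/cache

The "corrupt" column in that output corresponds to the corrupt 65 / corrupt 46 values in the syslog lines above.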


I rebooted to no avail, but then I shut the system down entirely so I could mess with the components inside the case. When I started it back up, everything was working fine again: Docker started, VMs worked, etc.

 

So... does this still sound like a RAM problem? I'll probably take the server offline tonight and let memtest run while I sleep.

 

Edit: Spoke too soon. It started working, but now it's all failing again in the same way.

56 minutes ago, JorgeB said:

Btrfs detected data corruption in both devices

 

Mar 16 11:12:23 Sanctuary kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 65, gen 0
Mar 16 11:12:23 Sanctuary kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 46, gen 0

 

This is usually a RAM issue; start by running memtest. Since the filesystem was also affected, if a problem is found, the best bet after fixing it is to back up and re-format the pool.

 

Okay, running memtest now. If the problem persists, since the cache is acting as if it's read-only, will Mover be able to correctly move the contents off the cache and onto the array? Or should I just manually do a copy/paste from /mnt/cache to /mnt/disk1 (for instance)?

 

I think I'll format the cache pool either way, just to be safe, so I wanted to make sure I get appdata/domains/system files properly backed up.
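
 

If manual is the way to go, I assume something like this from the console would cover those shares (paths are from my setup; rsync ships with Unraid, as far as I know):

# Copy the cache shares to disk1, preserving attributes and handling sparse files
rsync -avhS /mnt/cache/appdata /mnt/cache/domains /mnt/cache/system /mnt/disk1/

Reading from the pool should still work even if btrfs has flipped it read-only, as long as the individual files aren't among the corrupted ones.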


UPDATE

I ran 8 passes of MemTest across ~14 hours. Zero errors. I might try more passes later, but for now I'm satisfied there aren't any issues with the RAM.

 

I ran an extended self-test on both NVMe drives. No issues found.
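
 

(In case it helps anyone else: the extended self-test can also be run from the console with smartctl, assuming a smartmontools build with NVMe self-test support. Device names below are examples; check yours with smartctl --scan.)

# Start an extended self-test on each NVMe device
smartctl -t long /dev/nvme0
smartctl -t long /dev/nvme1

# Review the results once the tests finish
smartctl -a /dev/nvme0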

 

So, at this point, I have no idea why or how my btrfs pool got corrupted, which kind of sucks. I'd love to pinpoint a reason so I can feel assured it won't happen again.

 

I formatted each drive as XFS, then put them back in a btrfs pool (this was the only way I could get Unraid to let me reformat the drives from btrfs back to btrfs). I nuked my Docker vdisk, just in case it was the culprit. And now everything is back to running like normal.

 

Thanks to both of you, Squid and JorgeB, for the help!

