
Docker / Libvirt Service Failed to Start



Hi.

I shut down my unRAID server to install a new storage drive. Booted back up, cleared the disk, formatted. All good. Then I went to my Docker tab and was met with "Docker Service failed to start", and likewise for the VM tab. I checked the syslog and found some concerning messages relating to BTRFS.

No write errors are reported on the cache disks, and the SMART test came back clean. The Check Filesystem Status output did not look good, however. I've attached it alongside the diagnostics file.

My first troubleshooting step was to undo everything I had done to add the new storage drive. I was pretty sure that had nothing to do with the issue, but both Docker and Libvirt were working fine immediately before I added the drive, so it seemed worth ruling out. I went through the "Shrink array" process as documented on the Wiki, cleared the new drive and removed it from the array. Everything on the storage side is still fine, but nothing has changed with Docker and Libvirt; both are still broken.

From my Google searches, the general consensus seems to be that I'll basically need to redo the cache and everything on it. Before I go any further, though, I wanted a second opinion, so that I don't break more things or go through the headache of needlessly redoing all of my Docker containers and virtual machines, which would be quite the inconvenience. But if that's what I've got to do, so be it. From what I've read, this is likely to happen again if there's a hardware issue with the cache drives. I have two of them, which I thought was supposed to provide redundancy, but clearly that hasn't helped here. Hopefully, if there is a hardware issue, it's limited to one of them and I can keep using the other. Then again, with a hardware issue I would have expected to see more or different errors, but I don't know exactly what I'm looking for.

I've checked all cables and swapped SATA cables, to no effect.

Thank you for your time.

cortana-diagnostics-20220416-2007.zip btrfs check status.txt

44 minutes ago, Squid said:

Thanks. Will this help with the Libvirt (service failed to start) issue as well or just the Docker issue? Should I be worried about the abundance of issues reported in the filesystem check? I'm worried that something else is messed up which caused the docker image to become corrupt and cause me to have to go through this process again in the future. Or is it the corrupt Docker image that has caused a cascade of issues both with the Libvirt service and the issues in the filesystem check and redoing it will resolve all of the above mentioned issues? Sorry for the barrage of questions.

10 hours ago, Squid said:

I didn't notice all that. You'll need to fix the underlying cache drive issues first.

Was there any indication of what the underlying issue might be in the files I provided? If there's anything else that I can provide to help diagnose the issue, please do let me know.

On 4/18/2022 at 12:33 AM, Clobes said:

what the underlying issue might be

Apr 16 19:47:38 Cortana kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 169216630, rd 124947440, flush 2261207, corrupt 988649, gen 0

The large number of read and write errors suggests this device dropped offline at some point earlier. You should run a scrub and make sure all errors are correctable; after that, I suggest you monitor the pool for future issues. More info below:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
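For anyone trying to read the kernel line quoted above: the five numbers are btrfs's per-device error counters (write, read and flush I/O errors, checksum/corruption errors, and generation errors). They are stored on disk and keep accumulating until reset, so large values show the history of the device, not necessarily what is happening right now. The Python sketch below is only an illustration, not an Unraid tool: `parse_errs` and `needs_attention` are hypothetical names, and the regex assumes exactly the message layout shown in this syslog line.

```python
import re

# Layout assumed from the syslog line above:
# "... bdev /dev/sdb1 errs: wr N, rd N, flush N, corrupt N, gen N"
ERRS_LINE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# Short names in the kernel message -> names used by `btrfs device stats`.
COUNTER_NAMES = {
    "wr": "write_io_errs",
    "rd": "read_io_errs",
    "flush": "flush_io_errs",
    "corrupt": "corruption_errs",
    "gen": "generation_errs",
}


def parse_errs(line):
    """Return (device, {counter: value}) for a BTRFS 'errs:' line, or None if it doesn't match."""
    match = ERRS_LINE.search(line)
    if match is None:
        return None
    counters = {COUNTER_NAMES[k]: int(match.group(k)) for k in COUNTER_NAMES}
    return match.group("dev"), counters


def needs_attention(counters):
    # The counters are cumulative until reset, so any non-zero value means the
    # device has logged errors at some point, not necessarily recently.
    return any(value > 0 for value in counters.values())
```

Applied to the `sdb1` line above, `parse_errs` returns `/dev/sdb1` with `corruption_errs` at 988649 and `generation_errs` at 0, and `needs_attention` is true.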

 

 

11 hours ago, JorgeB said:
Apr 16 19:47:38 Cortana kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 169216630, rd 124947440, flush 2261207, corrupt 988649, gen 0

The large number of read and write errors suggests this device dropped offline at some point earlier. You should run a scrub and make sure all errors are correctable; after that, I suggest you monitor the pool for future issues. More info below:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582

JorgeB, thank you so much!

I ran the scrub and it showed that everything was corrected, with zero uncorrectable errors. I've got the script in place to check for and notify of any errors hourly.
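The hourly check mentioned here is presumably the script from the linked FAQ entry, which is not reproduced in this thread. As a rough, hypothetical alternative in Python, the sketch below runs `btrfs device stats` on the pool and lists every non-zero counter. It assumes btrfs-progs is installed and that the pool is mounted at `/mnt/cache` (adjust for other pool names); `read_pool_stats`, `parse_device_stats` and `nonzero_counters` are illustrative names, not part of Unraid.

```python
import subprocess


def parse_device_stats(text):
    """Parse `btrfs device stats` output into {device: {counter: value}}.

    Expected line shape: "[/dev/sdb1].write_io_errs    0"
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith("["):
            continue  # skip blank or unexpected lines
        name, value = parts
        device, sep, counter = name[1:].partition("].")
        if not sep:
            continue
        stats.setdefault(device, {})[counter] = int(value)
    return stats


def nonzero_counters(stats):
    """Return [(device, counter, value)] for every counter that is not zero."""
    return [
        (device, counter, value)
        for device, counters in stats.items()
        for counter, value in counters.items()
        if value != 0
    ]


def read_pool_stats(mountpoint="/mnt/cache"):
    # Assumption: btrfs-progs is available and the pool is mounted at `mountpoint`.
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return parse_device_stats(result.stdout)
```

Scheduled hourly (for example via cron), an empty list from `nonzero_counters(read_pool_stats())` means no errors have been recorded. After a scrub has corrected the errors, the counters can be reset with `btrfs device stats -z /mnt/cache`, so that any later non-zero value stands out.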

At this point, is there anything else I need to do to get Docker and Libvirt working again, or do I just need to redo them, now with the peace of mind that the cache errors are corrected? I'm still getting "Service failed to start" for both services. I tried rebooting and disabling/re-enabling both, but nothing changed. Syslog attached in case it's helpful; it still shows some concerning messages after the reboot and service restart.

Apr 19 15:51:40 Cortana emhttpd: shcmd (126): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
Apr 19 15:51:40 Cortana kernel: BTRFS: device fsid 50562063-4cd9-473d-85b3-520dc684d738 devid 1 transid 1440 /dev/loop2 scanned by udevd (7284)
Apr 19 15:51:40 Cortana kernel: BTRFS info (device loop2): using free space tree
Apr 19 15:51:40 Cortana kernel: BTRFS info (device loop2): has skinny extents
Apr 19 15:51:40 Cortana kernel: BTRFS error (device loop2): bad tree block start, want 33767424 have 0
Apr 19 15:51:40 Cortana kernel: BTRFS error (device loop2): bad tree block start, want 33767424 have 0
Apr 19 15:51:40 Cortana kernel: BTRFS warning (device loop2): couldn't read tree root
Apr 19 15:51:40 Cortana root: mount: /etc/libvirt: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Apr 19 15:51:40 Cortana kernel: BTRFS error (device loop2): open_ctree failed
Apr 19 15:51:40 Cortana root: mount error
Apr 19 15:51:40 Cortana emhttpd: shcmd (126): exit status: 1
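In the log above, `loop2` is the loop device that the `mount_image` step attached `libvirt.img` to, so these errors describe the filesystem inside the image rather than the cache devices themselves. A small Python sketch (illustrative only; the function name and the default severity list are assumptions) that pulls BTRFS warnings and errors out of syslog lines and groups them by device makes that distinction easy to see:

```python
import re
from collections import defaultdict

# Matches kernel lines such as
# "BTRFS error (device loop2): bad tree block start, want 33767424 have 0"
BTRFS_MSG = re.compile(r"BTRFS (?P<level>\w+) \(device (?P<dev>[^)]+)\): (?P<msg>.*)$")


def btrfs_problems(lines, levels=("critical", "error", "warning")):
    """Group BTRFS messages at the given severity levels by device, in log order."""
    found = defaultdict(list)
    for line in lines:
        match = BTRFS_MSG.search(line)
        if match and match.group("level") in levels:
            found[match.group("dev")].append(match.group("msg"))
    return dict(found)
```

Fed the lines above, it reports only `loop2`, with the failed tree-block reads, the `couldn't read tree root` warning and `open_ctree failed`; the `info` lines and the non-kernel `mount` messages are skipped.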

 

 

Thank you again for your help with this.

syslog-192.168.1.103.zip

  • 5 months later...
On 4/20/2022 at 2:48 PM, JorgeB said:

The libvirt image is corrupt; you need to restore it from backups. By default Unraid disables checksums for the system share, so if there's corruption it can't be fixed.

I ran into this situation too. Where can I find the backup of the libvirt image?
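JorgeB's point about checksums can be checked for a given file. Assuming Unraid implements this the usual btrfs way, by setting the NOCOW (`C`) attribute on the system share's files, such a file has no data checksums, so btrfs cannot tell a good copy from a bad one and a scrub cannot repair it, even on a mirrored pool. The Python sketch below reads a file's attribute flags with the `FS_IOC_GETFLAGS` ioctl (the same information `lsattr` shows) and tests the NOCOW bit. The constants come from the Linux `<linux/fs.h>` header, and the ioctl number assumes a 64-bit kernel; the function names are hypothetical.

```python
import array
import fcntl
import os

# Values from the Linux UAPI header <linux/fs.h>. FS_IOC_GETFLAGS is the
# 64-bit encoding (assumption: an x86_64 host, which is what Unraid runs on).
FS_IOC_GETFLAGS = 0x80086601
FS_NOCOW_FL = 0x00800000  # shown as 'C' by lsattr; on btrfs it also disables data checksums


def get_inode_flags(path):
    """Read the inode attribute flags of `path` (the same flags lsattr prints)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = array.array("I", [0])  # the kernel writes a 32-bit int here
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf, True)
        return buf[0]
    finally:
        os.close(fd)


def is_nocow(flags):
    """True if the NOCOW attribute is set, i.e. btrfs keeps no data checksums for the file."""
    return bool(flags & FS_NOCOW_FL)
```

For example, `is_nocow(get_inode_flags("/mnt/cache/system/libvirt/libvirt.img"))`, where the path is hypothetical and should point at the pool copy of the file; the ioctl may not be supported through the `/mnt/user` share path.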
