Docker and VM engine failed to start


Luc1fer

I am having problems with my server.  I did 2 things.

1) I upgraded to the latest version.

2) I noticed that one of my cache SSDs had dropped off, so while the server was shut down I replaced the SATA cables.

 

When I powered back up, both of the SSDs were there, so I thought great, but now the docker engine won't start up.  It just says "Docker Service failed to start."  And the VM engine won't start up either, saying "Libvirt Service failed to start."

 

I reverted my upgrade, but it has not made any difference.

 

I suspect that there is something going on with the fact that one of my cache drives dropped off then came back again.  I'm not sure how long it had been disconnected for.  I'm not sure where to go from here.

 

Any help would be appreciated.

 

Cheers,

L.

 

chenbro-svr-diagnostics-20181221-1633.zip

Because there were read/write errors on one of the cache devices:

Dec 21 16:17:54 Chenbro-Svr kernel: BTRFS info (device sdp1): bdev /dev/sdq1 errs: wr 3714141, rd 2077762, flush 152140, corrupt 0, gen 0

the docker and VM images are corrupt. Run a scrub on the pool and recreate the docker image. For libvirt, restore libvirt.img from a backup, or recreate it, but then you'll lose the VM configs. After that, see here for how to monitor the pool for errors:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
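
If you want to pull those counters out of a syslog yourself, here is a minimal Python sketch. It assumes the log line has the same layout as the one quoted above; the function name `parse_btrfs_errs` is made up for this example and is not part of any btrfs or Unraid tool.

```python
import re

# Matches the per-device counter summary as it appears in the quoted log line.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_errs(line):
    """Return (device, counters) for a 'bdev ... errs:' line, or None if it doesn't match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters

line = ("Dec 21 16:17:54 Chenbro-Svr kernel: BTRFS info (device sdp1): "
        "bdev /dev/sdq1 errs: wr 3714141, rd 2077762, flush 152140, corrupt 0, gen 0")
dev, counters = parse_btrfs_errs(line)
nonzero = {k: v for k, v in counters.items() if v > 0}
print(dev, nonzero)
# /dev/sdq1 {'wr': 3714141, 'rd': 2077762, 'flush': 152140}
```

Any non-zero counter on a pool member is worth investigating, even when the pool mounts fine.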

 

Thanks for that.  I have pulled out the cache drive that was giving all the errors and everything appears normal at the moment.  If my dockers and VMs are working do I still need to rebuild them? I'll run a scrub on the pool and see how that goes.  What's the best way to check the SSD that gave me the errors?  Am I best to just install it in a PC and test it there?

 

Seeing as I got lucky this time, how do I back up my docker.img and libvirt?

50 minutes ago, Luc1fer said:

I have pulled out the cache drive that was giving all the errors and everything appears normal at the moment.  If my dockers and VMs are working do I still need to rebuild them?

No. NOCOW shares aren't checksummed, and the system share is NOCOW by default, so btrfs can't correct them since it doesn't know there's a problem. With both SSDs connected, btrfs reads alternately from both and returns bad data whenever it reads from the device that previously dropped offline and now contains stale data. With that one removed it can only read from the good device, so if everything is working there's no need to rebuild. You can later re-add the other device as a new device so the mirror gets rebuilt.
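
To illustrate the difference, here is a toy Python model of a two-device mirror. It is only a sketch of the idea and not how btrfs is actually implemented: `read_mirror` is a hypothetical helper, `zlib.crc32` stands in for the filesystem's real checksum, and the `start` argument stands in for whichever device happens to serve the read.

```python
import zlib

def read_mirror(copies, checksum=None, start=0):
    """Toy mirror read: try the copies in turn, beginning at index `start`.

    With a checksum (normal checksummed data) a copy that fails
    verification is skipped. Without one (NOCOW data) the first copy
    read is returned as-is, whether it is current or stale.
    """
    n = len(copies)
    for i in range(n):
        data = copies[(start + i) % n]
        if checksum is None or zlib.crc32(data) == checksum:
            return data
    raise IOError("no copy matches the checksum")

good = b"current vm state"
stale = b"old vm state"   # the device that dropped offline and missed writes
copies = [good, stale]
csum = zlib.crc32(good)

# NOCOW, read served by the stale device: bad data comes back unnoticed.
print(read_mirror(copies, checksum=None, start=1))   # b'old vm state'
# Checksummed: the stale copy fails verification, the good copy is used.
print(read_mirror(copies, checksum=csum, start=1))   # b'current vm state'
# NOCOW with the stale device removed: only good data can be returned.
print(read_mirror([good], checksum=None))            # b'current vm state'
```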

 

53 minutes ago, Luc1fer said:

What's the best way to check the SSD that gave me the errors?

Most of the time with SSDs it's a bad cable. Replace both cables and keep monitoring.
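
For the monitoring part, one option is to check the per-device error counters from `btrfs dev stats` on a schedule. The Python sketch below is just one way to do it. It assumes the pool is mounted at `/mnt/cache` and that the output uses the usual `[device].counter value` layout; the function names and the sample values are made up for illustration.

```python
import re
import subprocess

# One line of `btrfs dev stats` output, e.g. "[/dev/sdq1].write_io_errs   12"
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")

def parse_dev_stats(text):
    """Turn `btrfs dev stats` output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m:
            stats.setdefault(m.group("dev"), {})[m.group("name")] = int(m.group("value"))
    return stats

def devices_with_errors(stats):
    """Keep only devices with at least one non-zero counter."""
    return {dev: {k: v for k, v in counters.items() if v}
            for dev, counters in stats.items() if any(counters.values())}

def check_pool(mount_point="/mnt/cache"):  # mount point is an assumption
    """Run `btrfs dev stats` on the pool and report devices with errors."""
    out = subprocess.run(["btrfs", "dev", "stats", mount_point],
                         check=True, capture_output=True, text=True).stdout
    return devices_with_errors(parse_dev_stats(out))

# Illustrative sample text only, not output from the server in this thread.
SAMPLE = """\
[/dev/sdp1].write_io_errs    0
[/dev/sdp1].read_io_errs     0
[/dev/sdp1].flush_io_errs    0
[/dev/sdp1].corruption_errs  0
[/dev/sdp1].generation_errs  0
[/dev/sdq1].write_io_errs    12
[/dev/sdq1].read_io_errs     3
[/dev/sdq1].flush_io_errs    0
[/dev/sdq1].corruption_errs  0
[/dev/sdq1].generation_errs  0
"""
print(devices_with_errors(parse_dev_stats(SAMPLE)))
# {'/dev/sdq1': {'write_io_errs': 12, 'read_io_errs': 3}}
```

On the server itself you would call `check_pool()` instead of parsing the sample, and send yourself a notification whenever it returns anything.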

 

56 minutes ago, Luc1fer said:

Seeing as I got lucky this time, how do I back up my docker.img and libvirt?

It's not worth backing up the docker image since it's easily rebuilt without losing any data. On the other hand, you should back up libvirt.img, either manually or with, for example, the CA Appdata backup plugin.
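
If you go the manual route, a timestamped copy is enough. Here is a minimal Python sketch, where `backup_image` is a hypothetical helper and both paths in the usage comment are assumptions, so adjust them to wherever your libvirt.img and backup share actually live. It is probably safest to run it with the VM service stopped so the image isn't changing mid-copy.

```python
import shutil
import time
from pathlib import Path

def backup_image(src, dest_dir):
    """Copy `src` into `dest_dir` under a timestamped name and return the new path."""
    src, dest_dir = Path(src), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest

# Example call (both paths are assumptions, not taken from this thread):
# backup_image("/mnt/user/system/libvirt/libvirt.img", "/mnt/user/backups/libvirt")
```

Keeping the copies on the array rather than on the cache pool means a cache problem like this one can't take the backup with it.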

Archived

This topic is now archived and is closed to further replies.
