Docker service failed to start + Libvirt service failed to start

livingonline8 · September 27, 2019

So I went to sleep while my server was downloading large files using Deluge. When I woke up this morning there was a warning that my 1 TB cache drive was 91% utilized, so I clicked Mover to start it. Then I saw a warning in the Fix Common Problems app that my log folder or file was full and that a restart would fix it....

So I made my biggest mistake: I rebooted the array while Mover was still running. After the restart I have no Dockers, because the Docker page says "Docker service failed to start". Also, I don't actually use any VMs, but the VM page also says "Libvirt failed to start".

My cache drive still shows 860 GB used out of 940 GB, and whenever I click Mover now nothing happens. The Fix Common Problems app tells me "Unable to write to cache .... Drive mounted read only or completely full".

I have attached my syslog and the cache disk information log as well.

I am seriously freaking out

Please help me!!

Syslog and cache disk log information.zip

JorgeB · September 27, 2019

Next time please post complete diagnostics.

One of the cache devices (cache1) has been having (and still is) read/write errors:

Sep 27 11:21:29 Tower kernel: BTRFS info (device sdc1): bdev /dev/sdb1 errs: wr 26504, rd 62595, flush 0, corrupt 0, gen 0

See here to better monitor the pool in the future.

Replace the cables on that device, power back on, try running a scrub on the pool and post new diags, but the pool filesystem is likely corrupted.
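The kernel line quoted above carries cumulative per-device BTRFS error counters (write, read, flush, corrupt, generation). As a minimal sketch of the kind of monitoring suggested, assuming only the line format shown in this thread, the Python below pulls those counters out of syslog lines and reports any device whose latest counters are non-zero. The function names are hypothetical helpers, not part of Unraid or btrfs-progs.

```python
import re

# Matches the BTRFS per-device error summary seen in this thread, e.g.
# "bdev /dev/sdb1 errs: wr 26504, rd 62595, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_errs(line):
    """Return (device, {counter: value}), or None if the line doesn't match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters

def failing_devices(lines):
    """Map each device whose most recent counters are non-zero to those counters."""
    latest = {}
    for line in lines:
        parsed = parse_btrfs_errs(line)
        if parsed is not None:
            dev, counters = parsed
            latest[dev] = counters  # counters are cumulative, so keep the newest
    return {dev: c for dev, c in latest.items() if any(c.values())}

log = [
    "Sep 27 11:21:29 Tower kernel: BTRFS info (device sdc1): bdev /dev/sdb1 "
    "errs: wr 26504, rd 62595, flush 0, corrupt 0, gen 0",
]
print(failing_devices(log))
# {'/dev/sdb1': {'wr': 26504, 'rd': 62595, 'flush': 0, 'corrupt': 0, 'gen': 0}}
```

Because the counters only grow until they are reset, any non-zero value is worth investigating, which is why the sketch flags a device as soon as a single counter is above zero.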

livingonline8 · September 27, 2019

Sorry... I am really not familiar with the troubleshooting process. Here is the diagnostics file, taken before running a scrub or anything else.

I appreciate you taking the time to look at it.

tower-diagnostics-20190927-1018.zip

saarg · September 27, 2019

It's also better to start just one thread instead of two identical ones in different subforums.

livingonline8 · September 27, 2019

I apologise. I am freaking out and couldn't tell which subforum was best for this, so I posted it twice to cover my bases.

I am losing my mind because of this issue and I have no idea what to do !!

Please feel free to close the other one if you think it isn't needed.

I apologise again

livingonline8 · September 27, 2019

So I ran a scrub and here's what I got:

scrub status for 681c0e15-d830-4a3a-9fb6-3d1521d891cd
	scrub started at Fri Sep 27 13:43:59 2019 and finished after 00:27:47
	total bytes scrubbed: 811.84GiB with 105915583 errors
	error details: read=105915580 super=3
	corrected errors: 0, uncorrectable errors: 105915580, unverified errors: 0
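To read a scrub summary like the one above at a glance, the per-type error counts (read=105915580, super=3) should add up to the total (105915583), and the corrected/uncorrectable split shows how many errors btrfs could repair. The Python below is a minimal sketch of that check, assuming the older btrfs-progs text layout quoted here; `parse_scrub_summary` and `detail_total` are hypothetical helper names, and newer btrfs-progs versions print a different layout that these patterns would not match.

```python
import re

# Keys that summarise the run rather than break errors down by type.
SUMMARY_KEYS = {"total_errors", "corrected", "uncorrectable", "unverified"}

def parse_scrub_summary(text):
    """Pull the integer counters out of a `btrfs scrub status` summary.

    Assumes the older btrfs-progs text layout quoted in this thread.
    """
    stats = {}
    total = re.search(r"with (\d+) errors", text)
    if total:
        stats["total_errors"] = int(total.group(1))
    # "error details: read=... super=..." lists one counter per error type.
    details = re.search(r"error details:((?:\s+\w+=\d+)+)", text)
    if details:
        for name, value in re.findall(r"(\w+)=(\d+)", details.group(1)):
            stats[name] = int(value)
    for name in ("corrected", "uncorrectable", "unverified"):
        m = re.search(rf"\b{name} errors: (\d+)", text)
        if m:
            stats[name] = int(m.group(1))
    return stats

def detail_total(stats):
    """Sum of the per-type error counters (read, super, ...)."""
    return sum(v for k, v in stats.items() if k not in SUMMARY_KEYS)

summary = (
    "total bytes scrubbed: 811.84GiB with 105915583 errors\n"
    "error details: read=105915580 super=3\n"
    "corrected errors: 0, uncorrectable errors: 105915580, unverified errors: 0"
)
stats = parse_scrub_summary(summary)
print(detail_total(stats) == stats["total_errors"])  # True
print(stats["corrected"], stats["uncorrectable"])     # 0 105915580
```

In this run nothing was corrected: every read error was reported as uncorrectable.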

JorgeB · September 27, 2019

Assuming you also replaced the cables for cache1 please post new diags.

livingonline8 · September 27, 2019

I am not home yet... I will replace them tomorrow and post the new diags here.

Thank you so much for your time and help

@johnnie.black

JorgeB · September 27, 2019

You'll need to back up the cache pool and re-format it, but I recommend only doing that after replacing the cables.

livingonline8 · September 28, 2019

OK, that makes sense... but here is the problem now. After you told me yesterday that I need to back up my cache pool and replace the cables, I came home today and wanted to check whether the current cables were connected properly, "just to make sure before I go ahead and replace them".

So I disconnected both SSD cache drives and connected them again, and made sure the other end of each cable was firmly connected to the motherboard. Then I started my Unraid server to see whether the problem went away; if not, the plan was to back up my cache pool and replace the cables. But now I've run into a new problem: Unraid took one of the SSD cache drives out of the pool and put it under Unassigned Devices, and it says the size is 16 kb?!!

The other cache drive was still in its place and showed the correct information for size, temperature and even the name.

Please have a look at the attached picture.

JorgeB · September 28, 2019

2 minutes ago, livingonline8 said:

it says the size is 16 kb?!!

That would suggest the SSD failed, try it on a different SATA port to confirm.

livingonline8 · September 28, 2019

Yup... I just changed the SATA port as you suggested and it's still the same situation, as shown in the picture above.

What do I do now? I know that means the cache pool is no longer usable, but is there any way to get anything out of it?!

What about the second SSD... can I get anything out of it? My mistake was putting both of them in one pool. As far as I know, if one fails, the data on both of them is pretty much gone forever. Is that correct?

livingonline8 · September 28, 2019

OK, so I will buy 2 new SSDs tomorrow, but this time I want to set them up as RAID 1 so I don't have to face this situation again.

Can you please tell me how to do that?

JorgeB · September 29, 2019

The pool was raid1, so most data should still be available, though likely not all because of the filesystem corruption. Start the pool with just the remaining device; if it doesn't mount, see here for some recovery options.

livingonline8 · October 1, 2019

Unfortunately, they were both in RAID 0... I don't think the data is available anymore.

JorgeB · October 5, 2019

On 10/1/2019 at 11:21 AM, livingonline8 said:

unfortunately, they were both in raid zero..

~~Pool was raid1~~

Correction: metadata was raid1 but data was single. You might still be able to recover some data with btrfs restore, but it will likely be mostly incomplete or corrupt.

Edited October 5, 2019 by johnnie.black
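The difference between the profiles discussed here (single, raid0, raid1) comes down to where each piece of data lives. The Python below is a deliberately simplified toy model, not how btrfs actually allocates space: it assumes each file is a list of chunks, each chunk has one profile and a set of devices, and a chunk stays readable only if enough of its devices survive. It ignores metadata, which btrfs also needs to locate files, and the filesystem corruption reported above. The device names (`cache0`, `cache1`), file names and helper functions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    profile: str   # "single", "raid0" or "raid1" (toy model only)
    devices: tuple # devices this chunk is placed on

def chunk_readable(chunk, lost):
    """Can this chunk still be read after the devices in `lost` are gone?"""
    remaining = [d for d in chunk.devices if d not in lost]
    if chunk.profile == "raid1":
        return len(remaining) >= 1  # any surviving mirror copy is enough
    # single and raid0 need every device the chunk was placed on
    return len(remaining) == len(chunk.devices)

def file_readable(chunks, lost):
    """A file survives only if all of its chunks are readable."""
    return all(chunk_readable(c, lost) for c in chunks)

lost = {"cache1"}
files = {
    "a.mkv": [Chunk("single", ("cache0",))],
    "b.mkv": [Chunk("single", ("cache1",))],
    "c.mkv": [Chunk("single", ("cache0",)), Chunk("single", ("cache1",))],
    "d.mkv": [Chunk("raid1", ("cache0", "cache1"))],
    "e.mkv": [Chunk("raid0", ("cache0", "cache1"))],
}
print({name: file_readable(ch, lost) for name, ch in files.items()})
# {'a.mkv': True, 'b.mkv': False, 'c.mkv': False, 'd.mkv': True, 'e.mkv': False}
```

Under this model, data in the single profile survives only where it happened to sit entirely on the remaining device, which is consistent with expecting a partial, possibly corrupt recovery rather than all or nothing.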
