Libvirt Service failed to start in 6.8.3 after reboot


AntoineR
Solved by JorgeB

Nov 19 08:43:33 Pegasus kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 48069127, rd 10259266, flush 1479344, corrupt 0, gen 0

 

This shows that one of the cache devices dropped offline in the past; run a correcting scrub and post the results.
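In case the GUI is not handy, the same scrub can be started from the CLI. A hedged sketch follows; the /mnt/cache mount point is an assumption and may differ on your system:

```shell
# Hedged sketch: run a correcting btrfs scrub from the CLI.
# /mnt/cache is an assumed mount point -- adjust to your pool.
run_correcting_scrub() {
    local pool="${1:-/mnt/cache}"
    if ! mountpoint -q "$pool"; then
        echo "not mounted: $pool"
        return 1
    fi
    btrfs scrub start "$pool"    # correcting by default when run as root
    btrfs scrub status "$pool"   # shows the same error summary as the GUI
    btrfs dev stats "$pool"      # cumulative counters like the log line above
}

run_correcting_scrub || true
```

The `btrfs dev stats` counters (wr/rd/flush/corrupt/gen) persist across reboots until reset, which is why an old dropout can still show up in today's log.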

 

P.S. 6.8.3 is pretty old; I recommend updating to the latest stable release.


Thanks for your answer and your time!

 

I did indeed have a cache drive disconnect; I added the drive back to the pool a while back. Interestingly, though, the diagnostic line you quoted is not from the same day. How can I run a correcting scrub? For what it's worth, running a parity check yields this:

 

Parity check finished (0 errors)
Duration: 12 hours, 35 minutes, 1 second. Average speed: 132.5 MB/s

 

I stayed on 6.8.3 because I use an NVIDIA GPU for transcoding in a Plex Docker container, and from my understanding, newer versions no longer allow this due to the disconnect between unRAID and NVIDIA. Am I mistaken?

 

I attached new diagnostics, run after the parity check, in case that helps!

 

Again, thanks immensely for your time and help; I'm in over my head and it's really appreciated! :)

 

pegasus-diagnostics-20231121-1351.zip

27 minutes ago, Antoine Rincent said:

How can I run a correcting scrub?

Click on the first pool device and scroll down to the scrub section.

 

27 minutes ago, Antoine Rincent said:

I stayed on 6.8.3 because I use an NVIDIA GPU for some transcoding on a plex docker container, and from my understanding, newer versions do not allow this due to the disconnect between unRAID and NVIDIA, am I mistaken?

I don't use it myself, but I'm pretty sure you can still do that; see here:

 

 

 


Thanks again for your time, I read the thread and will inform myself more to update in the future!

 

On the matter at hand, I ran a scrub (without checking the "repair corrupted blocks" checkbox) on the first device of my cache pool, and these are the results:

 

UUID: 2cdcedd8-7db1-4a5b-ac33-b942268ed85c
Scrub started: Tue Nov 21 18:18:49 2023
Status: finished
Duration: 0:26:05
Total to scrub: 584.82GiB
Rate: 382.65MiB/s
Error summary: verify=9957 csum=57513363
  Corrected: 0
  Uncorrectable: 0
  Unverified: 0

 

As usual, I attached new diagnostics in case they are helpful as well.

 

If it helps: the drive was disconnected and reconnected to the cache pool a while back, and the VMs worked between then and when this error appeared. Could the issue have taken a while to surface because it wasn't properly corrected at the time?

 

Thanks once more and I hope you have a wonderful day :)

pegasus-diagnostics-20231121-1851.zip


OK, sorry! I reran the scrub with the checkbox ticked and it apparently corrected everything; running it again reported no errors. I have since rebooted the unRAID machine and reran the scrub, which gives this result:

 

UUID: 2cdcedd8-7db1-4a5b-ac33-b942268ed85c
Scrub started: Wed Nov 22 10:40:21 2023
Status: finished
Duration: 0:11:14
Total to scrub: 584.17GiB
Rate: 887.50MiB/s
Error summary: no errors found

 

However, when going to the VM tab, I still get the error message about the libvirt service failing to start. What else should I try? I posted new diagnostics.

 

Again, thanks immensely for your time and effort, which are deeply appreciated! :)

pegasus-diagnostics-20231122-1051.zip

  • Solution
Nov 22 10:39:59 Pegasus emhttpd: shcmd (2834): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
Nov 22 10:39:59 Pegasus root: mount: /etc/libvirt: wrong fs type, bad option, bad superblock on /dev/loop3, missing codepage or helper program, or other error.

 

The libvirt file is corrupt; you'll need to restore it from a backup. If there's no backup, you can create a new image and recreate all the VMs with the original settings, pointing to the existing vdisks, though some may not work.
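Before deleting anything, it may be worth confirming the image really is unreadable. A hedged sketch, with the path taken from the mount_image line in the log (adjust if your system share lives elsewhere):

```shell
# Hedged sketch: inspect the libvirt image before deleting it.
# The default path comes from the mount_image log line above.
check_libvirt_img() {
    local img="${1:-/mnt/user/system/libvirt/libvirt.img}"
    if [ ! -f "$img" ]; then
        echo "no image at $img"
        return 1
    fi
    file "$img"   # a healthy image should be reported as a BTRFS filesystem
    # A deeper read-only check (needs root, does not modify the image):
    #   loop=$(losetup --find --show --read-only "$img")
    #   btrfs check --readonly "$loop"
    #   losetup -d "$loop"
}

check_libvirt_img || true
```

If `btrfs check` reports errors on the loop device, that matches the "bad superblock" mount failure in the log.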


That's unfortunate. I didn't know this file was so important (or even existed, frankly), so I never backed it up. Is there a chance the machine backed it up automatically, and if so, where would it be? Otherwise I'll try pointing to the vdisks and hope for the best. Should that fail, would I have to create the VMs from scratch again, since the VM manager would be unable to manage them?

 

To avoid this issue happening in the future, is there any way to tell what caused the libvirt file to become corrupted? And is there a clean method to back it up occasionally?

 

Again, huge thanks!

1 minute ago, Antoine Rincent said:

Is there a chance that the machine backed it up automatically

Nope, the appdata plugin will back up that file, but you'd need to install it.

 

2 minutes ago, Antoine Rincent said:

Should it fail,

Most, if not all, should work if you use the same settings.

 

2 minutes ago, Antoine Rincent said:

To avoid this issue happening in the future, is there any way to have an indication of what caused the libvirt file to be corrupted? And is there a clean method to set it to backup occasionnally?

One of the devices dropping offline in an old Unraid release can be enough if the system shares were set to NOCOW; see here for more info.
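On the "back it up occasionally" part of the question, besides the appdata plugin, a minimal cron-style sketch could look like this. All paths are assumptions: the /tmp defaults exist only so the sketch runs anywhere, and on a real server you would point SRC at the image and DST_DIR at an array share, ideally while the VM service is stopped:

```shell
# Hedged sketch: date-stamped backup of libvirt.img with simple rotation.
# SRC/DST_DIR defaults are demo paths; on Unraid you would use e.g.
#   SRC=/mnt/user/system/libvirt/libvirt.img
#   DST_DIR=/mnt/user/backups/libvirt
SRC="${SRC:-/tmp/libvirt-demo/libvirt.img}"
DST_DIR="${DST_DIR:-/tmp/libvirt-demo/backups}"

# Demo scaffolding so the sketch is runnable: create a dummy image.
mkdir -p "$(dirname "$SRC")" "$DST_DIR"
[ -f "$SRC" ] || dd if=/dev/zero of="$SRC" bs=1M count=1 status=none

# Copy with a timestamp; schedule via cron or the User Scripts plugin.
cp -a "$SRC" "$DST_DIR/libvirt-$(date +%Y%m%d-%H%M%S).img"

# Keep only the 7 newest copies.
ls -1t "$DST_DIR"/libvirt-*.img | tail -n +8 | xargs -r rm --
```

Stopping the VM service first matters because copying the image while libvirt has it mounted can capture an inconsistent snapshot.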

 

 

 

 


Thanks for your help once more. I'm only now getting to this, as I wanted time to properly sit down, try it out, and document my efforts. I marked the post identifying the corrupted libvirt file as the solution, since that is evidently what happened. If anybody as noobish as me stumbles on this thread in the future, here are the steps I went through to repair my VMs:

 

1. In Settings > VM Manager, I deleted the libvirt file without changing any other settings.

2. Rebooted the unRAID machine.

3. Upon reboot, I noticed the VM service started successfully, so I went into the VMs tab to create new ones.

4. Selected the right operating system and settings, and gave the VM the same name it used to have. I don't know whether CPU pinning matters, but I used the same pinning as before, so YMMV; same for memory.

5. For the primary vdisk location, I pointed to the previous vdisk location. If it was the default, you can find it on the same page where you deleted the libvirt file, under the default VM storage path; remember to point to the .img file, not just the specific VM's folder. I left the vdisk size textbox empty to avoid touching the vdisk that was already there, and it worked perfectly.
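For anyone curious what the GUI is doing under the hood in step 5: a VM definition is just a libvirt domain XML pointing at the vdisk. A hedged, heavily trimmed illustration follows; the name, memory/CPU values, and vdisk path are made-up examples, and the GUI generates a much fuller definition:

```shell
# Hedged sketch: generate a minimal libvirt domain XML that points at an
# existing vdisk. NAME/VDISK are example values, not from this thread.
NAME="${NAME:-Ubuntu}"
VDISK="${VDISK:-/mnt/user/domains/Ubuntu/vdisk1.img}"
XML="/tmp/${NAME}.xml"

cat > "$XML" <<EOF
<domain type='kvm'>
  <name>${NAME}</name>
  <memory unit='GiB'>4</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='${VDISK}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
EOF

echo "wrote $XML"
# To register and boot it (on the server, as root):
#   virsh define "$XML" && virsh start "$NAME"
```

This is only to show why pointing at the existing .img file is enough for the data to survive: the vdisk and the definition are separate things, and only the definitions lived in the corrupted libvirt.img.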

 

This was only tested on Ubuntu VMs, so once again, your mileage may vary, but it fixed the issue for me, and I hope it helps anyone who finds this thread in the future.

 

One more thing:

On 11/22/2023 at 2:06 PM, JorgeB said:

Nope, the appdata plugin will back up that file, but you'd need to install it.

I can't find it online; could you link to the plugin, ideally in a 6.8.3-compatible version, so I can use it until I upgrade once I have enough free time to do it cleanly?

 

Finally, huge thanks for helping noobs like me with their issues; I'm certain it'll help more than just me, and I'm deeply grateful for your contributions to the unRAID community.

Have a good one, and here's hoping none of your system files get corrupted in the future, because that sucks!

 
