
No VMs or Dockers after disk replacement



Morning,

 

I've had a few frantic days with my Unraid box, timeline below: -

 

  1. Friday 6th - Parity disk failed late in the evening. I can't recall seeing a reason, but I went ahead and ordered a replacement from Amazon. At this point dockers and VMs were still running and continued to be used until.....
  2. Sunday 8th - Swapped out the failed parity disk with the replacement. Parity sync/rebuild started but at a very slow speed (estimated time to complete was about 145 days on an 8TB drive). Some googling of these symptoms suggested reseating/replacing SATA cables and only having array drives connected. A combination of these solutions kick-started the speed and the repair was estimated to complete within 12 hours or so. Prior to my last reboot and restart of the parity repair, I set all my dockers/VMs to not auto start.
  3. After the repair got underway for the final time and at a healthy speed, I noticed the next problem: one of the previously healthy array disks now stated it was unmountable. As I could still access shares remotely, I decided to leave the parity repair to complete and see what happened. I could see the unmountable disk was still being read from during the repair.
  4. After the parity repair completed, the aforementioned unmountable disk was still showing as unmountable, even after a reboot. After some more googling I undertook the suggested fix in this post, as the symptoms felt similar. This addressed the problem and the previously unmountable drive was back and all green.
  5. And now my current problem: no dockers or VMs can be found, and both GUI tabs are completely empty. I can still see my appdata folder, VM disks and docker image files in the file system, but the tabs in the GUI are empty, suggesting there are no dockers or VMs.

 

Sorry for the long-winded post, but I wanted to give as much detail as possible. I have also attached the latest diagnostics. Any help or advice would be appreciated.

 

Regards

 

nwr122

 

yavin-diagnostics-20210809-0919.zip


It seems like you may have done a lot of the right things, though maybe if you had asked first some of them wouldn't have been necessary.

 

7 hours ago, nwr122 said:

Friday 6th - Parity disk failed late in the evening, can't recall seeing a reason

If you had posted Diagnostics before doing anything, we could have looked at SMART (all attached disks included in Diagnostics) for the disk to see if it was a disk problem, or syslog (also included in Diagnostics) to see if it was a connection problem. Connection problems are much more common than bad disks. Probably no reason to replace.

 

7 hours ago, nwr122 said:

Sunday 8th - Swapped out failed parity with replacement. Parity sync/rebuild started but at a very slow speed

This also suggests connection problems. You must be very careful when mucking about inside your server, and always double-check all connections.

 

7 hours ago, nwr122 said:

only having array drives connected. ... Prior to my last reboot and re-starting of the parity repair, I set all my dockers/VMs to not auto start 

Even with no dockers/VMs started, if you had Docker and/or VM Manager enabled when you had no pool disk, then they would have started over, using the array instead.

 

7 hours ago, nwr122 said:

one of the previous healthy array disks now stated it was unmountable. As I could still access shares remotely, I decided to leave the parity repair to complete and see what happened. I could see the unmountable disk was still being read from during the repair

Shares exist on multiple disks, and even with an unmountable disk, any shares on other disks would be accessible. Instead of saying "parity repair" you should say "parity rebuild". "Repair" is usually used when discussing repairing corrupt/unmountable filesystems. Of course the unmountable disk would be read during parity rebuild, since parity doesn't know anything about filesystems and uses all assigned disks.

 

7 hours ago, nwr122 said:

After the parity repair completed, the afore mentioned unmountable disk was still showing unmountable even after a reboot.

Parity rebuild can't do anything to fix unmountable filesystems, and neither can a reboot.

 

7 hours ago, nwr122 said:

addressed the problem and previously unmountable drive was back

There was a lot in that thread and probably some of it didn't apply to you, but you don't have a lost+found share so maybe everything was recovered by the filesystem repair. Only you can say if anything is missing.

 

7 hours ago, nwr122 said:

Some googling ... Some googling ... Some more googling ...

Probably everything you needed was in the wiki which you can access by clicking the "manual" link in the lower right corner of the webUI on your server, or using the Docs link at top or Documentation link at bottom of any forum page.

 

https://wiki.unraid.net/Manual/Storage_Management#What_is_a_.27failed.27_.28disabled.29_drive

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

https://wiki.unraid.net/Manual/Storage_Management#Drive_shows_as_unmountable

https://wiki.unraid.net/Manual/Storage_Management#Running_the_Test_using_the_webGui

https://wiki.unraid.net/Manual/Storage_Management#Repairing_a_File_System

 

 

7 hours ago, nwr122 said:

no dockers or VMs can be found, both GUI tabs are completely empty.

Your system share, where docker.img and libvirt.img are stored, now exists on both disk1 and cache. You should probably get rid of them and recreate or restore.

 

You can recreate docker.img and easily reinstall your dockers exactly as they were.

https://wiki.unraid.net/Manual/Troubleshooting#How_do_I_recreate_docker.img.3F

https://wiki.unraid.net/Manual/Troubleshooting#Restoring_your_Docker_Applications

 

I've not needed to recover libvirt but I think CA Backup plugin covers that.

 

 

 


Hi Trurl,

 

Thanks for the detailed response, in general I get the point about asking before action, lesson learnt. Same goes for using Wiki and terminology.

 

Just a couple of follow-ups:

 

1 hour ago, trurl said:

If you had posted Diagnostics before doing anything....

I took diagnostics on the Friday as soon as I realised the drive had failed. I've attached them to this post. The drive in question, 1EHVP1BZ, shows as having passed SMART, which confused me at the time, but I did not think to question it. Having put the drive into the WD caddy and plugged it into my laptop, it shows all green in CrystalDiskInfo, so I am questioning whether it needs to be returned. I can see the syslog files as well, but I am not ashamed to admit I'm not really sure what I should be looking for in there. Happy to be educated/learn.

 

1 hour ago, trurl said:

Your system share, where docker.img and libvirt.img is stored, now exists on both disk1 and cache.

Ok, thanks, although my docker.img is currently only on my cache and not on disk1

 

image.png.60230e2a533167fea9ef320d5cb3edd5.png

 

I'll take a look at those guides and have a go at restoring my dockers and VMs.

 

Many thanks again for your reply.

 

 

 

yavin-diagnostics-20210806-2202.zip

1 hour ago, nwr122 said:

The drive in question, 1EHVP1BZ, shows as having passed SMART which at the time confused me but I did not think to question it.

Failed disk is a very imprecise idea as explained at that first link I gave.

 

SMART attributes for that disk look fine, and that is what CrystalDiskInfo is reporting on. These SMART attributes are recorded by the disk in its firmware as the disk is used. No SMART tests have been run on the disk, which would actually test the disk more than just what happens during use. In the webUI, you can click on a disk to get to its page to run short and extended tests. Probably the disk would pass. You could also run WD diagnostics (free download from WD) and use it to test the disk.

 

Since that was a parity disk, it has no filesystem to mount and contains no useful data by itself. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits, so parity disk plus all other disks can recover the data for a missing data disk.
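To make the "extra bit" idea concrete, here is a minimal sketch, assuming single parity computed as a bytewise XOR across the data disks. The three "disks" and their contents are made up for illustration, and `xor_blocks` is a hypothetical helper, not anything from Unraid itself.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding four bytes.
data_disks = [b"\x0f\xf0\xaa\x55", b"\x01\x02\x03\x04", b"\xff\x00\xff\x00"]

# Parity is the XOR of the same byte position across every data disk.
parity = xor_blocks(data_disks)

# Pretend the second disk is missing: XOR parity with the survivors to rebuild it.
survivors = [data_disks[0], data_disks[2]]
rebuilt = xor_blocks(survivors + [parity])
```

Nothing in the calculation looks at files or folders, only raw bytes, which is why parity on its own holds no useful data and why a rebuild reads every assigned disk whether or not its filesystem mounts.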

 

Lots of this and similar in syslog, which is typical of a connection problem:

Aug  5 01:39:25 YAVIN kernel: ata3: link is slow to respond, please be patient (ready=0)
Aug  5 01:39:29 YAVIN kernel: ata3: COMRESET failed (errno=-16)
Aug  5 01:39:29 YAVIN kernel: ata3: hard resetting link

Can't say for sure which disk that is referring to since syslog had rotated past the point where I could see the initial connection that refers to, but other connections weren't complaining that I noticed so almost certainly it was that parity disk.
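If you want to look for this kind of thing yourself, a rough sketch along these lines could count the complaints per ATA port. The three marker phrases are taken only from the excerpt above and are an assumption about what counts as "link trouble"; `count_link_trouble` is a hypothetical helper, and a real syslog will contain other message variants.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata3: COMRESET failed (errno=-16)".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Phrases treated as signs of link trouble (assumption: taken from the excerpt).
TROUBLE = ("link is slow to respond", "COMRESET failed", "hard resetting link")

def count_link_trouble(lines):
    """Count link-trouble messages per ATA port in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match and any(marker in match.group(2) for marker in TROUBLE):
            counts[match.group(1)] += 1
    return counts

sample = [
    "Aug  5 01:39:25 YAVIN kernel: ata3: link is slow to respond, please be patient (ready=0)",
    "Aug  5 01:39:29 YAVIN kernel: ata3: COMRESET failed (errno=-16)",
    "Aug  5 01:39:29 YAVIN kernel: ata3: hard resetting link",
]
trouble = count_link_trouble(sample)
```

A port that keeps turning up in the counts while the others stay quiet points at that cable or connector rather than at the disk.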

 

User Shares prioritizes the lowest numbered disk, and cache is lower than any of the array disks, so the libvirt on cache is the one being used, and it looks newer than the other one so that might explain why your VMs are missing. I noticed you had CA Backup plugin installed so maybe you had that setup to make a backup of libvirt.
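As a rough way to see where copies of a share live and which one wins, here is a small sketch. It assumes the pool is mounted in a folder named `cache` and array disks in folders named `disk1`, `disk2`, and so on under a common root, and it models the priority order exactly as described above (cache first, then ascending disk number). `share_copies` is a hypothetical helper, and the demo runs against a throwaway directory tree rather than a real server.

```python
import tempfile
from pathlib import Path

def share_copies(mnt_root, share):
    """Return names of mounts under mnt_root that hold `share`, highest priority first."""
    candidates = []
    for mount in Path(mnt_root).iterdir():
        if mount.name == "cache":
            candidates.append((0, 0, mount))          # cache sorts ahead of every array disk
        elif mount.name.startswith("disk") and mount.name[4:].isdigit():
            candidates.append((1, int(mount.name[4:]), mount))
    ordered = sorted(candidates, key=lambda c: c[:2])
    return [mount.name for _, _, mount in ordered if (mount / share).is_dir()]

# Demo on a temporary tree that mimics the layout described in this thread.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cache/system", "disk1/system", "disk2/media", "user/system"):
        (Path(tmp) / name).mkdir(parents=True)
    copies = share_copies(tmp, "system")
```

More than one name in the result means the share is split across disks, and the first name is the copy the post above says would be used.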

 

As you noted, there is only one docker.img, that on cache, but it looks new also. The method in the wiki I linked to reinstall your dockers will probably be all that's needed to get those going again.

 

You should delete the system folder from disk1.

 

 

