Replaced cache drive -> SQUASHFS errors -> can't boot


smkings


Hi there,

 

I'm currently unable to boot into unRaid in any mode, so apologies for the lack of logs.

 

I previously used a 1 TB NVMe drive as my cache drive, and decided to pass it through to a VM and replace it with a 500 GB Samsung EVO SSD.

 

These were the steps I took:

- Changed the shares on the cache (appdata, domains, system) to "prefer" **EDIT** Cache: Prefer is incorrect; I actually used Cache: Yes. See trurl's post below.

- Ran the mover to move everything to the array

- A few files were left behind, which I moved manually with the unBALANCE plugin

- Removed the (now empty) NVMe drive from the array config

- Added the new 500 GB EVO as cache

- Changed the shares back to Cache: Only and ran the mover again **EDIT** This should be Cache: Prefer, see trurl's post below

- Array then seemed to be working fine. All good so far.

 

While I was setting up Pop!_OS in a VM with no vdisk, passing through the NVMe drive instead, something froze the entire server and forced me to do a hard reboot.

 

When the server tries to boot again (regular, safe mode, safe mode with GUI, etc.), I get a sequence of errors that I **think** are caused by it being unable to find the system files on my cache drive.

 

Sure enough, my DISK_ASSIGNMENTS.txt still lists my old NVMe drive as cache, so unRaid is looking for those files on a drive that now has Pop!_OS on it.

 

My question is:

 

What is the safest way to resolve this, please?

 

I am able to boot in safe mode with a different USB key created from scratch. Should I assign the array drives correctly there, then copy the resulting super.dat and DISK_ASSIGNMENTS.txt files to my normal USB key (the one with my licence on it)? I tried removing super.dat from my existing USB key and booting, but I get the same errors.

 

Thank you so much!

 

 

 

[Photo of the boot screen showing the errors]

Edited by smkings

Thanks, JorgeB. I created a new flash drive and restored the config folder to it, and it hits the same issues.

 

If I use the new flash drive without restoring the config folder, I can boot in safe mode (no plugins). So it looks like the problem could be something in the config folder?


When I boot with a vanilla config on a new USB, adding only my super.dat and DISK_ASSIGNMENTS.txt - it produces the same SQUASHFS errors. So I am guessing it is something in those 2 files.

 

Anyway, I have now booted into safe mode from a new USB with a clean configuration. Should I just re-assign the drives in the array? The parity drive shows "All existing data on this device will be OVERWRITTEN when array is started". Is that normal?

 

Really don't want to kill my array here :)

14 minutes ago, smkings said:

When I boot with a vanilla config on a new USB, adding only my super.dat and DISK_ASSIGNMENTS.txt - it produces the same SQUASHFS errors. So I am guessing it is something in those 2 files.

Very unlikely, but you can just re-assign the disks. Just make sure all assignments are correct and check "parity is already valid" before starting the array.

13 hours ago, smkings said:

Changed the shares on the cache to "prefer" (appdata, domains, system)

- Ran the mover to move everything to the array

Cache-prefer shares get moved FROM the array TO cache. Those shares should normally be set to prefer so they try to stay on cache. To get them moved FROM cache TO the array so you can replace the cache drive, you would have to set them to cache-yes.

13 hours ago, smkings said:

Changed the shares back to Cache: Only and ran mover again

Mover ignores cache-only and cache-no shares.
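To make these rules concrete, here is a minimal Python sketch that models only the behaviour described above: Cache: Yes moves files from cache to array, Cache: Prefer moves them from array to cache, and Cache: No / Cache: Only shares are skipped. `CacheSetting` and `mover_direction` are hypothetical names for illustration; this is not how unRaid implements the mover.

```python
from enum import Enum
from typing import Optional, Tuple


class CacheSetting(Enum):
    """A share's "Use cache" setting (a model for illustration, not unRaid code)."""
    NO = "No"
    YES = "Yes"
    PREFER = "Prefer"
    ONLY = "Only"


def mover_direction(setting: CacheSetting) -> Optional[Tuple[str, str]]:
    """Return (source, destination) for the mover, or None if it skips the share."""
    if setting is CacheSetting.YES:
        return ("cache", "array")  # Cache: Yes -> mover empties the cache onto the array
    if setting is CacheSetting.PREFER:
        return ("array", "cache")  # Cache: Prefer -> mover pulls files back onto the cache
    return None  # Cache: No and Cache: Only -> the mover ignores the share


if __name__ == "__main__":
    # Cache-replacement sequence from this thread: empty the old cache, then refill the new one.
    plan = [("empty old cache", CacheSetting.YES), ("refill new cache", CacheSetting.PREFER)]
    for step, setting in plan:
        src, dst = mover_direction(setting)
        print(f"{step}: Cache: {setting.value} moves files from {src} to {dst}")
```

Under these rules, emptying the old cache needs Cache: Yes and refilling the new one needs Cache: Prefer. Switching to Cache: Only at the end would leave the files sitting on the array, because the mover skips that share.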

2 hours ago, trurl said:

Cache-prefer shares get moved FROM the array TO cache. Those shares should normally be set to prefer so they try to stay on cache. To get them moved FROM cache TO the array so you can replace the cache drive, you would have to set them to cache-yes.

Mover ignores cache-only and cache-no shares.

@trurl Thanks, I always get confused by the terminology, and without the settings in front of me I mixed them up, but you are exactly right. I was going by the (very helpful) "Mover transfers files from cache to array" text, so I did in fact use Cache: Yes to move them. I'll add an edit to my original post for anyone else coming across this!

 

    

2 hours ago, JorgeB said:

Very unlikely, but you can just re-assign the disks. Just make sure all assignments are correct and check "parity is already valid" before starting the array.

@JorgeB Thank you. I re-assigned the disks according to the DISK_ASSIGNMENTS.txt from my old USB and ticked the "parity is already valid" box. Thanks for the reminder!

 

The array is now back up and running (on the new USB). I haven't run a parity check yet, but the files in my shares look good.

 

I now have a few other small problems as I'm running from a vanilla USB:

 

1. Share preferences did not carry over, so I had to manually set appdata, domains and system back to Cache: Prefer (thanks @trurl). I've fixed this already, but I'm noting it for the benefit of anyone else coming across it.

 

2. Docker containers have no thumbnail images and don't seem to be running:

 

[Screenshot: Docker tab, with the containers showing no icons]

 

I looked in my docker.cfg and found that several lines were missing, including DOCKER_APP_CONFIG_PATH. I stopped Docker, opened the selector for the config path and pointed it at my appdata share again (even though the path looked correct), then saved, rebooted and started Docker again. The resulting files are compared below: new docker.cfg on the left, old/original docker.cfg on the right.

 

However, even with the cfg seemingly correct, it isn't finding the Docker configuration (see the Docker tab screenshot above).

[Screenshot: new docker.cfg (left) vs. original docker.cfg (right)]
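One way to spot missing lines such as DOCKER_APP_CONFIG_PATH without comparing screenshots is to diff the two files key by key. This is a minimal Python sketch, assuming docker.cfg holds one `KEY="value"` entry per line (an assumption, not taken from unRaid documentation); `parse_cfg`, `diff_cfg` and the command-line paths are hypothetical.

```python
import sys
from pathlib import Path
from typing import Dict, List


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines (assumed cfg format) into a dict, skipping blanks and comments."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip().strip('"')
    return entries


def diff_cfg(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report keys missing from the new file, keys only in the new file, and changed values."""
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python cfgdiff.py /path/to/old/docker.cfg /boot/config/docker.cfg
    old_cfg = parse_cfg(Path(sys.argv[1]).read_text())
    new_cfg = parse_cfg(Path(sys.argv[2]).read_text())
    for kind, keys in diff_cfg(old_cfg, new_cfg).items():
        for key in keys:
            print(f"{kind}: {key}")
```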

 

I have checked the appdata files and they all look intact; the share is on the cache drive and set to Cache: Prefer. Am I missing something? Do I need to restore my docker.img file, or something else?

 

Thanks as ever for your help.

 


Fixed! For reference for anyone else encountering this:

 

I had to re-install all my plugins, starting with CA (Community Applications) and then Fix Common Problems. The latter pointed me to another missing file.

 

After some digging in https://wiki.unraid.net/Files_on_v6_boot_drive, it turns out the configuration files Docker was looking for are stored on the USB boot drive in config/plugins/dockerMan/templates-user, as a set of .xml files. This folder was empty on my new USB.

 

I copied the files across, rebooted, and everything works.
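For anyone in the same position, the copy step can be scripted. This is a minimal Python sketch, assuming the old flash drive was backed up as a plain folder tree; `restore_user_templates` and the backup location are hypothetical, and only the config/plugins/dockerMan/templates-user path comes from the post above. Templates that already exist on the new drive are left alone unless `overwrite=True`.

```python
import shutil
from pathlib import Path
from typing import List

# Location of the user Docker templates, relative to the root of the flash drive.
TEMPLATE_SUBDIR = Path("config/plugins/dockerMan/templates-user")


def restore_user_templates(backup_root: Path, flash_root: Path, overwrite: bool = False) -> List[str]:
    """Copy *.xml Docker templates from a flash backup onto the live flash drive.

    backup_root: top level of the old flash backup (hypothetical location).
    flash_root: top level of the boot drive (/boot on a running unRaid server).
    Returns the names of the files that were copied.
    """
    src_dir = backup_root / TEMPLATE_SUBDIR
    dst_dir = flash_root / TEMPLATE_SUBDIR
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for xml_file in sorted(src_dir.glob("*.xml")):
        target = dst_dir / xml_file.name
        if target.exists() and not overwrite:
            continue  # keep any template already present on the new flash drive
        shutil.copy2(xml_file, target)  # copy2 also preserves timestamps
        copied.append(xml_file.name)
    return copied
```

On the server this would be called as `restore_user_templates(Path("/path/to/flash-backup"), Path("/boot"))`, with the backup path adjusted to wherever the copy actually lives.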

 

I still need to manually install a bunch of plugins but that's fine!

 

Kudos to the unRaid team for building such a robust system and thanks @JorgeB and @trurl for your help! 

 

I guess I will write to the kind people at unRaid now to see if they can switch over my serial to the new USB before the trial runs out :)

 

Cheers


**UPDATE**

 

I just ran into the same problem as before on the new USB, so I'm now thinking it has nothing to do with the USB itself?

 

[Photo of the boot errors on the new USB]

 

I had everything back up and running on the new USB. The Docker containers were fine after restoring the .xml files, and I had installed only a minimal set of plugins (CA, Fix Common Problems).

 

The one thing I did change was setting the PCIe ACS override to "Both" ("Downstream" did not give me the right IOMMU groups) and then passing through my graphics card (Radeon 470), USB controller, and NVMe controller.

 

I have been passing through the first two successfully for a while now. The NVMe controller is the new addition, and it seems to be the problem.
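To check which devices end up sharing an IOMMU group after changing the ACS override, the Linux kernel exposes the groups under /sys/kernel/iommu_groups. This is a minimal Python sketch that lists them; `iommu_groups` is a hypothetical helper, and the sysfs root is a parameter so it can be pointed at a test directory. Keep in mind that the ACS override changes how the kernel reports the groups; it does not change how well the hardware actually isolates the devices.

```python
from pathlib import Path
from typing import Dict, List


def iommu_groups(sysfs_root: Path = Path("/sys/kernel/iommu_groups")) -> Dict[int, List[str]]:
    """Map each IOMMU group number to the PCI addresses of the devices in it."""
    groups: Dict[int, List[str]] = {}
    if not sysfs_root.is_dir():
        return groups  # IOMMU disabled, or not a Linux host
    for group_dir in sysfs_root.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            # Each entry in devices/ is named after a PCI address, e.g. 0000:01:00.0
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        # Devices that share a group generally have to be passed through together.
        flag = "" if len(devices) == 1 else "  (shared group)"
        print(f"group {group}: {', '.join(devices)}{flag}")
```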

 

Sure enough, after running Pop!_OS on the passed-through NVMe for a while, it froze up (same as before). I tried to reboot the VM (running with no vDisk, just the passed-through NVMe), but the reboot failed and the VM disappeared. I rebooted the server, and the SQUASHFS errors returned.

 

I backed up the USB and did a clean install (on the same USB), restoring only the network.cfg and Basic.key files to the config folder. After re-assigning my drives, the array is back up and running a parity check (17%, no errors so far).
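The clean-install step above amounts to a whitelist copy. This is a minimal Python sketch, assuming the backup is a plain copy of the old config folder; `restore_selected` is a hypothetical helper. It checks that every requested file exists before copying anything, and returns the names it left behind so plugins and other settings can be re-created by hand later.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# The only files restored into config/ after the clean install described above.
KEEP_FILES = ("network.cfg", "Basic.key")


def restore_selected(backup_config: Path, flash_config: Path,
                     names: Iterable[str] = KEEP_FILES) -> List[str]:
    """Copy only `names` from a backed-up config/ folder; return the entries left behind.

    Raises FileNotFoundError, before copying anything, if a requested file is missing.
    """
    names = tuple(names)
    missing = [n for n in names if not (backup_config / n).is_file()]
    if missing:
        raise FileNotFoundError(f"not in backup: {', '.join(missing)}")
    flash_config.mkdir(parents=True, exist_ok=True)
    for name in names:
        shutil.copy2(backup_config / name, flash_config / name)
    # Everything else stays in the backup, so a damaged file is not carried over.
    return sorted(p.name for p in backup_config.iterdir() if p.name not in names)
```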

 

Is this somehow being caused by passing through the NVMe controller? If so, how?

Edited by smkings
