Cache Disk - VM Pausing - Check File System Problem - General Support

November 4, 2017

As I have posted elsewhere, my PFSense VM has been pausing for two days now. I resume it, but it pauses again shortly after. Someone suggested it might be a cache disk issue. I checked and unRAID has reserved 200GB of my cache. I have 2 x 120GB and 2 x 520GB SSDs as cache. They were less than 50% full, something like 38%.

I shut down my server and adjusted a SATA power cable on one of the SSDs and now Fix Common Problems is reporting:

cache (KINGSTON_SHFS37A120G_50026B7261075EF7) has file system errors (No file system (32))

If the disk is XFS / REISERFS, stop the array, restart the array in Maintenance mode, and run the file system checks. If the disk is BTRFS, then just run the file system checks. If the disk is listed as being unmountable, and it has data on it, whatever you do, do not hit the format button. Seek assistance HERE

My SSD is BTRFS. I looked to run the File System Check however the option was grayed out with a message advising it was only available when the array was running in Maintenance mode.

I restarted the array in Maintenance mode and selected File System Check, but it doesn't appear to be doing anything. The text box says "running" but it's been like that for over 24 hours now.

How long should it take to finish?

I have attached diagnostics again.

tower-diagnostics-20171104-1754.zip

November 4, 2017

btrfs fsck should only be run as a last resort, and it wouldn't take that long; it has probably hung. See here to try and recover your data:

https://forums.lime-technology.com/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

November 4, 2017

2 hours ago, johnnie.black said:

btrfs fsck should only be run as a last resort, and it wouldn't take that long; it has probably hung. See here to try and recover your data:

https://forums.lime-technology.com/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

Updated FCP to reflect this

November 4, 2017

Author

13 hours ago, johnnie.black said:

btrfs fsck should only be run as a last resort, and it wouldn't take that long; it has probably hung. See here to try and recover your data:

https://forums.lime-technology.com/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

Well I ran it because that's what Fix Common Problems said to do if the disk was unmountable.

Having looked at your link, could you provide some guidance? I don't utilise command line functionality so what's the starting point?

Edited November 4, 2017 by darrenyorston

November 4, 2017

7 minutes ago, darrenyorston said:

I don't utilise command line functionality so what's the starting point?

SSH into the server and start with option 1; all the steps are there. Ask if you have a doubt about a specific step so I can improve it.

November 4, 2017

Author

You say to replace x with the actual device.

So it's going to be "mkdir /sdi1"? Or "mkdir /dev/sdi1"?

Leading to "mount -o recovery,ro /dev/sdX1 /dev/sdi1"?

November 5, 2017

Author

I have presumed it's "mkdir /x" leading to "mount -o recovery,ro /dev/sdi1 /x".

No errors reported and I am at the command prompt. Where do I find the mounted disk to copy the files from?

Nothing appears to have changed on the GUI.

November 5, 2017

You mounted the disk onto the directory /x, so the files will be in the directory /x.
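
For readers following along, the recovery-mount steps discussed above can be sketched roughly as below. This is a dry-run sketch, not the FAQ's exact text: it only prints the commands, and `DEVICE`, `MOUNTPOINT`, and the `recovery_mount_plan` helper are names assumed for illustration. `/dev/sdX1` is a placeholder for the real partition (`/dev/sdi1` in this thread). On newer kernels the `recovery` option has, as far as I know, been superseded by `usebackuproot`; the sketch keeps the option as written in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery steps instead of executing them.
# DEVICE is a placeholder (assumption); substitute the real partition.
DEVICE="${DEVICE:-/dev/sdX1}"
# MOUNTPOINT is an ordinary empty directory; /x is the name used in the thread.
MOUNTPOINT="${MOUNTPOINT:-/x}"

recovery_mount_plan() {
  # 1. Create the directory the filesystem will be attached to.
  echo "mkdir -p $MOUNTPOINT"
  # 2. Mount read-only with the recovery option quoted in the thread.
  echo "mount -o recovery,ro $DEVICE $MOUNTPOINT"
  # 3. The recovered files then appear under the mount point itself.
  echo "ls -la $MOUNTPOINT"
}

PLAN="$(recovery_mount_plan)"
echo "$PLAN"
```

The second argument to `mount` is a directory rather than a device, and that directory is where the files show up.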

November 5, 2017

Author

I have found the directories using MC.

What disk number will the /x drive be for the purposes of an MC copy?

I tried "cp -r /mnt/x /mnt/disk4" however nothing happens. When I look at disk 4 no files have been copied.

Ok. I have discovered that the command line doesn't work. I just utilised the menus instead.

Edited November 5, 2017 by darrenyorston
Updated
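
A note on the `cp` attempt above: the recovery mount in this thread was made at `/x`, not `/mnt/x`, so a source path of `/mnt/x` would likely not exist, which is one plausible reason nothing was copied. The sketch below shows a recursive copy followed by a quick verification. It uses two temporary directories as stand-ins so it can run anywhere; on the server the source would be `/x` and the destination a folder on an array disk such as `/mnt/disk4` (both taken from the thread, the folder layout inside is made up).

```shell
#!/bin/sh
# Stand-in directories (assumption): SRC plays the role of /x,
# DEST plays the role of a folder on /mnt/disk4.
SRC="$(mktemp -d)"
DEST="$(mktemp -d)"

# Fake content so there is something to copy (illustrative only).
mkdir -p "$SRC/appdata"
echo "example" > "$SRC/appdata/config.txt"

# -a recurses and preserves permissions and timestamps;
# the trailing "/." copies the contents of SRC rather than SRC itself.
cp -a "$SRC/." "$DEST/"

# Compare both trees; diff -r exits non-zero if anything differs.
if diff -r "$SRC" "$DEST" >/dev/null; then
  COPY_STATUS="ok"
else
  COPY_STATUS="mismatch"
fi
echo "copy status: $COPY_STATUS"
```

Checking the result with `diff -r` (or comparing sizes with `du -sh`) before reformatting anything is a cheap way to catch a copy that silently missed data.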

November 5, 2017

Author

I have been able to copy all the material off the drive.

I formatted the disk as per

however the disk still shows that it is unmountable.

November 5, 2017

Author

I have attached the diagnostics file again.

The drive won't format; it keeps saying it's "Unmountable" in the GUI. A different cache disk, one I have not touched, is now showing as a "New Device".

This feels like a rat hole I have gone down. My initial problem was a VM pausing, potentially due to 200GB of reserved space. I shut down the server and took the opportunity to adjust the path of a SATA cable, and now I am in the situation that none of my Docker containers or VMs work.

What is the best way to get to a functional system?

tower-diagnostics-20171105-1139.zip

November 5, 2017

Author

I have been able to format the disk, I had to remove it from the cache first.

Now I am trying to restore the files back to it however the cache disk (/x) is reporting "Read-only file system (30)"

unRAID Cache disk message is also reporting:

"Cache pool BTRFS too many profiles X" with X being the SSD I just re-added.

Edited November 5, 2017 by darrenyorston

November 5, 2017

If you have already copied everything important from your cache to the array or another place, it's now best to completely format your cache pool, start over with a clean filesystem, and then restore the data:

This will delete all data on the cache SSDs.

Stop the array and type:

blkdiscard /dev/sdX

Replace X with each SSD identifier, one at a time.

Then start the array and format the pool.
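
The per-device wipe described above could be scripted along these lines. This is a cautious sketch under stated assumptions: `sdX` and `sdY` are placeholder identifiers, and `CACHE_DEVICES`, `DRY_RUN`, and `wipe_plan` are names invented for illustration. By default it only prints the `blkdiscard` commands, because the real command irreversibly discards everything on the device.

```shell
#!/bin/sh
# Placeholder device identifiers (assumption); replace with the real cache SSDs.
CACHE_DEVICES="${CACHE_DEVICES:-sdX sdY}"
# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

wipe_plan() {
  # One blkdiscard per SSD, one device at a time, as in the post above.
  for dev in $CACHE_DEVICES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "blkdiscard /dev/$dev"
    else
      blkdiscard "/dev/$dev"
    fi
  done
}

WIPE_PLAN="$(wipe_plan)"
echo "$WIPE_PLAN"
```

Printing the plan first makes it easier to double-check that no array disk has slipped into the list before anything is erased.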

November 5, 2017

Author

No, I have not copied everything off the cache. Why would I? The instructions I followed advised to copy the materials off one disk, not the entire cache.

I am not feeling overly confident here. Seems the "fix" is worse than the problem.

At least I had access to my files and docker. Now I have neither.

November 5, 2017

3 minutes ago, darrenyorston said:

Why would I? The instructions I followed advised to copy the materials off one disk, not the entire cache.

When you mount a device from a pool in recovery mode and the mount is successful, the whole pool is mounted. You need to copy everything you want from the pool so it can be reformatted. This process is to recover the data, not to fix the pool; the pool needs to be recreated.

November 5, 2017

Author

Well, in that case something didn't work. I copied all the directories off the mount and it totalled only 60GB; it was 300GB or so prior to mounting the drive.

At the moment none of my docker containers work, the VMs tab is blank, and I cannot access shares on the array.

November 5, 2017

Where is the docker image? If it was on the cache, it's probably corrupt and needs to be recreated. If you have your appdata, you can recreate all the dockers from the previous templates to retain all config options.

Same for the VMs: look where libvirt.img was stored, then restore it from backup or recreate it. If it is recreated, you'll need to recreate your VMs or restore the XMLs from backups. If you have the vdisks (or backups of them), you can re-use the old vdisks and not lose any data.

For the array shares we need to see the diagnostics; it could be filesystem corruption on the array.
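
One way to answer the "where is the docker image / libvirt.img" question from the command line is a plain `find`. The sketch below is only an illustration: `find_images` is a hypothetical helper, and the demo directory layout is made up so the example runs anywhere. On the server the search root would be something like `/mnt`.

```shell
#!/bin/sh
# Hypothetical helper: list docker.img and libvirt.img files under a root directory.
find_images() {
  find "$1" -type f \( -name 'docker.img' -o -name 'libvirt.img' \) 2>/dev/null
}

# Demo tree (made-up layout) containing only a docker.img, no libvirt.img.
DEMO_ROOT="$(mktemp -d)"
mkdir -p "$DEMO_ROOT/system/docker"
: > "$DEMO_ROOT/system/docker/docker.img"

FOUND="$(find_images "$DEMO_ROOT")"
echo "$FOUND"

# Remove the demo tree again.
rm -rf "$DEMO_ROOT"
```

If only one of the two files turns up, the other was either stored somewhere outside the search root or is gone and has to be restored or recreated as described above.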

November 5, 2017

Author

When I browse the cache folder from the GUI, the appdata and other folders are there. They are empty however.

There is no libvirt.img. The only *.img file is docker.img.

I have already uploaded the diagnostics file.

November 5, 2017

I can't tell whether you currently have a mounted cache or not; I need to see current diagnostics.

November 5, 2017

41 minutes ago, darrenyorston said:

Well, in that case something didn't work. I copied all the directories off the mount and it totalled only 60GB; it was 300GB or so prior to mounting the drive.

If the mount works, all pool data should be available, except in the case mentioned in the FAQ:

Quote

Note that if there are more devices missing than the profile permits for redundancy it may still mount but there will be some data missing, e.g., mounting a 4 device raid1 pool with 2 devices missing will result in missing data.

Edit: it's also possible you had multiple profiles on your cache and only one of them mounted, but you should get a warning about that if system notifications are enabled.

Edited November 5, 2017 by johnnie.black
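
To check for the multiple-profiles situation mentioned above, the allocation profiles of a mounted pool can be listed with `btrfs filesystem df <mountpoint>`. The sketch below counts the distinct data profiles in such a listing. The sample text is made up purely to illustrate the output format (the sizes mean nothing), and `count_data_profiles` is a hypothetical helper name.

```shell
#!/bin/sh
# Made-up sample in the format printed by `btrfs filesystem df` (illustrative only).
sample_df_output='Data, RAID1: total=10.00GiB, used=5.00GiB
Data, single: total=10.00GiB, used=5.00GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=512.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B'

# Hypothetical helper: read a df listing on stdin, print the number of distinct Data profiles.
count_data_profiles() {
  awk -F'[,:]' '$1 == "Data" { gsub(/^ +/, "", $2); print $2 }' | sort -u | wc -l | tr -d ' '
}

DATA_PROFILES="$(printf '%s\n' "$sample_df_output" | count_data_profiles)"
echo "distinct data profiles: $DATA_PROFILES"
```

On a live system the same helper could be fed real output, for example `btrfs filesystem df /x | count_data_profiles`; a count above one would match the "too many profiles" warning reported in this thread.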

November 5, 2017

Author

I posted that earlier:

It's reporting:

"Cache pool BTRFS too many profiles X" with X being the SSD I just re-added.

November 5, 2017

You shouldn't have tried to add it back, but try mounting the pool using one of the other SSDs and check whether you can access different data.

November 5, 2017

Author

I followed your instructions here

November 5, 2017

31 minutes ago, darrenyorston said:

I followed your instructions here

Yes, and where does it say to add the device back?

Like I said, try the recovery mount again, but now using one of the other SSDs. If it doesn't mount with -o recovery,ro, try the option below for when there's a missing device (-o degraded,recovery,ro).
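
The "try one option set, then fall back to the degraded one" advice above can be sketched as a small loop. To keep the example runnable without touching real disks, the mount call is replaced by a stub, `fake_mount`, which pretends that only the degraded mount succeeds. `MOUNT_CMD`, `fake_mount`, and `try_recovery_mount` are all hypothetical names, and `/dev/sdX1` is a placeholder.

```shell
#!/bin/sh
# Stub standing in for mount (assumption): succeeds only when "degraded" is among the options.
fake_mount() {
  case "$2" in
    *degraded*) return 0 ;;
    *) return 1 ;;
  esac
}

# Set MOUNT_CMD=mount to run against real devices; the default is the harmless stub.
MOUNT_CMD="${MOUNT_CMD:-fake_mount}"

try_recovery_mount() {
  dev="$1"
  mnt="$2"
  # Try the plain recovery mount first, then the variant for a missing device.
  for opts in "recovery,ro" "degraded,recovery,ro"; do
    if $MOUNT_CMD -o "$opts" "$dev" "$mnt"; then
      echo "mounted with -o $opts"
      return 0
    fi
  done
  echo "both attempts failed"
  return 1
}

RESULT="$(try_recovery_mount /dev/sdX1 /x)"
echo "$RESULT"
```

Both option sets keep `ro`, so neither attempt writes to the damaged pool.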

November 5, 2017

Author

9 hours ago, johnnie.black said:

Yes, and where does it say to add the device back?

Like I said, try the recovery mount again, but now using one of the other SSDs. If it doesn't mount with -o recovery,ro, try the option below for when there's a missing device (-o degraded,recovery,ro).

Here:

If it mounts, copy all the data from /x to another destination, like an array disk; you can use Midnight Commander or your favorite tool. After all data is copied, format the disk and restore the data.

The disk mounted and I was able to copy the data off the drive. The format of the disk never worked though. The GUI continued to report that the disk required formatting. When I told it to format, it would format for about 10 seconds or so and then show as unmountable again.

So I removed the disk from the cache and utilised Unassigned Devices to unmount it. I then utilised Preclear Disks to prepare the disk. I then re-added the drive to the cache and it is working fine, or so it seems.

I have had to delete the docker image and re-add all the containers. The docker containers are not working correctly at the moment though, as they don't seem to be able to see the internet. I also had to delete the VMs and start anew.

I will probably look to move my data back onto Freenas now and just use unRAID for docker containers. I'm concerned about unRAID's stability. Removing and replacing a SATA cable shouldn't result in such problems.

diagnostics.zip
