
Cache Disk - VM Pausing - Check File System Problem


darrenyorston


As I have posted elsewhere, my PFSense VM has been pausing for two days now. I resume it, but it pauses again shortly after. Someone suggested it might be a cache disk issue. I checked and unRAID has reserved 200GB of my cache. I have 2 x 120GB and 2 x 520GB SSDs as cache. They were less than 50% full, something like 38%.

 

I shut down my server and adjusted a SATA power cable on one of the SSDs and now Fix Common Problems is reporting:

 

 

cache (KINGSTON_SHFS37A120G_50026B7261075EF7) has file system errors (No file system (32)). If the disk is XFS / REISERFS, stop the array, restart the array in Maintenance mode, and run the file system checks. If the disk is BTRFS, then just run the file system checks. If the disk is listed as being unmountable, and it has data on it, whatever you do, do not hit the format button. Seek assistance HERE

 

My SSD is BTRFS. I looked to run the File System Check however the option was grayed out with a message advising it was only available when the array was running in Maintenance mode.

 

I restarted the array in Maintenance mode and selected File System Check, but it doesn't appear to be doing anything. The text box says "running" but it's been like that for over 24 hours now.

 

How long should it take to finish?

 

I have attached diagnostics again.

 

tower-diagnostics-20171104-1754.zip

Link to comment
13 hours ago, johnnie.black said:

btrfs fsck should only be run as a last resort, and it wouldn't take that long, so it has probably hung. See here to try and recover your data:

 

https://forums.lime-technology.com/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

Well I ran it because that's what Fix Common Problems said to do if the disk was unmountable.

 

Having looked at your link, could you provide some guidance? I don't use the command line, so what's the starting point?

Link to comment

I have found the directories using MC.

 

What disk number will the /x drive be for the purposes of an MC copy?

 

I tried "cp -r /mnt/x /mnt/disk4" however nothing happens. When I look at disk 4 no files have been copied.

 

Ok. I have discovered that the command line doesn't work. I just used the menus instead.

Link to comment

I have attached the diagnostics file again.

 

The drive won't format; the GUI keeps saying it's "Unmountable". A different cache disk, one I have not touched, is now showing as a "New Device".

 

This feels like a rat hole I have gone down. My initial problem was a VM pausing, potentially due to 200GB of reserved space. I shut down the server and took the opportunity to adjust the path of a SATA cable, and now none of my Docker containers or VMs work.

 

What is the best way to get to a functional system?

 

tower-diagnostics-20171105-1139.zip

Link to comment

I have been able to format the disk, I had to remove it from the cache first.

 

Now I am trying to restore the files back to it, but the cache disk (/x) is reporting "Read-only file system (30)".

 

unRAID Cache disk message is also reporting:

 

"Cache pool BTRFS too many profiles X" with X being the SSD I just re-added.

Link to comment

If you have already copied everything important from your cache to the array or another place, it's now best to completely format your cache pool to start over with a clean filesystem and then restore the data:

 

This will delete all data on the cache SSDs.

 

Stop the array and type:

 

blkdiscard /dev/sdX

 

Replace X with each SSD identifier, one at a time.

 

Then start the array and format the pool.
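
The wipe step above could be scripted roughly as follows. This is only a sketch: the function name wipe_cache_devices and the DRY_RUN switch are made up for illustration and are not part of unRAID, and /dev/sdX and /dev/sdY remain placeholders for your own SSD identifiers. It defaults to printing the commands so nothing is discarded by accident.

```shell
# Sketch only: run blkdiscard on each device passed as an argument.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands.
# Set DRY_RUN=0 to really discard; that is irreversible and erases the whole device.
wipe_cache_devices() {
    dry_run="${DRY_RUN:-1}"
    for dev in "$@"; do
        if [ "$dry_run" = "1" ]; then
            echo "would run: blkdiscard $dev"
        else
            blkdiscard "$dev"
        fi
    done
}

# Example (dry run, placeholder device names):
#   wipe_cache_devices /dev/sdX /dev/sdY
```

Check the printed device names against the GUI before switching the dry run off, since the sdX letters can change between boots.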

 

Link to comment
3 minutes ago, darrenyorston said:

Why would I? The instructions I followed advised to copy the materials of one disk, not the entire cache.

 

When you mount a device from a pool in recovery mode and the mount is successful, the whole pool is mounted. You need to copy everything you want from the pool so it can be reformatted. This process is to recover the data, not to fix the pool; the pool needs to be recreated.
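
As a rough sketch of that copy step from the command line: the helper name copy_pool is hypothetical, and the example paths assume the pool is mounted read-only at /x (as in the FAQ procedure) and that /mnt/disk4 is an array disk with enough free space. Note the source is /x, not /mnt/x.

```shell
# copy_pool SRC DEST (hypothetical helper)
# Copies everything under SRC, including hidden files, into DEST,
# preserving ownership, permissions and timestamps where possible.
copy_pool() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" && cp -a "$src"/. "$dest"/
}

# Example (assumed paths):
#   copy_pool /x /mnt/disk4/cache_recovery
#   du -sh /x /mnt/disk4/cache_recovery   # compare the totals afterwards
```

If the two totals differ a lot, stop before formatting anything and post the numbers.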

Link to comment

Where is the docker image? If it was on the cache it's probably corrupt and needs to be recreated. If you have your appdata, you can recreate all the Docker containers from the previous templates to retain all config options.

 

Same for the VMs: look where libvirt.img was stored, then restore it from backup or recreate it. If you recreate it, you'll need to recreate your VMs or restore the XMLs from backups. If you still have the old vdisks (or backups of them), you can re-use them and not lose any data.

 

For the array shares we need to see the diagnostics; it could be filesystem corruption on the array.

 

 

Link to comment
41 minutes ago, darrenyorston said:

Well, in that case something didn't work. I copied all the directories from the mount and it totalled only 60GB; it was 300GB or so prior to mounting the drive.

 

If the mount works, all pool data should be available, except in the case mentioned in the FAQ:

 

Quote

Note that if there are more devices missing than the profile permits for redundancy it may still mount but there will be some data missing, e.g., mounting a 4 device raid1 pool with 2 devices missing will result in missing data.

 

Edit: it's also possible you had multiple profiles on your cache and only one of them mounted, but you should get a warning about that if system notifications are enabled.
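
One way to check for multiple profiles by hand is to look at the output of btrfs filesystem df for the mounted pool. The small filter below is only a sketch (count_data_profiles is a made-up name): it reads that output on stdin and prints how many distinct profiles are listed for Data. A result above 1 would suggest a mixed-profile pool.

```shell
# count_data_profiles (hypothetical helper)
# Reads `btrfs filesystem df` output on stdin, e.g. lines such as
#   Data, RAID1: total=1.00GiB, used=512.00MiB
# and prints the number of distinct profiles that appear for Data.
count_data_profiles() {
    awk -F'[,:]' '
        $1 == "Data" { gsub(/ /, "", $2); seen[$2] = 1 }
        END { n = 0; for (p in seen) n++; print n }
    '
}

# Example on a live system (assumes the pool is mounted at /x):
#   btrfs filesystem df /x | count_data_profiles
```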

Link to comment
31 minutes ago, darrenyorston said:

I followed your instructions here 

 

Yes, and where does it say to add the device back?

 

Like I said, try the recovery mount again, but now using one of the other SSDs. If it doesn't mount with -o recovery,ro, try the option below for when there's a missing device (-o degraded,recovery,ro).
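
The order of attempts described here might look like the sketch below. The function name try_recovery_mount and the MOUNT_BIN override are assumptions added for illustration, /dev/sdX1 is a placeholder, and the mount point /x is assumed to exist already. On newer kernels the recovery option has, as far as I know, been superseded by usebackuproot, so adjust the option strings if mount rejects them.

```shell
# try_recovery_mount DEVICE MOUNTPOINT (hypothetical helper)
# Tries a read-only recovery mount first, then the degraded variant
# meant for a pool with a missing device. Stops at the first success.
# MOUNT_BIN (hypothetical) lets the mount command be swapped out for testing.
try_recovery_mount() {
    dev="$1"
    mnt="$2"
    mount_bin="${MOUNT_BIN:-mount}"
    for opts in "recovery,ro" "degraded,recovery,ro"; do
        if "$mount_bin" -o "$opts" "$dev" "$mnt"; then
            echo "mounted $dev on $mnt with -o $opts"
            return 0
        fi
    done
    echo "no recovery mount succeeded for $dev" >&2
    return 1
}

# Example (placeholder device, run as root):
#   try_recovery_mount /dev/sdX1 /x
```

Both attempts are read-only, so they should not make things worse on the pool itself.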

 

Link to comment
9 hours ago, johnnie.black said:

 

Yes, and where does it say to add the device back?

 

Like I said, try the recovery mount again, but now using one of the other SSDs. If it doesn't mount with -o recovery,ro, try the option below for when there's a missing device (-o degraded,recovery,ro).

 

Here:

 

If it mounts, copy all the data from /x to another destination, like an array disk. You can use Midnight Commander or your favorite tool. After all data is copied, format the disk and restore the data.

 

The disk mounted and I was able to copy the data off the drive. The format of the disk never worked, though. The GUI continued to report that the disk required formatting. When I told it to format, it would format for about 10 seconds and then show mountable again.

 

So I removed the disk from the cache and used Unassigned Devices to unmount it. I then used Preclear Disks to prepare the disk. I then re-added the drive to the cache and it is working fine, or so it seems.

 

I have had to delete the docker image and re-add all the containers. The docker containers are not working correctly at the moment, though, as they don't seem to be able to reach the internet. I also had to delete the VMs and start anew.

 

I will probably look to move my data back onto FreeNAS now and just use unRAID for docker containers. I'm concerned about unRAID's stability. Removing and replacing a SATA cable shouldn't result in such problems.

 

 

diagnostics.zip

Link to comment

Archived

This topic is now archived and is closed to further replies.
