System Locks Up on Boot Up



Hi All,

 

I was away for the weekend and came home to a locked-up GUI, VM and Dockers. I was able to PuTTY in, so I did and then initiated a reboot. This went well for about 30 seconds and then everything locked up, including the PuTTY session. I hard rebooted and was able to PuTTY back in and grab a syslog. There are errors in it but I don't know what they mean. I can't get the GUI to load either. I have rebooted in safe mode but still get errors and still can't get the GUI to load.

 

Any help?  I have attached the only syslog I could grab.

syslog.txt

Link to comment
11 hours ago, Squid said:

Your cache drive has corruption on it (side note: if you're never planning on running a cache pool, consider switching its format over to XFS, as it's more forgiving).

 

You need to Check Disk Filesystem on the cache drive

 

Thanks Squid for the advice.  I am having problems running the tools though as I am getting errors saying no valid btrfs found.  I will play around with it some more.  I went with btrfs because I do have a cache pool.  I have two drives mirrored to prevent this type of issue from happening.  Would both my drives be pooched?

Link to comment
5 minutes ago, comfox said:

 

Thanks Squid for the advice.  I am having problems running the tools though as I am getting errors saying no valid btrfs found.  I will play around with it some more.  I went with btrfs because I do have a cache pool.  I have two drives mirrored to prevent this type of issue from happening.  Would both my drives be pooched?

You should wait until @johnnie.black pipes in if you're having errors returned...

Link to comment
3 hours ago, johnnie.black said:

With btrfs it's better to back up the data and reformat when there are serious filesystem issues. If it's unmountable, see here:

 

 

 

So I have run the command btrfs restore -v /dev/sdX1 /mnt/disk2/restore and it restored a lot of items, but not the items that I really needed. Can I have this command "skip" certain folders and restore others?

Link to comment

OK, I am rerunning the restore using -i right now to ignore errors and it has moved past the last point.  Will see how far this one gets.

 

This really burns my goat that I have mirroring on and I still have to resort to a manual recovery. What is the point of having a cache pool mirror if it doesn't recover in a situation like this?

Link to comment

Mirror doesn't help with file system corruption, it helps if a disk fails.

 

Also, these errors on btrfs usually have an underlying hardware cause, most often timeouts due to a bad cable on one of the mirrors, or unclean shutdowns.

If there were no unclean shutdowns, you need to monitor the syslog after a week or so of uptime and look for any problems.
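
For anyone wanting a starting point for that kind of monitoring, here is a minimal sketch. It assumes the log lives at /var/log/syslog, and the function name and the list of patterns are illustrative guesses at what cable or filesystem trouble tends to look like, not a definitive or complete filter.

```shell
#!/bin/sh
# scan_log is a hypothetical helper name, not an unRAID tool.
# It prints log lines that hint at btrfs or disk-link trouble.
# The pattern list is an assumption and is not exhaustive.
scan_log() {
    grep -iE 'btrfs.*(error|warning|corrupt|csum)|ata[0-9.]+: .*(timeout|reset|failed)' \
        "${1:-/var/log/syslog}"
}

# Usage:
#   scan_log                   # scan the default syslog path
#   scan_log /path/to/syslog   # scan a saved copy
```

On a mounted pool, `btrfs device stats /mnt/cache` (assuming the pool is mounted at /mnt/cache) also reports per-device error counters, which can help tell which mirror is having trouble.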

Link to comment
38 minutes ago, johnnie.black said:

Mirror doesn't help with file system corruption, it helps if a disk fails.

 

Also, these errors on btrfs usually have an underlying hardware cause, most often timeouts due to a bad cable on one of the mirrors, or unclean shutdowns.

If there were no unclean shutdowns, you need to monitor the syslog after a week or so of uptime and look for any problems.

Thanks, I wasn't aware of that. I didn't realize it would mirror the corruption. I never do unclean shutdowns, so if I get this fixed I will replace the cables and see if that helps longer term.

 

28 minutes ago, limetech said:

comfox-

What is the history of this pool?  Meaning, do you recall which unRAID OS version you were running when you first created the pool?

Also, in your email to me you say this is the third "catastrophic" failure - please elaborate on that here.

 

This pool was created when I built the system. I bought the key for this system on 12/15/2015, so that is most likely when I built the pool. I used 6.x for the pool, whichever 6.x version was most current at the time; maybe you can check your release history to see which version was available then. I then upgraded through each new version as it was released.

 

The other catastrophic failures have been other file loss situations. The first (2/21/15) was a loss of 3TB of data when a drive became unrecognized by unRAID, and the next (1/4/2017) was again a drive becoming unmountable.

 

 

So the restore is still going. It is currently on my Windows VM file, which is a big file. It keeps updating the screen with "offset is XXXXX", where XXXXX is an ever-increasing number, so I hope it is still recovering it. This is the main file I need; everything else can be lost, but without this VM file I am screwed.

Edited by comfox
Link to comment
3 hours ago, johnnie.black said:

Did you try the other mirror?

 

I am currently copying the img from the one server to a usb to see if the files I need from the VM are recoverable from it. 

 

I also set up another unRAID server and attached the second mirror, and that server is currently working through the recovery command. Sadly unRAID still would not mount it, but the recovery is running, albeit with more errors than the other drive.

Link to comment

So there is a switch in the btrfs restore command that limits the restore to paths matching a regular expression. Would anyone know how to construct a regular expression for VM_HDD_Lib so that I could restore only my image folder?
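
For anyone landing here with the same question: the switch is --path-regex, and the btrfs-restore man page shows a pattern of the form ^/(|home(|/username(|/.*)))$, where every directory level carries an empty alternative so the walk can descend to the target. Below is a minimal sketch along those lines. It assumes VM_HDD_Lib sits at the top level of the cache filesystem, which the thread does not confirm, and path_regex is a hypothetical helper name, not part of btrfs-progs.

```shell
#!/bin/sh
# path_regex is a hypothetical helper, not part of btrfs-progs.
# It builds a --path-regex value that matches one top-level folder and
# everything beneath it. The empty alternatives "(|...)" let the root
# directory and the folder itself match while restore walks the tree.
# Assumption: the folder name contains no regex metacharacters.
path_regex() {
    printf '^/(|%s(|/.*))$' "$1"
}

# Usage (device and destination are the placeholders used in this thread):
#   btrfs restore -v -i --path-regex "$(path_regex VM_HDD_Lib)" /dev/sdX1 /mnt/disk2/restore
```

Adding -D to the restore command should do a dry run that only lists what would be restored, which is a cheap way to check the pattern before committing to a long copy.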

 

 

Link to comment

I thought I should be able to change the file system by stopping the array and clicking on the drive and then change the file system through the drop down, however there is no drop down present on the drive screen.  Am I doing something wrong?

Link to comment
Just now, johnnie.black said:

 

You are. For the pool change by clicking on cache1 only.

 

Thanks for the response.  I don't see another place to click.  I am changing from 2 drives to 1 so I only have one drive to click on.  

 

 

2017-05-16_10-39-20.jpg

Link to comment

Consider this topic closed.  Here is a rundown for anyone that stumbles across this in the future.

 

NOTE: btrfs is still in its infancy and corruption can occur. The recovery tools are not easy to work with and don't do a very good job of recovering. Use btrfs at your own risk and back up anything of value daily, if not hourly.

 

For some reason, over the weekend my cache pool (btrfs) decided to head to the crapper. I do not know why. There were no unexpected shutdowns and I didn't really look hard into the cause; I just assume that I am a heavy user and corruption happened. On this cache pool was all my appdata for 7 dockers as well as the .img vHDD for my Windows 10 VM. I did have the CA Backup / Restore Appdata plugin, but for some reason it wasn't working correctly, so my data was not being backed up.

 

When I got home I tried to boot up my VM, which had crashed, but it would not start. The VM would hang and then lock up the unRAID box. After I posted my log file it became apparent that my btrfs cache pool had become corrupt.

 

I followed @johnnie.black's advice and started running some of the commands from his post.

Of course the easy method didn't work, so I ended up running the btrfs restore command to copy as much data off the drives as I could. I got most of it back, however my .img vHDD was corrupt and wouldn't mount. Since I had most of the image (205GB out of 215GB), I started to investigate a way to get into the image and manually extract the data from within it. In my searching I came across an application called DM Disk Editor (www.dmde.com), which claimed to be able to open raw image files and recover the contents. I used the free version first and validated that it could indeed recover some of the data. I then bought the full version and proceeded to recover as much of the data from the vHDD as I could. I recovered probably 90% of what I needed, and it was a relatively easy process, though quite time consuming.

 

I have now put a new cache drive into unRAID, changed the file system to xfs and started restoring all the services, apps and VMs. I have copied all the recovered appdata over to my docker shares and restored the dockers through the docker tab. I am now starting the process of reinstalling Windows and reloading my VM with the recovered data.

 

This has taken way too long to recover my life, and I will be looking at better and easier backup solutions going forward.

 

Hope this helps someone in the future.
