Unclean shutdown, now unable to mount disks



I've been having trouble with a MakeMKV Docker container stalling and requiring a hard shutdown (holding the power button) to get it working again.  It's been intermittent, but it happened again two days ago.  Everything else on the server runs fine.

 

Last night I performed a hard shutdown and restart, but the server took much longer than usual to come back up (45 minutes before I could access the web GUI).  After I entered the array encryption key to start the array, it appeared to hang and would not mount the disks.

 

I repeated this routine last night, booting into safe mode and starting the array in maintenance mode.  However, when I tried a clean reboot back into "normal" mode, it hung all night and required another hard shutdown.

 

The attached log file is from the latest hard shutdown and was taken by typing "diagnostics" into the command line while the disks are attempting to mount.

tower-diagnostics-20200830-0918.zip

Also, maybe unrelated, but Firefox is no longer displaying CPU graphs or allowing me to type into the command line.  I am using Edge (which is fine).

 

The server has been rock solid for over a year and I have not moved it or changed hardware in 4 months when I added a drive (was a textbook addition).


I have been able to get it to start and shut down correctly but the disks will still not mount.

 

I booted it fresh into safe mode and started Maintenance Mode.  That worked, but I can't stop it now because it says, "Disabled -- BTRFS operation is running."

 

The log for this is attached.  I'm starting to get a little worried about my data!

 

tower-diagnostics-20200830-1840.zip

 

 

Edited by ur6969

Requested diagnostics attached. 

 

This is consistent with multiple tries to restart the array.  It gets to mounting and then hangs whether it's in safe mode or not.

 

A couple other observations:

 

1.  I may have never noticed this, but "Unraid OS Basic" is greyed out at the top right.  "Registration" is not.

 

2.  The server is painfully slow to do anything.  A shutdown (using a PuTTY command) takes 5 minutes.  Startup takes nearly 15 minutes before the web GUI is available to use.

 

3.  When using the web GUI, 1-2 cores are maxed out on the dashboard screen (quad-core J4105), but when I open htop from the terminal the CPU is idling, with htop itself being the biggest user of resources.

 

Thank you

 

 

 

 

tower-diagnostics-20200831-0847.zip

1 hour ago, johnnie.black said:

And this might be the main issue, there are no errors on the syslog, disks can possibly mount but take a long time, server load is very high for what it's doing, but I see no reason for this high load.

Ok I appreciate the help.  I left it trying to mount the disks this morning and will report back tonight.

 

Obvious question, but is there any reason why the server would suddenly begin to be so sluggish?  Anything to check or modify?


tower-diagnostics-20200901-0713.zip

 

New logs.  I left it for nearly 24 hours and it's still stuck at mounting disks.

 

What's kind of odd is that when I looked at the web GUI this morning, the status report pop-up showed a PASS report for a 5-disk array.  It knows they are there.

 

Panic is starting to set in about data recovery.  What, if any, are my options at this point?  Unbelievable it would just crash like this.

 

ETA: This thread seems very similar to my issue.  The cache file system was corrupt.  Is this a possibility?  Other than a Nextcloud instance in Docker I can't think of anything else I would lose of importance on the cache disk.  I also have a larger SSD that I could install as a fresh cache and keep my old one to see if that solved the problem. 

 

Edited by ur6969

Still no clue on the syslog on what the problem is, I would suggest two things:

 

#1 - back up the current flash drive, recreate it, and restore only your key file and super.dat (disk assignments). If the array still doesn't start like that, it's most likely a hardware problem, see #2.

#2 - try booting the array on another board. With so few disks this isn't complicated, as long as you have another board/PC available.
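
For anyone following #1 who prefers to script the copy step, here is a minimal sketch in Python. It assumes the old flash backup has been unpacked somewhere and that the new flash drive has already been created; the function name and any paths you pass in are hypothetical, and it only copies `super.dat` plus any `*.key` file it finds in the backup's config folder, nothing else.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def restore_minimal_config(backup_config: Path, new_config: Path) -> list[str]:
    """Copy only super.dat and *.key files from a flash backup's config
    folder into the config folder of a freshly created flash drive.

    Both paths are supplied by the caller (hypothetical locations).
    Returns the names of the files that were actually copied.
    """
    new_config.mkdir(parents=True, exist_ok=True)
    # Disk assignments first, then whatever licence key file(s) exist.
    wanted = [backup_config / "super.dat", *sorted(backup_config.glob("*.key"))]
    copied = []
    for src in wanted:
        if src.is_file():
            shutil.copy2(src, new_config / src.name)
            copied.append(src.name)
    return copied


# Example call (paths are placeholders, adjust to your own setup):
# restore_minimal_config(Path("flash-backup/config"), Path("/path/to/new/flash/config"))
```

Everything else in the old config folder (plugins, Docker templates, share settings) is deliberately left behind, which is the point of the test: a clean flash with only the licence and disk assignments.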

2 minutes ago, johnnie.black said:

Still no clue on the syslog on what the problem is, I would suggest two things:

 

#1 - back up the current flash drive, recreate it, and restore only your key file and super.dat (disk assignments). If the array still doesn't start like that, it's most likely a hardware problem, see #2.

#2 - try booting the array on another board. With so few disks this isn't complicated, as long as you have another board/PC available.

I have hardware to try both of these.  Time is the only issue, but I will get to #1 tonight.

 

Also please check the edit to my above post with the cache drive corruption.  Would swapping in a new SSD as cache be a #1a in this situation?  I'm not sure of the specifics of what would be required to work or the risk.

 

Do you think the array (3 + 1 parity) can be saved?  That would at least put me at ease.  I've got the important stuff Rcloned to the cloud but I am not looking forward to a rebuild.

 

Thank you for all the help!

5 minutes ago, ur6969 said:

Also please check the edit to my above post with the cache drive corruption. 

I missed the edit, but no, it doesn't look like a similar issue. In that case the log shows the data disks mounting and then the server crashing when mounting the cache; in your case it doesn't even start to mount disk1.

 

 

I see no reason to think there's a problem with the data, at least so far.

 

 

On 9/1/2020 at 7:59 AM, johnnie.black said:

Still no clue on the syslog on what the problem is, I would suggest two things:

 

#1 - back up the current flash drive, recreate it, and restore only your key file and super.dat (disk assignments). If the array still doesn't start like that, it's most likely a hardware problem, see #2.

#2 - try booting the array on another board. With so few disks this isn't complicated, as long as you have another board/PC available.

I don't want to screw this up.  For #1, do you mean create a NEW usb flash drive using the usb creator for a new install and then copy over my key file and super.dat?

 

Or use my Flash Backup zip file created by Unraid to create a new flash drive?


Ok I've got a newly created flash drive with 6.8.3 and I copied over my super.dat and Basic.key files into the config folder from my backup.

 

About to shut down, replace key, and turn back on.

 

Do I then type my encryption passphrase and hope it mounts?  I have a screenshot of disk assignments.

 

Any way to screw this up and lose it all lol?


Used the same flash drive, with a new 6.8.3 install from the creator, and copied over super.dat and Basic.key.

 

It turns on, fans running, but no led for network connectivity or blue led on power button.  Let it run for ~20 minutes with no change. 

 

Hard shutdown and tried again, same result.  Can't reach via Firefox or Putty.  My router has it set on a static IP (which I assume would've stayed the same) and it isn't coming up as online.

 

Bad motherboard?  Try a new flash drive?  Take it apart and reseat all connections?


Ok got a monitor on it and turned it on.  It is booting to the motherboard UEFI.  I can't make it boot into Unraid from the newly formatted (but original) flash drive. 

 

It sees the flash drive but for some reason won't go past the bios.

 

ETA:  Shut it down and tried again.  Went to boot select this time.  I can pick the flash drive and hit enter but it just goes right back to the selection.  Will not take it and run.

 

Have run down the usual suspects.  Same USB flash drive as before.  Same USB2 slot.  Made with the creator tool.

 

ETA:  Ok took a bit to come back to me but I remembered I had to delete the "-" off of the "EFI-" folder.  Booting now.  Will post results in a new post.
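
In case it helps someone else who hits the same boot loop, the rename can be sketched in Python as below. This assumes the flash drive is mounted on another computer and that the folder at its top level is literally named `EFI-`; the function name and the mount path you pass in are hypothetical.

```python
from pathlib import Path


def enable_uefi_boot(flash_root: Path) -> bool:
    """Rename 'EFI-' to 'EFI' at the top level of the flash drive.

    flash_root is wherever the flash drive is mounted (caller-supplied).
    Returns True if a rename happened, False if there was nothing to do.
    """
    disabled = flash_root / "EFI-"
    enabled = flash_root / "EFI"
    if disabled.is_dir() and not enabled.exists():
        disabled.rename(enabled)
        return True
    return False


# Example call (mount point is a placeholder):
# enable_uefi_boot(Path("/path/to/mounted/flash"))
```

Doing the same rename by hand in a file manager works just as well; the sketch only makes the before and after states explicit.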

Edited by ur6969

IT WORKS

 

It looks like all the data is intact.  Two drives (1 & 3) are throwing UDMA CRC errors, but I'm not sure whether that's a big deal.

 

So how greedy is it to ask if there is a way to get my plugins and settings back?  And specifically the Nextcloud docker?  It's not worth my data to risk it but if there's an easy way I'm interested.

 

Thank you for the help!

4 hours ago, ur6969 said:

Two drives are throwing UDMA CRC errors (1 & 3) so I'm not sure that's a big deal.

No, as long as the count doesn't increase. If it does, replace the SATA cable.
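
One way to keep an eye on whether the count increases is to note the raw value of SMART attribute 199 now and compare it later. The sketch below is only an illustration in Python: it parses text in the attribute-table layout that `smartctl -A` prints, the sample table is made up for the example, and the function names are hypothetical.

```python
from typing import Optional

# Made-up sample in the layout of `smartctl -A /dev/sdX` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def udma_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # The attribute table has ten columns; RAW_VALUE is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_increased(previous: int, current: int) -> bool:
    """True if the error count has grown since the earlier reading."""
    return current > previous


baseline = udma_crc_count(SAMPLE)
```

If a later reading is higher than the baseline, that is the cue to swap the SATA cable; a count that stays flat is only a record of past errors.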

 

4 hours ago, ur6969 said:

So how greedy is it to ask if there is a way to get my plugins and settings back?

You can start restoring the other config files, i.e. everything that was inside the config folder. Do it a few files at a time to see if you find what was causing the issue. More info on which file does what here.
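
The "few files at a time" approach could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names, batch size, and paths are hypothetical, and it only copies files that are missing from the live config folder, so files that already exist there (including defaults written by the fresh install) are never overwritten and would need to be handled by hand.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def pending_files(backup_config: Path, live_config: Path) -> list[Path]:
    """Relative paths of files in the backup that the live config lacks."""
    missing = []
    for src in backup_config.rglob("*"):
        rel = src.relative_to(backup_config)
        if src.is_file() and not (live_config / rel).exists():
            missing.append(rel)
    return sorted(missing)


def restore_next_batch(backup_config: Path, live_config: Path,
                       batch_size: int = 3) -> list[str]:
    """Copy the next few missing files and return their relative paths.

    Reboot and test the server between calls; if the problem comes back,
    the culprit is among the files returned by the most recent call.
    """
    batch = pending_files(backup_config, live_config)[:batch_size]
    for rel in batch:
        dest = live_config / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup_config / rel, dest)
    return [rel.as_posix() for rel in batch]
```

Keeping a note of what each call returned gives a simple log to work back through if the slow boot or mount hang reappears.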

