manolodf

Please Help, Dockers Stopped Abruptly, Cache now says "Unmountable No File System". AppData Share Gone :(

53 posts in this topic Last Reply


Posted (edited)

I recently received help here for an issue I was having with my cache drive. It was formatted btrfs and, due to a kernel issue, it was giving 403 errors when trying to run dockers.

 

I properly moved the data to the array, formatted the cache as XFS to match all my other disks, restored appdata back to the cache drive, and then restored all my dockers back to normal. Everything was fine for 2 days until some dockers (Plex, Radarr) abruptly stopped today. Some even say "started" even though they don't work.

 

Another thing I noticed is that auto start is off and they are drawing zero CPU, even though they were all set to auto start.

7C4E5C3E-8B7B-4CAB-894C-3C1EE45843D0.jpeg

652BF854-43AF-4BB0-A4C2-BB7164DFC6EB.jpeg

 

So I got home, and after a reboot the cache drive is saying Unmountable and my appdata share is totally gone. I don't know what to do!! Now I have all my dockers and data missing! The cache drive still says formatted XFS, so I don't know what could have gone wrong, please help!

 

That is why I found those logs above strange: they mention a BTRFS error when I had no BTRFS drives on my system.

tower-diagnostics-20190614-0029.zip

Edited by manolodf
Attached diagnostics, cache drive unmountable now

Posted (edited)

1104627833_ScreenShot2019-06-13at8_11_35PM.png.9a8cd6f789df2d6534141723e68d2572.png

 

So apparently it's giving me the Log 100% again...

 

All shares seem to be missing as well, and the cache drive is in Sleep/Standby, which I had never seen before.

 

1285300368_ScreenShot2019-06-13at8_15_05PM.thumb.png.b6fe0433cf2decdedf7109d840843194.png

363795907_ScreenShot2019-06-13at8_16_58PM.thumb.png.d58ac8ee9d69a29ce3b8f064da6242b5.png

 

Edited by manolodf
Added new findings


649106116_ScreenShot2019-06-13at8_30_42PM.thumb.png.dbff344622d119ecaad4b1495d4d40f5.png

After the reboot it now says Unmountable, No File System... My appdata share is missing also!!


I ran a Check FileSystem Status and got this: 

 

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...

......(too Many)....   Sorry, could not find valid secondary superblock

 

Exiting now


The cache filesystem is corrupt, and so is disk5's. If you're getting corruption out of the blue, you might want to run memtest to check your RAM.


Is there any way to recover my appdata? After formatting the drive as XFS and moving appdata over, it worked just fine for a while until it just froze. I did not do a preclear when formatting as XFS; not sure if that's an issue?

 

Disk 5 Filesystem is corrupt?  That one has not said unable to mount.  The integrity checker finished the build on that just yesterday as well.  

5 minutes ago, manolodf said:

Disk 5 Filesystem is corrupt?

Yes:

 

Jun 10 13:09:32 Tower kernel: XFS (md5): Metadata corruption detected at xfs_dir3_data_reada_verify+0x52/0x63 [xfs], xfs_dir3_data_reada block 0x116318260
Jun 10 13:09:32 Tower kernel: XFS (md5): Unmount and run xfs_repair
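To list every device the kernel has flagged this way, something like the following could be run against the syslog. It is only a sketch: it assumes the message format shown in the two log lines above, and `xfs_corrupt_devices` is a made-up helper name, not an Unraid or xfsprogs tool.

```shell
# Print each device that the kernel reported XFS metadata corruption on.
# Reads syslog text on stdin and prints one device name per line.
xfs_corrupt_devices() {
  sed -n 's/.*XFS (\([^)]*\)): Metadata corruption detected.*/\1/p' | sort -u
}

# Example use (the log path is an assumption):
#   xfs_corrupt_devices < /var/log/syslog
```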

 

6 minutes ago, manolodf said:

Is there any way to recover my appdata?

Not unless xfs_repair can fix the filesystem, but looking at the output you're likely running an incorrect command that doesn't specify the partition. It should be:

 

xfs_repair -v /dev/nvme0n1p1
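The point of the command above is that xfs_repair has to be given the partition that holds the filesystem, not the whole disk. As a rough illustration of the Linux naming rule (disks whose name ends in a digit get a `p` before the partition number), here is a small sketch. `first_partition` is a hypothetical helper, and it assumes the filesystem lives on partition 1.

```shell
# Derive the first-partition path from a whole-disk path.
# Disk names ending in a digit (nvme0n1) use a "p" separator; others (sdb) do not.
first_partition() {
  case $1 in
    *[0-9]) echo "${1}p1" ;;
    *)      echo "${1}1" ;;
  esac
}

# Example use, read-only check first:
#   xfs_repair -n "$(first_partition /dev/nvme0n1)"
```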

 


MemTest: Is there a way I can run memtest within Unraid, or without a physical keyboard? I don't have a physical keyboard here at the moment to be able to plug Unraid into a monitor and check that.

 

Cache Drive: The FileSystem Check I did was the one from the Unraid GUI with option -n, anything else to try there, or should I go Command Line for it?

386580944_ScreenShot2019-06-14at10_42_10AM.thumb.png.ada1efc51559fd95871846b7ce990a9d.png

 

Disk 5:

Ran FileSystem Check since it was already mounted as Maintenance Mode:

421357666_ScreenShot2019-06-14at10_46_37AM.thumb.png.1d6a52d49fd9caaa7bb767cc9463318d.png

1 minute ago, manolodf said:

anything else to try there, or should I go Command Line for it?

It's not common for the superblock to be damaged. You can try again after updating xfsprogs, but the result will likely be the same.

 

2 minutes ago, manolodf said:

Disk 5:

Run without -n or nothing will be fixed.

2 minutes ago, johnnie.black said:

It's not common for the superblock to be damaged. You can try again after updating xfsprogs, but the result will likely be the same.

Ok, working on that now. What is strange is that this happened after my cache drive was btrfs, and I believe it had the issue (it had 180 GB free space) where you had to do the btrfs balance, but I could not do it at all. That is why I moved appdata to the array, formatted to XFS, then moved the data back to the cache drive. I am not sure if the Filesystem change could have caused this? Any other tests you want me to run on it?

 

5 minutes ago, johnnie.black said:

Run without -n or nothing will be fixed.

Here are the results on Disk5 without -n: (I had done this 2 days ago also when I had the docker issues with the btrfs cache drive, and after it fixed them I had not seen any errors)

602140963_ScreenShot2019-06-14at10_56_09AM.thumb.png.88a491ab0acfcb32e64b8e85e9e75946.png

15 minutes ago, manolodf said:

I am not sure if the Filesystem change could have caused this?

No, but 2 different filesystems going corrupt within a short space of time makes me think you have a hardware issue, like bad RAM.

8 minutes ago, johnnie.black said:

No, but 2 different filesystems going corrupt within a short space of time makes me think you have a hardware issue, like bad RAM.

Is there a way I can run memtest without a USB keyboard?  Via GUI, or some other way?

 

So Disk5 was corrupt in the results above?  I guess I just did not see any errors. 

 

Cache Drive: Same results after the xfsprogs update when running the Filesystem Check, so there is nothing to do here? Should I preclear the drive? Just hit format and try to restore the backup?

1043671273_ScreenShot2019-06-14at11_22_36AM.thumb.png.a0274b74feed81d95813da6f1aa10951.png

1 minute ago, manolodf said:

Is there a way I can run memtest without a USB keyboard?  Via GUI, or some other way?

Click on the flash drive and select memtest as the default boot, but note that afterwards you'll need to manually edit syslinux.cfg to change the default boot back to Unraid.
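For the manual syslinux.cfg edit, the change amounts to moving the `menu default` line back under the Unraid boot entry. A minimal sketch of that edit is below. It assumes lowercase `label` and `menu default` keywords and a label literally named `Unraid OS`; check the actual file on the flash drive before relying on it. `set_default_label` is a hypothetical helper.

```shell
# Move the "menu default" line under the label named in $1.
# Reads a syslinux.cfg on stdin and writes the edited config to stdout.
set_default_label() {
  awk -v want="$1" '
    /^[ \t]*menu default[ \t]*$/ { next }        # drop the old default marker
    { print }
    /^label / {
      name = $0
      sub(/^label[ \t]+/, "", name)
      if (name == want) print "  menu default"   # re-add it under the wanted label
    }
  '
}

# Example use (paths are assumptions; keep a backup of the original):
#   set_default_label 'Unraid OS' < syslinux.cfg > syslinux.cfg.new
```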

 

2 minutes ago, manolodf said:

So Disk5 was corrupt in the results above?  I guess I just did not see any errors. 

Unless you check the exit code it's not always easy to tell from the output alone, but there's no point in running with -n. This way, if there were errors, they were fixed.
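On the exit code: according to the xfs_repair man page, a `-n` (no modify) run returns 0 when no corruption was detected and 1 when corruption was detected. A small sketch of checking it from the command line follows. `check_verdict` is a made-up helper name, and the device path in the usage comment is only an example.

```shell
# Translate the exit status of "xfs_repair -n" into a short verdict.
check_verdict() {
  case $1 in
    0) echo "no corruption detected" ;;
    1) echo "corruption detected" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Example use (array in maintenance mode, device path is an assumption):
#   xfs_repair -n /dev/md5; check_verdict $?
```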

 

4 minutes ago, manolodf said:

so there is nothing to do here?

Nothing I can think of. The only option would be to ask for help on the xfs mailing list; other than that, format and restore.


Cache:

Should I preclear the drive to double make sure it's OK? I just find it so weird that 2-3 days ago I formatted XFS and I thought I was on my way to trouble free-ness and then it just exploded on me.

 

Memtest:

Do you know if this will work as a keyboard on my Unraid server? It's the USB dongle and RF keyboard I have for some Android box.

IMG_0819.thumb.jpg.a7c08e5f415f70f28407bda7110f0ecb.jpg

 

Otherwise, it's run memtest, then remove the flash drive, plug it into my Mac, modify syslinux.cfg, then plug it back into the server and boot up again, correct? Will I need to interact with it at all with a keyboard once memtest loads up? That is my dilemma.

 

Also, not sure if this could be it, but my flash/boot drive is the oldest living piece of hardware in my system. Could that be the cause of any of these issues? I've always been amazed at how it has not failed, but I dare not mess with it.

 

16 minutes ago, manolodf said:

Should I preclear the drive to double make sure it's OK?

No need to preclear SSDs.

 

17 minutes ago, manolodf said:

 I just find it so weird that 2-3 days ago I formatted XFS and I thought I was on my way to trouble free-ness and then it just exploded on me.

Filesystem corruption is very seldom caused by the device; bad RAM, for example, is much more likely.

 

As for the keyboard, no idea. You can try it before rebooting. And no, the flash drive can't cause filesystem corruption on data/cache devices.


Heck yea, it worked.  Any specific Mem test I should be doing?  It just started conducting it as soon as I selected the Memtest, there was a prompt for failsafe or something like that but I guess I was too slow then it just started running. 

 

IMG_0820.thumb.jpg.a73c7c889d863916e091a38e5b4c53a3.jpg

20 minutes ago, johnnie.black said:

No need to preclear SSDs.

 

Filesystem corruption is very seldom caused by the device; bad RAM, for example, is much more likely.

 

Any other test I should perform on the SSD?  I mean everything seems ok, but after it had that BTRFS issue, it seemed happy then croaked on me.   

 

Disk5 would not have caused a cache drive filesystem issue, would it?

 

So far 50% no RAM errors, will post results when it finishes.  If I need to do a specific test please let me know and I will run it after it is done, or not sure if it logs a file to post it once Unraid boots back up?  

 

On that log that said Tower Kernel: BTRFS error (device loop2)...  That threw me off because I had no BTRFS drives at that point anymore.  My entire system was XFS, all drives in array and Cache drive were all XFS, so I was not sure why in the log it would show BTRFS. 

Posted (edited)
4 minutes ago, manolodf said:

On that log that said Tower Kernel: BTRFS error (device loop2)...  That threw me off because I had no BTRFS drives at that point anymore.  My entire system was XFS, all drives in array and Cache drive were all XFS, so I was not sure why in the log it would show BTRFS.

The docker.img and libvirt.img files are disk images that are internally formatted as BTRFS.   This happens regardless of the type of file system being used to store these files.  It is these disk image files that you see being mounted as loop devices and thus references to BTRFS with regard to them.
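To see this on a running system, the mount table can be filtered for loop devices that carry a BTRFS filesystem. This is only a sketch: `loop_btrfs_mounts` is a hypothetical helper, and it assumes the usual `/proc/mounts` column layout (device, mount point, filesystem type).

```shell
# Print "device mountpoint" for every loop device mounted as btrfs.
# Reads mount-table text (same layout as /proc/mounts) on stdin.
loop_btrfs_mounts() {
  awk '$1 ~ /^\/dev\/loop[0-9]+$/ && $3 == "btrfs" { print $1, $2 }'
}

# Example use:
#   loop_btrfs_mounts < /proc/mounts
```

`losetup -l` can then show which image file backs each loop device listed.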

Edited by itimpi


So far, after the first pass, everything seems good on the RAM side of things. Could it be anything else?

 

IMG_0824.thumb.JPG.46e6829159a386b9173eb9fd74704b7d.JPG

 


memtest should run for at least 24 hours, but yes, there could be other reasons, though RAM would be the most common.


Hit the 24 Hr mark with 0 errors so far.  Should I let it keep running?  Or is there something else to test, or attempt?  

 

 

IMG_0833.thumb.jpg.6dcaf8b7c7d9b3a21af9b86224d4b0db.jpg


I am at almost 36 hrs with no RAM errors. Should I let it keep going to 48 or more, or is it good now to look elsewhere, or try to see if my appdata can be restored elsewhere since the Filesystem check did not work on this?

IMG_0848.thumb.JPG.590697813bed671562ccbd32c52240eb.JPG


RAM should be fine. I'd run badblocks on the drives. The write test is best, but it destroys the data on the drive... probably best to do that though and restore your data from backups.

 

Have you collected smart data on the drives?

2 minutes ago, Abzstrak said:

RAM should be fine. I'd run badblocks on the drives. The write test is best, but it destroys the data on the drive... probably best to do that though and restore your data from backups.

 

Have you collected smart data on the drives?

I have not collected smart data.  Should I do that for all the drives or just the Cache?    

 

I'm sorry, I'm not familiar with badblocks, but I would be glad to run it if I can get pointed in the right direction.

 

Run it on Cache drive to write over data then restore the appdata from a backup?

 

Posted (edited)
11 minutes ago, manolodf said:

I have not collected smart data.  Should I do that for all the drives or just the Cache?    

 

I'm sorry, I'm not familiar with badblocks, but I would be glad to run it if I can get pointed in the right direction.

 

Run it on Cache drive to write over data then restore the appdata from a backup?

 

It wouldn't hurt to collect SMART data on all drives to make sure they all look OK. Post here if you are unsure.
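As a rough way to skim SMART output for the attributes that most often point to a failing disk, something like this could help. It is a sketch only: it assumes the ATA attribute table printed by `smartctl -A` (raw value in the tenth column), so it will not work on the different report that NVMe devices produce, and `smart_warnings` is a made-up helper name.

```shell
# Print "ATTRIBUTE raw_value" for a few key SMART attributes whose raw value is non-zero.
# Reads the output of "smartctl -A <device>" on stdin.
smart_warnings() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Example use (device path is a placeholder):
#   smartctl -A /dev/sdX | smart_warnings
```

No output means none of those four counters is above zero; it does not by itself prove the drive is healthy.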

 

Do you just have a single cache drive, or is it a pool (RAID1)? When was the last time you trimmed it?

 

badblocks is kind of like memtest for a drive. It can't really fix anything, but it can help determine whether a drive is bad. I did a quick Google search, and the Arch wiki has a lot of pretty great info on running badblocks. Obviously you'd not want your array spun up. Be aware that the write test is destructive, but much more thorough. If you use it, you'll have to reconstruct the data from parity. It also takes a while to run, like memtest. Whenever I run badblocks, I usually use the following (the w flag is write, which destroys data):
badblocks -svvvwt random /dev/sdX
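Because the write test wipes the drive, it may be worth putting a small guard in front of it. The sketch below only prints the command above, and only if the device does not show up in the mount table. `badblocks_write_cmd` and the `MOUNTS_FILE` override are hypothetical names. The check is basic: an Unraid array disk is mounted through its `/dev/mdX` device, so it would not be caught here, and the array should be stopped regardless.

```shell
# Print the destructive badblocks command for a device, but refuse if the
# device (or one of its partitions) is listed in the mount table.
# MOUNTS_FILE defaults to /proc/mounts and exists only to make this testable.
badblocks_write_cmd() {
  dev=$1
  mounts=${MOUNTS_FILE:-/proc/mounts}
  case $dev in
    /dev/*) ;;
    *) echo "not a /dev path: $dev" >&2; return 2 ;;
  esac
  # Any mounted source that starts with the device path counts as a hit.
  if awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"; then
    echo "refusing: $dev (or a partition of it) is mounted" >&2
    return 1
  fi
  echo "badblocks -svvvwt random $dev"
}
```

The function never runs badblocks itself; the printed command still has to be reviewed and run by hand.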

Edited by Abzstrak
