unclean shutdown, cache drives unmountable

December 17, 2024

I apologize in advance if this has been covered ad nauseam here already but I'm not even sure what to search for, much less how to fix it. Please point me in the right direction if this has already been addressed.

I had an issue that left me with no alternative but to kill the power to my unRAID server. The issue wasn't with the server itself, just an ongoing network problem that I'm still trying to figure out. This isn't the first or second or third time this has happened, but it is the first where the server had a critical issue. When I powered the system back on, the first thing I did was check my Docker containers, but there weren't any there. I verified that the docker.img file is present and started searching, but a few threads mentioned deleting this file and I didn't want to touch anything, especially since the parity check was still running.

The parity check completed, and the Main page said all the disks in my cache pool were unmountable. I searched for this and found THIS guide on Data Recovery, which led to the page on Storage Management. I followed those directions up until the Repairing the File System steps because I didn't want to go any further without another set of eyes taking a look. Here's what I got when I checked the file system in Maintenance mode:

[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
free space info recorded 1338 extents, counted 1351
wanted bytes 319488, found 258048 for off 50643865600
cache appears valid but isn't 50563383296
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/sdc1
UUID: 9c7b3ee3-4edf-4543-a216-62f203b71a1b
found 143473057792 bytes used, error(s) found
total csum bytes: 26882928
total tree bytes: 288391168
total fs tree bytes: 159842304
total extent tree bytes: 87572480
btree space waste bytes: 52782186
file data blocks allocated: 214446366720
 referenced 135832350720
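
A small sketch for reading a report like the one above, assuming the output was piped or saved as plain text. The function name `summarize_btrfs_check` is made up for illustration; it only looks for the `error(s) found` summary line and for the free-space complaints that appear in this particular report, so it is a convenience filter and not a diagnosis.

```shell
#!/bin/bash
# Summarize a `btrfs check` report read from stdin.
# Prints "errors" if the report contains "error(s) found", otherwise "clean",
# followed by any free-space related complaint lines.
# (summarize_btrfs_check is a hypothetical helper, not part of btrfs-progs.)
summarize_btrfs_check() {
    local report
    report=$(cat)
    if printf '%s\n' "$report" | grep -q 'error(s) found'; then
        echo "errors"
    else
        echo "clean"
    fi
    # grep exits non-zero when nothing matches; that is not a failure here.
    printf '%s\n' "$report" \
        | grep -E 'free space info recorded|wanted bytes|cache appears valid' \
        || true
}
```

With the array in Maintenance mode it could be used as `btrfs check --readonly /dev/sdc1 2>&1 | summarize_btrfs_check`, where `/dev/sdc1` is the device named in the report above.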

Several of these containers took literal weeks to configure (not the templates themselves, the actual apps) so I'm trying to avoid having to start over from scratch if possible. If not possible, well, that sucks but OK, but I'd also like to know how/if I can fully back up the containers to save all the app data.

What's the outlook and how should I proceed?

December 18, 2024

Community Expert

Please post the diagnostics after array start.

December 18, 2024

Author

Diagnostics are attached.

unraid-diagnostics-20241218-1235.zip

December 18, 2024

Community Expert

Type in the CLI

btrfs rescue zero-log /dev/nvme0n1p1

Then re-start the array and post new diags.

December 19, 2024

Author

Holy......that fixed it. The cache pool is back up and running. Can you tell me what the problem is and how that command fixed it?

Diags are posted per your request.

Edit: may have spoken a little too soon. Docker is running and all the containers that were set to auto-start came back up, but trying to start some manually threw back an execution error. I tried restarting some of the ones that were running and they threw the same error and won't start back up.

unraid-diagnostics-20241219-0955.zip

Edited December 19, 2024 by Coogan2007
Added a little more info

December 19, 2024

Community Expert

33 minutes ago, Coogan2007 said:

Can you tell me what the problem is and how that command fixed it?

It wasn't mounting because the log tree was damaged; that command zeroes it. There may be other issues, but I'm not seeing anything else logged for now. Try recreating the docker image to see if it resolves the docker problems:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file
Then:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications
Also see below if you have any custom docker networks:
https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks
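
For context on the zero-log step discussed above: `btrfs rescue zero-log` discards the filesystem's log tree, so writes that were recorded only there (typically the last few seconds of synced data before the crash) can be lost in exchange for a mountable filesystem. The sketch below wraps the command in a guard that refuses to run if the device appears in the mount table and only prints the command unless `RUN=1` is set. The function name, the `RUN` switch, and the `MOUNTS_FILE` override are assumptions made for this example; the guard is also simplistic, since a multi-device pool lists only one member in `/proc/mounts`.

```shell
#!/bin/bash
# Print (or, with RUN=1, execute) `btrfs rescue zero-log` for a device,
# refusing if the device is listed as a mount source.
# zero_log_guarded, RUN and MOUNTS_FILE are hypothetical names for this sketch.
zero_log_guarded() {
    local dev=$1
    local mounts=${MOUNTS_FILE:-/proc/mounts}
    if [ -z "$dev" ]; then
        echo "usage: zero_log_guarded <device>" >&2
        return 2
    fi
    # The first field of each mount-table line is the source device.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    if [ "${RUN:-0}" = "1" ]; then
        btrfs rescue zero-log "$dev"
    else
        echo "would run: btrfs rescue zero-log $dev"
    fi
}
```

Calling `zero_log_guarded /dev/nvme0n1p1` with the array stopped prints the command for review; `RUN=1 zero_log_guarded /dev/nvme0n1p1` would actually run it.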

December 20, 2024

Author

Welp, the docker.img file is set to read-only and I can't delete it. I tried from the GUI and console and it said the entire filesystem is in read-only mode, and I haven't a clue how to fix it outside of re-formatting the cache drive. Any other suggestions?
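
Btrfs usually remounts itself read-only when it hits an error it cannot safely continue past, so a read-only pool is often a symptom and not a permissions setting. A quick way to confirm what the kernel thinks is to read the options column of the mount table. This is a minimal sketch; the function name `mount_mode` and the `/mnt/cache` mount point in the usage note are assumptions (the pool may be named differently).

```shell
#!/bin/bash
# Report whether a mount point is mounted "ro" or "rw" by reading the
# options column (4th field) of a mount table in /proc/mounts format.
# mount_mode is a hypothetical helper; the second argument exists so the
# parsing can be tried against a saved copy of the table.
mount_mode() {
    local target=$1
    local mounts=${2:-/proc/mounts}
    awk -v t="$target" '
        $2 == t {
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        }
        END {
            if (mode == "") { print "not mounted"; exit 1 }
            print mode
        }
    ' "$mounts"
}
```

`mount_mode /mnt/cache` would print `ro` in the situation described here; `dmesg | grep -i btrfs` is then the usual place to look for the error that triggered it.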

December 20, 2024

Community Expert

Post new diags.

December 20, 2024

Author

Ahh, thank you.

I ran a repair on it in maintenance mode overnight, but it didn't seem to do anything. A filesystem check came back with the same output as before.

Edit: just occurred to me the diags attached were in maintenance mode. I've restarted the array and replaced the diags, just in case.

unraid-diagnostics-20241220-0848.zip

Edited December 20, 2024 by Coogan2007
uploaded new diagnostics file

December 20, 2024

Author

Just another note. The /system dir that was in read-only appears to be correct now. I opened a console and touch'ed an empty file in /mnt/disk1/system/docker and it was created, so it appears the repair may have fixed it, I guess? Docker is running and while my containers aren't there, it seems like progress. Still, I'm gonna wait until I get some feedback before proceeding any further. Thanks for the help with this.

December 20, 2024

Community Expert

The last diags are showing the pool unmountable again, is it still like that?

December 20, 2024

Author

Yep, I didn't even notice that until you mentioned it. I thought everything was doing better. Am I going to need to re-format the cache drive?

December 20, 2024

Community Expert

Use the same btrfs zero-log command; it should work again. But since the problem recurred, I recommend backing up and reformatting the pool.

December 20, 2024

Author

50 minutes ago, JorgeB said:

recommend backing up and reformatting the pool

The cache pool? How do I back it up?

Edit: nvm, I found it in the Storage Mgnt section. I'm leaving for a week for the holidays and probably won't bother undertaking all this until after I get back. There's nothing system/life-critical on it so I'll shut it down and start everything after the holidays. I'll report back in this thread with results. Thanks again for all your help.

Edited December 20, 2024 by Coogan2007
answered my own question
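
For readers who want a manual alternative to the mover-based procedure in the docs, the backup step can be sketched as a plain copy from the pool to an array disk followed by a file-count comparison. The function name `backup_tree` and the example paths are assumptions, and the destination is expected to start out empty. It is probably worth stopping Docker and VMs first so that files are not changing mid-copy.

```shell
#!/bin/bash
# Copy the contents of a source directory into a destination directory,
# preserving ownership, permissions and timestamps, then compare file counts.
# backup_tree is a hypothetical helper; the destination should be empty.
backup_tree() {
    local src=$1 dst=$2 n_src n_dst
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # "$src/." copies the contents (dotfiles included), not the directory itself.
    cp -a "$src/." "$dst/" || return 1
    n_src=$(( $(find "$src" -type f | wc -l) ))
    n_dst=$(( $(find "$dst" -type f | wc -l) ))
    echo "copied: source=$n_src destination=$n_dst"
    # Succeed only if both sides hold the same number of regular files.
    [ "$n_src" -eq "$n_dst" ]
}
```

An invocation might look like `backup_tree /mnt/cache /mnt/disk1/cache_backup`, where both paths are placeholders for the actual pool and a disk with enough free space.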

January 1, 2025

Author

OK I'm back from holiday and hope to start fixing this. I started following the instructions for reformatting the cache pool in the Storage Management page and immediately hit a problem:

Quote

4. Set all shares that have files on the cache and currently don't have a Use Cache:Yes to BE Cache:Yes. Make a note of which shares you changed and what setting they had before the change

There's nothing in any of the share settings that say "Use Cache" anywhere. There's a "Primary Storage" setting. Is that it? Am I looking at the wrong thing?

Edited January 1, 2025 by Coogan2007

January 2, 2025

Community Expert
Solution

See here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/#findComment-511923

January 2, 2025

Author

Got it. A few more questions about this if you don't mind.

I invoked the mover and it didn't seem to do anything. The shares still say either "Share contains data" or "Share is empty". Do all of them need to say they're empty to know that all the data was moved to the array?

Regarding the appdata folder. It's my understanding that this is where Docker containers keep all their application data. My appdata folder is empty. I assume that means all data my containers held is now gone? If so is there a way to recover it? Trying to save myself weeks of re-googling, retrying, and rebuilding all my containers from the ground up.

January 2, 2025

Community Expert

The shares will still contain the same data; it will just have been moved to another storage device or pool.
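
Since the share status describes the share as a whole, one way to see whether the mover actually emptied the pool is to look at the pool's own mount point instead. A rough sketch, assuming each top-level directory on the pool corresponds to a share; `files_left_on_pool` is a made-up name and `/mnt/cache` in the usage note is an assumed pool path.

```shell
#!/bin/bash
# List each top-level directory (share) under a pool mount point together
# with the number of regular files still stored there.
# files_left_on_pool is a hypothetical helper for this sketch.
files_left_on_pool() {
    local pool=$1 dir
    for dir in "$pool"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        dir=${dir%/}
        printf '%s %d\n' "${dir##*/}" "$(find "$dir" -type f | wc -l)"
    done
}
```

Running `files_left_on_pool /mnt/cache` after the mover finishes should show a count of 0 for every share that was meant to move; anything else still has files on the pool.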

January 3, 2025

Author

Final update (hopefully). Everything seems to be OK. I reformatted the cache pool, as well as re-arranging the order of the drives (don't know if that actually helps) and after a brief panic when all my exportable user shares were gone (fixed by a quick reboot), everything is running fine. One of the SSDs threw out a "percent lifetime remaining" SMART error but I'll fix that at some point.

My docker appdata was indeed wiped out at some point. Fortunately, the docker container whose app data was going to be the most time-consuming to rebuild kept regular backups of its database, and one quick restore later it's all good. Still got VPN and all my *arrs to fix but there's no rush for that.

Thanks for all the help, JorgeB. You're a life-saver.

Edited January 3, 2025 by Coogan2007
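
On the earlier question of how to keep container app data safe in future, one low-tech option is a dated archive of the appdata folder written to a location outside the cache pool. This is only a sketch: `archive_dir` is a hypothetical name, the paths in the usage note are assumptions, and containers should ideally be stopped while the archive is written so databases are captured in a consistent state.

```shell
#!/bin/bash
# Write a dated, compressed archive of one directory into a backup directory
# and print the path of the archive that was created.
# archive_dir is a hypothetical helper for this sketch.
archive_dir() {
    local src=$1 dest=$2 stamp out
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    out="$dest/$(basename "$src")-$stamp.tar.gz"
    # -C keeps paths inside the archive relative to the parent of $src.
    tar -C "$(dirname "$src")" -czf "$out" "$(basename "$src")" || return 1
    echo "$out"
}
```

For example, `archive_dir /mnt/cache/appdata /mnt/disk1/backups` would leave a file such as `appdata-<date>-<time>.tar.gz` on the array, which can be listed with `tar -tzf` before it is ever needed.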

Solved by JorgeB