6.8.3: Share Sub-Folder Suddenly Gone (invisible on SMB/Telnet & Individual Drives, but NOT Array File Size)

wheel · March 5

Vanilla, no-cache-drive system; only app/plug-in/docker even set up is Krusader.

I was just now using Windows SMB to clean up some files like usual (moving folders from a long-standing, enormous folder - many, many terabytes - to a smaller, new folder) when my system lagged substantially, then eventually moved the file I was asking it to move. The request had doubled itself: I thought Windows hadn't received the move request the first time I made it, and tried it again before the lag became apparent.

The next thing I know, that entire source subfolder I was moving from (edited) is gone. (Edit: even stranger, it's gone as a subfolder from EACH INDIVIDUAL DRIVE, not just the "main" shared folder!). My first thought was Windows SMB glitched up (which it does every now and then in terms of "unreadable" unraid folders), and a reboot would fix things. I idiotically rebooted my unraid system for good measure without pulling diagnostics before doing so.

The many, many terabytes from the folder (spread across all disks in the array) appear to still be used up on my disks (total free space is the same amount it was before I started cleaning this afternoon). I've had shares and subfolders disappear from SMB before, and found ways around that, but I've NEVER seen a subfolder disappear from being viewable when Telnetting into any of my unraid servers.

Any clues on how to restore visibility of and access to this subfolder would be greatly appreciated. I'm not seeing any obvious warnings about file system corruption like others on the forum seem to have had with missing subfolders in the past, but I may be reading the diagnostics incorrectly.

Edited March 7 by wheel
More info (not just missing from share)

wheel · March 5

Attaching new diagnostics after running (all seemingly OK?) file system checks on each XFS drive in the array as other users have been advised in past threads and using the in-GUI 6.x methods in the unraid FAQ.

EDIT TO ADD: All other subfolders in this share (and all other shares) are still totally fine. Just the one that I was moving items FROM is now gone in every way except array file size.

EDIT 2: The movement occurring was from one subfolder to another subfolder within the same share. No disk share / user share cross-contamination issues here.

EDIT 3: Learning all sorts of fun Linux stuff trying to figure out what's happened here. Just used Telnet to (edited), and I'm getting "no such file or directory." There's no way I've really lost this entire multi-terabyte folder due to a single glitch while moving one file out of it, right? It'd be the craziest thing that's ever happened to me in over a decade with unraid if so, for sure.

Edited March 7 by wheel
Clarifying info

wheel · March 5

These are the (identical) results for every drive in my array (EDIT-actually, each Phase 4 seems to have the agno entries in a different order. The rest of the results look identical). I've completely exhausted any help existing posts could provide, unless I'm searching way off base. Potentially hundreds of terabytes of (replaceable but not easily) data are in you gurus' hands, now. Cannot thank you enough for any possible help.

Phase 1 - find and verify superblock...
Memory available for repair (1308MB) may not be sufficient.
At least 1911MB is needed to repair this filesystem efficiently
If repair fails due to lack of memory, please
turn prefetching off (-P) to reduce the memory footprint.
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 3
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Edited March 5 by wheel
Not-QUITE-identical results

JorgeB · March 5

Run xfs_repair again on all disks without -n, but you may not have enough RAM to do the repair if needed.
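For anyone following along, here is a minimal sketch of what "run it on all disks" could look like. It only prints the commands so they can be reviewed before anything is run. It assumes the array is started in maintenance mode and that array disks show up as /dev/md1, /dev/md2, and so on (an assumption about 6.8.x device naming; check your own system). The function name and its arguments are made up for illustration. The -n (check only) and -P (disable prefetching) flags are the ones discussed in this thread.

```shell
# Sketch only: print (do not run) one xfs_repair command per array disk.
# Assumption: array disks appear as /dev/md1 .. /dev/mdN in maintenance mode.
# print_repair_commands is a hypothetical helper, not an Unraid tool.
print_repair_commands() {
    disk_count="$1"    # number of data disks in the array
    extra_flags="$2"   # e.g. "-n" for a read-only check, "-P" to disable prefetching, or "" for none
    i=1
    while [ "$i" -le "$disk_count" ]; do
        if [ -n "$extra_flags" ]; then
            echo "xfs_repair $extra_flags /dev/md$i"
        else
            echo "xfs_repair /dev/md$i"
        fi
        i=$((i + 1))
    done
}

# Example: list the commands for a 3-disk array with prefetching disabled.
print_repair_commands 3 "-P"
```

Reviewing the printed list first is a deliberate choice here: a repair without -n modifies the filesystem, so it seems safer to run each command by hand.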

itimpi · March 5

3 minutes ago, JorgeB said:

but you may not have enough RAM to do the repair if needed.

Probably worth pointing out that the minimum recommended memory for current Unraid releases for everything to function as expected is 4GB. Any chance of you increasing the RAM on your system?

wheel · March 5

3 hours ago, itimpi said:

Probably worth pointing out that the minimum recommended memory for current Unraid releases for everything to function as expected is 4GB. Any chance of you increasing the RAM on your system?

This is totally news to me! I built this server for unraid well over a decade ago now and haven't touched anything (hardware-wise) since, so I'm actually shocked it only had 2GB of RAM (with my "modern-day" thinking) this whole time.

It's an incredibly ancient motherboard now, but from what I'm seeing online, my American Megatrends ECS A885GM-A2 board is compatible with up to 32GB of Crucial's old DDR3L-1600 UDIMM (non-ecc) sticks (all listed as EOL on their website, but apparently still available at a few retailers).

I clearly need to upgrade the RAM regardless, but it would be AMAZING if doing that and running xfs_repair could save me a few weeks / months of work trying to save the other folders on this system before (from what I'm reading) I'd need to format the whole thing for safety and start from scratch anyways.

Two questions:

(1) Any potential benefits (for this rescue attempt or any other future functions) to just bumping this box up to 32GB of RAM, or should 8GB or 16GB be plenty for future-proofing a vanilla system? I realize the answer may be "hard to predict the future" - if so, could my repair operations benefit from more RAM, or is just getting past the 4GB barrier the best I can hope for in terms of odds of success?

(2) Are there any extra safety tips I should follow (in this particular, precarious instance) before removing the old RAM and installing the new sticks, or once those sticks are installed before running xfs_repair?

Thank you both so much for the assistance with this! Been pretty concerned about the work involved with "getting back to normal" and this news is helping greatly.

itimpi · March 5

It looks like xfs_repair would recover everything. I see it suggests using -P (which I have never used) so it might be worth trying without -n and adding -P.

wheel · March 5

2 hours ago, itimpi said:

It looks like xfs_repair would recover everything. I see it suggests using -P (which I have never used) so it might be worth trying without -n and adding -P.

I tracked down a locally-available pair of mobo-compatible Crucial 4GB sticks which I could get and install this afternoon; based on what I read (across various forums) about xfs_repair yesterday, it seems like I want it to work as well as possible on the first try if I want the best chance of keeping all data, so:

Does it make more sense to run xfs_repair -P now (before modifying my system at all by pulling out and installing more RAM) or switch out the RAM later today and try xfs_repair -n again to see if that message (prompting me to -P) pops up again (or is replaced by new concerns)?

itimpi · March 5

3 minutes ago, wheel said:

I tracked down a locally-available pair of mobo-compatible Crucial 4GB sticks which I could get and install this afternoon; based on what I read (across various forums) about xfs_repair yesterday, it seems like I want it to work as well as possible on the first try if I want the best chance of keeping all data, so:

Does it make more sense to run xfs_repair -P now (before modifying my system at all by pulling out and installing more RAM) or switch out the RAM later today and try xfs_repair -n again to see if that message (prompting me to -P) pops up again (or is replaced by new concerns)?

I very much doubt it actually makes much difference. I think that using -P might just slow things down. Having said that, I have no experience of using -P, so it would probably be best to upgrade the RAM first.

wheel · March 7

Bad news: after upgrading RAM and running xfs_repair without -n on every drive (and receiving the following results on each drive), (edited) is still missing on SMB and Telnet (but its data is still clearly "taken up" from the array). Diagnostics attached (and I also have a set of diagnostics I took before exiting maintenance mode for the repairs, if that helps).

Any more ideas would be seriously appreciated! Getting a little more concerned that something weird happened, now. Thank you both for all of the help so far!

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Edited March 7 by wheel

JorgeB · March 7

Post the output of:

ls -la /mnt/user/features

wheel · March 7

total 40

Edited March 7 by wheel

JorgeB · March 7

I don't see any HD folder, there's only HD-(2).

Also some of those permissions are not right.

wheel · March 7

I think I may have done a poor job of explaining myself initially:

HD is gone. The files within HD are gone. The file size of those files (multiple terabytes on each of the 18 drives in the array) is still being taken up by SOMETHING, but I can't see or access it anymore because the folder named "HD" (which once contained them) has disappeared from both my user shares (where it was a subfolder of "features") and from each individual disk it was on (where it was a subfolder of each disk's "features" folder).

This occurred immediately after trying to move a folder from "HD" to "HD-2" (both within the "features" folder) on a user-share level (not share-to-disk or vice-versa) using Windows SMB. Windows SMB lagged, which I didn't realize, and I tried to move the same folder a second time. When Windows caught up with itself, I saw the folder get "moved" twice (two successful moves of the same folder). After that, /HD (in user shares and on disks alike) was invisible and inaccessible from SMB.

I really wish I had captured diagnostics at this point, but I've had so many "inaccessible" SMB issues that were resolved by a simple reboot over the years that I just went ahead and did that instinctively. Once the system was back up and running, with "HD" still missing, I went through Telnet/MC (which has previously shown me folders that had been "missing" over SMB until the SMB issues were resolved) and saw the "HD" folders were ALSO gone on that side of things (i.e., not JUST an SMB issue). That's when I came here with my questions.

I had no idea one folder being moved in a weirdly-Windows way could potentially lock me out of a massive pile of data, but here I am, confused and lost, really hoping I'm not about to spend the next few months moving what folders did survive over to some other storage solution before I have to wipe all 18 of these drives and start replacing things from scratch.

Edit: for additional clarity, I'm guessing this is dozens of TB total. My "free space" in the array has been stable at about 10TB free both before and after the "disappearance" of the HD folder, which is what gives me hope that its contents are salvageable (since they still seem to exist somewhere across the drives, if only as dead bits taking up space).

Edited March 7 by wheel
Additional clarity at the end
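Since the free space never changed, one way to check where that space actually sits is to size the subfolders that are still visible on a disk. This is a rough sketch, not something suggested by anyone in the thread; the helper name is made up, and the example path (/mnt/disk1/features) is only a guess at the layout described above.

```shell
# Sketch only: list a directory and its immediate subdirectories by size
# in KiB, largest first. largest_subdirs is a hypothetical helper.
largest_subdirs() {
    # $1 = directory to inspect, e.g. /mnt/disk1/features (assumed path)
    # The directory's own total normally sorts to the top of the list.
    du -k -d 1 "$1" 2>/dev/null | sort -rn
}

# Usage (read-only, but can take a long time on large disks):
#   largest_subdirs /mnt/disk1/features
```

If a surviving subfolder turns out to be far larger than expected, that would suggest the "missing" data was moved inside it rather than lost.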

JorgeB · March 7

17 minutes ago, wheel said:

HD is gone.

OK, initially I understood it just wasn't visible over SMB. Do you know the name of a folder that existed inside HD?

wheel · March 7

I can probably guess at a decent amount of them, but one for sure should be (edited)/

Edited March 7 by wheel

JorgeB · March 7

Try looking for one, to see if it's somewhere else, e.g.:

find /mnt -iname '2 Days*'
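A slightly narrower variant of the same search, as a sketch: it matches directories only and hides permission errors. The helper name is made up for illustration. Pointing it at /mnt/user instead of /mnt should avoid seeing the same folder listed again under each /mnt/diskN path, assuming the usual user-share layout.

```shell
# Sketch only: case-insensitive search for directories by name.
# find_dirs_named is a hypothetical helper wrapping the find command above.
find_dirs_named() {
    # $1 = root to search (e.g. /mnt/user), $2 = name pattern (e.g. '2 Days*')
    find "$1" -type d -iname "$2" 2>/dev/null
}

# Usage:
#   find_dirs_named /mnt/user '2 Days*'
```

Quoting the pattern matters: without the quotes the shell may expand the * before find ever sees it.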

wheel · March 7

I have no idea how exactly the move went down during the "windows lag" moments, but the "missing" subfolder (and all of its subfolders) got moved straight into another (adjacent) subfolder. I'm double-checking to make sure everything is still there, but this is looking more and more like user error solved by the find /mnt -iname maneuver.

Thank you SO MUCH for steering me in that direction!
