• [6.8.3] shfs error results in lost /mnt/user


    JorgeB
    • Minor

    There are several reports in the forums of this shfs error causing /mnt/user to go away:

     

    May 14 14:06:42 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

     

    Rebooting fixes it until it happens again. I remember seeing at least 5 or 6 different users with the same issue in the last couple of months, and it was reported here that it's possibly this issue:

     

    https://github.com/libfuse/libfuse/issues/128
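
    In the meantime, a quick way for anyone affected to confirm they've hit the same thing (or get an early warning that the user shares have dropped) is a small check run periodically, e.g. from cron or the User Scripts plugin. This is only a rough sketch; adjust the syslog path if yours differs:

        #!/bin/bash
        # Rough check for this failure mode: shfs dies, /mnt/user disappears,
        # and the libfuse assertion shows up in the syslog.

        if ! mountpoint -q /mnt/user; then
            echo "WARNING: /mnt/user is not mounted - user shares are gone"
        fi

        if grep -q "unlink_node: Assertion" /var/log/syslog 2>/dev/null; then
            echo "WARNING: shfs/libfuse assertion found in syslog - a reboot will be needed"
        fi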

     

    Attached diags from latest occurrence.

     

     

     

    tower-diagnostics-20200514-1444.zip

    • Upvote 3



    User Feedback

    Recommended Comments



    I posted a large report earlier today which seems to touch on this issue. In my case it's most likely related to NFS, and it occurs on both 6.11 and 6.12.

     

    I can't disable hard link support easily as this causes issues with Docker Mailserver, but I have disabled NFS for the time being and hope that my server will be able to stay up for more than a few days.

     

    However, given that this issue might be related to Unraid's "magic" (shfs) and libfuse, and because it has existed for such a long time already, I'm going to have to assume it's unfixable. At the moment I'm still suffering from sunk-cost-fallacy-syndrome, but once I'm past that I guess I'll start looking at alternatives 😢

    Link to comment
    23 hours ago, robertklep said:

    but I have disabled NFS for the time being and hope that my server will be able to stay up for more than a few days.

     

    That didn't fix anything; I just got another hang. That's three times in three days now. I'll revert to 6.11.5, where I at least managed to get an uptime of a few weeks (once).

    Link to comment
    10 minutes ago, jackjack said:

    Yeah, I'm getting really fucking sick of this to be honest; it's becoming utter shit.


    After yet another hang I am planning my move away from Unraid to either TrueNAS SCALE or Proxmox.

     

    I've enjoyed using Unraid due to its simplicity, but in the year that I've used it it was never stable for me and now I'm done with it. My hardware setup doesn't really benefit a whole lot from Unraid's mixing-and-matching-drives feature, while that same feature is causing a lot of issues for me.

    • Upvote 1
    Link to comment
    21 hours ago, robertklep said:


    After yet another hang I am planning my move away from Unraid to either TrueNAS SCALE or Proxmox.

     

    I've enjoyed using Unraid due to its simplicity, but in the year that I've used it it was never stable for me and now I'm done with it. My hardware setup doesn't really benefit a whole lot from Unraid's mixing-and-matching-drives feature, while that same feature is causing a lot of issues for me.



    I'm feeling ya. I'm getting utterly sick of this fucking dog shit way of managing disks in the array as well. Yeah nah, Unraid can get in the fucking bin, I'm over it.

    Clear a disk, format it.

    The array says it can't mount it, it needs to be formatted, so do so; disk unformattable. Unmount, format, mount, make a folder and it's all working fine; do a pre-clear on the disk and start again.

    Nup, Unraid says get fucked, none of that is going to work in the array. Do you want to format the drive? Get fucked: unmountable drive and unformattable drive.

    You may get a sense of frustration from the colourful language being employed here; however, fuck this utter dog shit.

    Link to comment
    1 hour ago, jackjack said:

    The array says it can't mount it, it needs to be formatted, so do so; disk unformattable. Unmount, format, mount, make a folder and it's all working fine; do a pre-clear on the disk and start again.

    That is not the process you should follow according to the online documentation.

    Link to comment
    On 12/22/2023 at 8:53 PM, itimpi said:

    That is not the process you should follow according to the online documentation.

    ahh yes, followed the documentation, same result, brand new disks.

    Link to comment

     

    I moved from Unraid to plain Ubuntu; uptime is already longer than the average Unraid managed on the same hardware over the last couple of months.

     

    On 12/21/2023 at 11:35 AM, robertklep said:


    After yet another hang I am planning my move away from Unraid to either TrueNAS SCALE or Proxmox.

     

    I've enjoyed using Unraid due to its simplicity, but in the year that I've used it it was never stable for me and now I'm done with it. My hardware setup doesn't really benefit a whole lot from Unraid's mixing-and-matching-drives feature, while that same feature is causing a lot of issues for me.

    Link to comment

    Had my first experience of this yesterday and it's completely repeatable. I'm in the middle of consolidating my data disks from 4TB to 8TB drives. As part of this process I've been using Unbalance to move files from old disks to new. This has all been working fine until it came to trying to move my Seafile folder structure. Seafile, if you don't know, stores all files as small chunks and there are millions of these small files in the folder structure - something over 6 million in my case.

     

    Very soon after Unbalance enters the planning stage on all these tiny files, the system loses all shares. Only way back is to reboot. I'm not blaming Unbalance at all here, instead I suspect it is something to do with the sheer volume of files that are being opened, the space needed to store that data, or the rate of doing so. I have no hard evidence for which it is, other than this happened every time I tried to use Unbalance on this folder set, and Unbalance handles every other folder on my system with no problems.

    FWIW - I am now moving the files using rsync from the command line but without the planning step and that seems to be working.
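
    In case it helps anyone, it's just a plain disk-to-disk copy along these lines (the disk numbers and folder name are placeholders for my setup):

        # Copy the Seafile data straight from the old 4TB disk to the new 8TB disk,
        # preserving attributes; verify everything before removing anything from the source.
        rsync -avh --progress /mnt/disk3/seafile-data/ /mnt/disk7/seafile-data/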

     

    Link to comment
    14 hours ago, Lignumaqua said:

    Had my first experience of this yesterday and it's completely repeatable. I'm in the middle of consolidating my data disks from 4TB to 8TB drives. As part of this process I've been using Unbalance to move files from old disks to new. This has all been working fine until it came to trying to move my Seafile folder structure. Seafile, if you don't know, stores all files as small chunks and there are millions of these small files in the folder structure - something over 6 million in my case.

     

    Very soon after Unbalance enters the planning stage on all these tiny files, the system loses all shares. Only way back is to reboot. I'm not blaming Unbalance at all here, instead I suspect it is something to do with the sheer volume of files that are being opened, the space needed to store that data, or the rate of doing so. I have no hard evidence for which it is, other than this happened every time I tried to use Unbalance on this folder set, and Unbalance handles every other folder on my system with no problems.

    FWIW - I am now moving the files using rsync from the command line but without the planning step and that seems to be working.

     

     

    Honestly, Unbalance is something I wouldn't recommend if you value your files (sorry, just being honest).

     

    If you need a multi-threaded move that is much faster than rsync, use rclone copy (there's a rough sketch further down); you can install rclone via the Nerd Tools plugin. Afterwards do an rsync pass for the proper permissions and be done with it ... terminal only, though.

     

    Nothing point-and-click.

     

    In addition to that:

     

    Seafile is something you shouldn't use on spinning rust (and the same goes for anything similar) unless you know what you're doing. You will need to control the block size/pack size and adjust it accordingly, because any HDD will struggle with that number of files.

    (I have a lot more files, by the way ... and have never had that issue on either of my two Unraid devices so far.)

     

    You shouldn't move files to /mnt/user/[share] anyway unless you know what you're doing; it's much smarter to target the individual disks yourself, which also avoids the overhead.

     

    So /mnt/disk[n]/sharename ...
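
    Roughly like this, as a sketch only (disk numbers, share name and flag values are just examples, adapt them to your setup):

        # Multi-threaded copy straight between disks (no /mnt/user, no shfs overhead) ...
        rclone copy /mnt/disk1/sharename /mnt/disk2/sharename --transfers 8 --checkers 16 --progress

        # ... then a follow-up rsync pass so permissions/ownership end up correct.
        rsync -av /mnt/disk1/sharename/ /mnt/disk2/sharename/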

     

    If you're using ZFS, you might want to make sure the pool is created on the individual disk before doing so, and that it's not just a plain folder ... if there is still trouble after that, then this might indeed be a bug.

     

    ---------

     

    On-Topic:

     

    Besides this, it all seems like symptoms, not the root cause. Trying to fix an error you have no root cause for is like searching for a needle in a haystack (even more so if you don't have access to the hardware). So if moving to Ubuntu helped you, so be it. I personally (almost) lost a shit ton of stuff from two workstations running enterprise hardware on Ubuntu LTS in the last two years due to fucked-up updates - good luck with that. But then again, those were powered by considerably more modern hardware ... so not comparable. Then again, in the last 4 years I've also had several issues with Ubuntu servers and their auto-updates on legacy hardware ...

     

    If I'd recommend anything, it would be something like openSUSE Tumbleweed, or a more "trustable" stateless distro.

     

    Good luck with your venture, and don't let the negative emotions eat away at you.

    Edited by jit-010101
    Link to comment

    Given that I was the one moving to Ubuntu: the root cause for me was an issue in Unraid, specifically in `shfs`. I can't fix it because it's closed source, but since the issue has existed for quite some time now (this thread started back in 2018), apparently nobody else can fix it either.

     

    I'm not inclined to keep trying to find the specific error, or the specific conditions that trigger it, because that's simply not my job and the days where I liked to spend my free time on something like this are long gone. This is commercial server software and if it can't handle keeping my server up for more than a few days at a time, it's not suitable for me. 

     

    So the next best thing for me was moving to something else, which became Ubuntu because I've been using it since it was first released, and Debian before that. I've never had issues with updates, but I'm also not someone that runs automatic updates or .0 releases. With the next big update I guess I'll see whether my boot/root ZFS snapshotting will come in useful if the update gets botched 😅
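
    (The idea is roughly the following; the dataset names are placeholders, since they depend entirely on how the installer laid out the pool:)

        # Before a big upgrade: recursive snapshot of the root pool.
        zfs snapshot -r rpool@pre-upgrade

        # If the upgrade gets botched: roll the root dataset back and reboot.
        zfs rollback rpool/ROOT/ubuntu@pre-upgrade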

     

    4 minutes ago, jit-010101 said:

    Besides this, it all seems like symptoms, not the root cause. Trying to fix an error you have no root cause for is like searching for a needle in a haystack (even more so if you don't have access to the hardware). So if moving to Ubuntu helped you, so be it. I personally (almost) lost a shit ton of stuff from two workstations running enterprise hardware on Ubuntu LTS in the last two years due to fucked-up updates - good luck with that.

     

    • Thanks 1
    Link to comment
    11 minutes ago, robertklep said:

     

    WoW!
    I can't say it better than that! I totally agree!

     

     

    Edited by XiMA4
    Link to comment

    Looking back across the posts here, shfs is likely only the layer in which it ends up crashing.

     

    See this report here:

     

    https://github.com/libfuse/libfuse/issues/589#issuecomment-857484504

     

    libfuse has basically been in life-support mode since 2021

     

    This could eventually hit any *nix or BSD system using a FUSE filesystem, depending on the edge case it triggers.

     

    Interestingly, mergerfs is mentioned in the same issue:


    https://github.com/trapexit/mergerfs/issues/965

     

    I think I encountered something similar myself on Debian 9 with OMV, SnapRAID + mergerfs, which was much more serious in terms of file losses (which is why I moved to Unraid in the first place).

     

    So be careful if you try to use mergerfs/SnapRAID (similar issues).

     

    There are so many layers that could possibly cause these issues, but it could very well also be caused by shfs itself, given that open-source development of it was discontinued 20 years ago now (heh).

     

    The good thing with Unraid is that you can easily downgrade - and stay on a version that stays stable for you too.

     

    You don't have to upgrade.

     

    Things like this will happen with all distros. I'm just trying to apply some common sense here; I very much respect your decision/move too, and I also see the need for Unraid to evolve more towards ZoL and possibly replacement layers ... :)

     

    I personally went the Synology -> Kubernetes + Distributed FS route before going back to OMV and in the end Unraid ...

    Enterprise solutions are so full of issues flagged as stable that I consider all of this a breeze in comparison (yeah, I know how that sounds; it might very well be a user issue too 😅).

    Edited by jit-010101
    Link to comment

    jit-010101, thank you for that background, that does help explain what's going on even if there isn't a fix.

     

    What version of unRAID do we have to roll back to in order to not encounter the bug? I came to unRAID a couple of years ago on 6.9, where I encountered the bug. All of my upgrades became necessary when things (notably plugins) became unsupported, and I always waited as long as I could before upgrading.

    Link to comment
    19 hours ago, jit-010101 said:

    libfuse has basically been in life-support mode since 2021

     

    Which for me was another reason to move away from Unraid. This is the magic sauce that Unraid is built on, and it's more or less abandonware.

     

    19 hours ago, jit-010101 said:

    The good thing with Unraid is that you can easily downgrade - and stay on a version that stays stable for you too.

     

    You don't have to upgrade.

     

    I started using Unraid in October 2022, when 6.11 was just released. The issues I had were present already, although they didn't occur as often as later on (about every two weeks or so). Since I suspected it might have been a hardware issue at first, it took me quite some time to realise it wasn't.

     

    I held off upgrading to 6.12 (from 6.11.5 which I was running) for quite a while until I was fed up with the constant hangs, so I upgraded to 6.12.6 hoping the issue would be fixed, but it actually made matters worse.

     

    I then downgraded back to 6.11.5 (which indeed was very easy!) only to find that by doing so, the community apps plugin was broken because I had mistakenly upgraded it on 6.12.6 and the minimum Unraid version that the installed plugin supported was 6.12.0.

     

    Which is when I decided to migrate away from Unraid. It's great when it works, but during this entire ordeal I found so many implementation details that puzzle me (`in_use` a shell script? really!?) that I'm happy to be running something else now.

    Edited by robertklep
    Link to comment

    IS THIS WHY MY SERVER SUCKS??!?!?!!

    I've had constant problems with Unraid since about day 1; meanwhile my buddy hasn't restarted his server in 6 months...

     

    Tonight I finally caught the server right before it went unresponsive. All my shares were gone. Tried disabling Docker (can't restart it without my appdata share existing...), tried stopping and restarting the array, nope.

    Checked some settings: I never had NFS enabled, and I turned off disk shares to be sure (didn't have any).

    Only installed Tdarr a couple of days ago; don't have any other *arrs.

     

    This is my second full system. I have been throwing money at this problem for a year now and I am still having trouble staying up long enough for a damn parity check, let alone being able to use the server for actual work.

     

    Replaced the flash drive early on, replaced the RAM early on. Ended up switching out the motherboard/RAM (again)/CPU, still happening; replaced the motherboard and PSU again 2 days ago, still happening... I've tried Intel and AMD. I've tried DDR4 and DDR5. I've tried ASRock, Gigabyte and Asus motherboards.

     

     

    Rebooting always works, and I limped by with rebooting every week for a while. But I want to start using my server for server things, not just to empty out some drives from my computer. Windows is more stable than this Linux box..?

    I can't reboot every night when it takes 3 days to do a parity check...

    So far in a year I've probably had to hard reboot (kill power) over 30 times...


    Is this an Unraid problem or a Linux problem? The only thing I haven't done yet is turn off hardlinks, which everyone says not to do because of the TRaSH guides?

     

    Edited by RaidUnnewb
    Link to comment
    11 minutes ago, RaidUnnewb said:

    Is this an Unraid problem or a Linux problem?

     

     

    I don't know.  But at this point I'd be looking for mental problems.

    Link to comment
    3 hours ago, RaidUnnewb said:

    Is this an Unraid problem or a Linux problem? The only thing I haven't done yet is turn off hardlinks, which everyone says not to do because of the TRaSH guides?

    Will your server stay up after booting in the safe mode with no plugins or Docker containers installed?

    Link to comment
    4 hours ago, RaidUnnewb said:

    Is this an Unraid problem or a Linux problem?

     

    For me, it was clearly an Unraid problem.

    Link to comment

    I'm not seeing a common theme as to why this is happening to some and not others. What would help troubleshoot this is if someone would run the server in safe mode with the syslog server set to save the log on the flash, run it this way for an extended time, and see if the issue happens. If it does, provide diagnostics and the log from the flash.

     

    If it doesn't happen, start adding plugins and Docker containers back one at a time and see if one of them causes the issue.

     

    While I appreciate that the consensus seems to be to blame Unraid, a plugin or Docker container can cause this issue. Incorrect Docker mapping of /mnt/ shares could possibly cause a problem.
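
    To give one illustration of the kind of mapping I mean (just an example pattern, not the only way to get it wrong; the image name and paths below are placeholders): handing a container read/write access to far more of /mnt than it needs, instead of mapping only the specific paths it actually uses.

        # Risky: the whole of /mnt (all user shares, disk shares and pools) handed to the container
        docker run -d --name example -v /mnt:/data some/image

        # Better: map only the specific share (or disk path) the container actually needs
        docker run -d --name example -v /mnt/user/media:/data some/image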

    Link to comment

    Not exactly the answer you asked for @dlandon, but see my post above. I have run this particular server for 7 years with a good number of Dockers including Tdarr, and have never experienced this issue of losing all shares. However, it then occurred immediately and repeatably when trying to run Unbalance on a large folder of files (with the Docker, Seafile, that created those files stopped). Now that's finished and I've moved the data the server has gone back to being stable. So, yes, it might be a Docker but, if it is, it doesn't show the issue until Unbalance is run as well. Very strange...

    Link to comment

     

    5 hours ago, dlandon said:

    Will your server stay up after booting in the safe mode with no plugins or Docker containers installed?

    We'll see. It lasted a week or so sometime 8 months ago in safe mode before becoming unresponsive again and requiring a hard reset. 

    Rebooted into safe mode just now; the plugins are all gone, but Docker was running? So I disabled Docker.

     

     

    I used to know when my server stopped working because my internet would go out (Pi-hole); I quickly got tired of that, so I bought a Raspberry Pi and now use that as my primary. So lately I've been unable to catch the server in its 'almost dead' phase, only when it's fully dead and I have to hard reset. In the past 3 weeks I installed most of those Dockers, but this problem has been happening for a year. Prior to December I only had Pi-hole/Unbound/Satisfactory (almost always off) and that's it; everything else is new, and Tdarr specifically seems to kill the server overnight instead of in a few days.

     

    Hard locked again last night after telling Tdarr to do some stuff. It's not set to use the CPU at all, just the GPU.

    Something is using my CPU and I don't know what. I set every Docker manually in extra parameters to something similar to: "--memory=2g --cpu-shares=256"

    During this particular hard lock there is activity showing on this page, and all of the CPU threads are dancing around a bit, but no buttons work and the rest of the server is dead. If I were to refresh the page, I wouldn't even get this page back. The reboot/shutdown/log/terminal buttons don't work, of course.

    Pi-hole was working; the rest of the Dockers were unresponsive.

    [Screenshot: Unraid dashboard still showing CPU activity while the server is unresponsive]

     

    Yesterday it locked up and I plugged in a monitor (can sometimes tell it to reboot via keyboard/monitor), got this:

    [Screenshot: full lockup on the attached monitor]
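
    Next time I manage to catch it while the local console still responds, I'll try something like this to see what's actually chewing the CPU:

        # One-shot per-container CPU/memory usage
        docker stats --no-stream

        # Top CPU consumers on the host itself
        ps aux --sort=-%cpu | head -n 15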

     

     

    Any help is appreciated, including mental :)

    Edited by RaidUnnewb
    Link to comment
    18 minutes ago, RaidUnnewb said:

    Hard locked again last night

    Hard locking or high CPU usage should not be related to this issue; this one only causes the user shares to stop working. It may be better to discuss that in a different thread.

    Link to comment
    56 minutes ago, JorgeB said:

    Hard locking or high CPU usage should not be related to this issue; this one only causes the user shares to stop working. It may be better to discuss that in a different thread.

    Copy that, I will make a separate post in a week with the results of the current safe mode experiment.
    Or do you want me to do it today, so I can copy my last post over and delete it from this thread?

    Link to comment
    15 hours ago, dlandon said:

    If it doesn't happen, start adding plugins and Docker containers back one at a time and see if one of them causes the issue.

     

    That's basically what I did, but I was never able to find a specific plugin or container that caused the problem. For me, the best way to trigger the issue was using NFS, but even with NFS completely turned off and only SMB in use it still happened. Which, for me, invalidated the use of Unraid as a NAS.

     

    15 hours ago, dlandon said:

    While I appreciate that the consensus seems to be to blame Unraid, a plugin or Docker container can cause this issue. Incorrect Docker mapping of /mnt/ shares could possibly cause a problem.

     

    Up until now, nobody has been able to pinpoint the exact cause of this problem, though. If a misconfiguration of a Docker container can cause `shfs` to deadlock, I would at least expect some documentation on how to prevent this.

    Link to comment




