• [6.8.3] shfs error results in lost /mnt/user


    JorgeB
    • Minor

    There are several reports in the forums of this shfs error causing /mnt/user to go away:

     

    May 14 14:06:42 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

     

    Rebooting will fix it, until it happens again. I remember seeing at least 5 or 6 different users with the same issue in the last couple of months; it was reported here that it's possibly this issue:

     

    https://github.com/libfuse/libfuse/issues/128

     

    Attached diags from latest occurrence.
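
    For anyone unsure whether they've hit this state: once shfs dies, any access to /mnt/user fails with "Transport endpoint is not connected", so a trivial check from a script is possible. This is only a sketch, not an official tool; the path is the standard user-share mount and can be overridden for testing.

    ```shell
    #!/bin/sh
    # Quick health check for the user share mount (a sketch; pass another
    # directory as $1 to test). When shfs has crashed, any access to
    # /mnt/user returns "Transport endpoint is not connected".
    MNT="${1:-/mnt/user}"

    if ls "$MNT" >/dev/null 2>&1; then
        echo "OK: $MNT is reachable"
    else
        echo "FAIL: $MNT is unreachable (shfs may have crashed)"
    fi
    ```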

     

     

     

    tower-diagnostics-20200514-1444.zip

    • Upvote 3



    User Feedback

    Recommended Comments



    Throwing my hat in here too..

     

    I don't use Tdarr, and I stopped using NFS, and I *thought* the problem was fixed by using only Samba, as I hadn't had the issue in a couple of weeks. But here we go again, it just happened. Although the error is different: with NFS it was a kernel crash, with Samba I got this:

     

    smbd[26612]:   Invalid SMB packet: first request: 0x0001
    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log': open error: Transport endpoint is not connected

     

    It happened right in the middle of reading from a Samba share on a Windows VM running on Unraid. Mover was not running and no other shares were active. The Windows share was reading from my cache drive, if that makes a difference. Apparently an "invalid SMB packet" caused this? I don't understand how that would happen.

    • Like 1
    Link to comment

    I've triggered it a few times with SMB file operations from my Mac client where the folder/file structure was stale.

     

    Since then I force a refresh by e.g. navigating down a directory then back up before every SMB move or copy. Tedious but so far it hasn't failed.

    Link to comment
    1 hour ago, CS01-HS said:

    I've triggered it a few times with SMB file operations from my Mac client where the folder/file structure was stale.

     

    Since then I force a refresh by e.g. navigating down a directory then back up before every SMB move or copy. Tedious but so far it hasn't failed.

     

    When it happened to me, Windows was installing software from the Samba share. The contents of the folder it was installing from were static; there was nothing to go stale. I've always assumed unRAID was designed with NAS as a core competency, and it's failing at that task, which has me questioning my choices. It's rock solid for the most part, but I can't stand random, unknown, and unpredictable crashes with no explanation or potential fix in the pipeline. I'm willing to do whatever it takes to make this problem go away, but so far all I get is "?????????" from the community, from unRAID, and from the directory listing of /mnt/user itself.

    • Like 1
    • Upvote 1
    Link to comment

    Just had some serious problems with this, sadly forgot to save diagnostics.
    I've used Tdarr for years, and never had an issue, and this wasn't related to Tdarr for me either.

     

    For me it was the Docker container for a project called Kaizoku that was doing it. I moved my files off the user share and directly to cache, pointed Kaizoku to that instead, and the issue was completely gone.
    I would point it back to confirm it definitively, but after days of testing the operation that made /mnt/user die, I am quite confident.

    Link to comment

    Happened again, my first time with rc3. A failed creation of a new folder on a share from my Mac (possibly a duplicate name) produced this in the log (I use a syslog server):

     

    Apr 19 08:25:28 NAS emhttpd: read SMART /dev/sdd
    Apr 19 08:25:57 NAS emhttpd: read SMART /dev/sde
    Apr 19 08:34:54 NAS shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    Apr 19 08:34:54 NAS rsyslogd: file '/mnt/user/system/logs/syslog-nas.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    Apr 19 08:34:54 NAS rsyslogd: file '/mnt/user/system/logs/syslog-nas.log': open error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2433 ]
    Apr 19 08:34:54 NAS emhttpd: error: get_filesystem_status, 7380: Transport endpoint is not connected (107): scandir Transport endpoint is not connected

     

    Shares inaccessible. Had to stop everything and reboot.

     

    I have to treat every file operation over SMB as though it might take down the array. That's a serious inconvenience.
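
    Since the assertion line always precedes the shares going away, one stopgap is to watch the syslog for it. The sketch below is an assumption-laden example, not an official tool: the log path is whatever your syslog server writes, and the commented notify call is the webGUI notification script path as I understand it.

    ```shell
    #!/bin/sh
    # Sketch: scan a syslog file for the shfs assertion so a cron job can
    # alert before users find /mnt/user gone. Log path is an assumption;
    # pass your own as $1.
    LOG="${1:-/var/log/syslog}"

    if grep -q "unlink_node: Assertion .node->nlookup > 1. failed" "$LOG"; then
        echo "shfs crash detected in $LOG"
        # e.g. trigger a webGUI notification (script path assumed):
        # /usr/local/emhttp/webGui/scripts/notify -i alert -s "shfs crashed"
    else
        echo "no shfs crash found in $LOG"
    fi
    ```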

    • Like 1
    Link to comment

    6.12 offers a partial solution, with cache-only shares bypassing shfs (if you can restructure your workflow to use them).

    Link to comment
    7 hours ago, evan326 said:

    So it looks like this still hasn't been sorted? I'm having this issue occur more and more. 

     

    Out of curiosity, are you overclocking and/or have you enabled XMP? Out of desperation for a fix, I reset my BIOS to defaults (no OC or XMP) and currently have an uptime of a little over 3 months, where previously 2 weeks was about the maximum before running into this error. No clue if it's related or a coincidence, but I'll take it for now. Still on 6.11.5.

    Link to comment
    8 hours ago, evan326 said:

    So it looks like this still hasn't been sorted? I'm having this issue occur more and more. 

    Something must be making the shfs system crash. Have you made sure that there are no RAM-related issues and no overclocking/XMP profiles on the RAM?

    Link to comment
    On 7/24/2023 at 9:44 AM, itimpi said:

    Something must be making the shfs system crash.

     

    In my case, the half a dozen times it happened were all triggered by SMB operations. It never happened in any other case, and since the new implementation where cache-only shares bypass shfs (and the majority of my SMB use is cache-only shares), it hasn't happened again.

    Link to comment
    On 7/24/2023 at 8:55 AM, grants169 said:

     

    Out of curiosity, are you overclocking and/or have you enabled XMP? Out of desperation for a fix, I reset my BIOS to defaults (no OC or XMP) and currently have an uptime of a little over 3 months, where previously 2 weeks was about the maximum before running into this error. No clue if it's related or a coincidence, but I'll take it for now. Still on 6.11.5.

    No OC or XMP. I've run memtest for three days in the past.
    I can see the shfs system is crashing; that's what I'm looking for help with.

    Link to comment

    I have a theory about what is causing the crash, which I posted here:
     


    If the theory holds up, it also tells us how to avoid triggering the issue.

    Link to comment
    On 4/6/2023 at 7:06 AM, grants169 said:

    Throwing my hat in here too..

     

    I don't use Tdarr, and I stopped using NFS, and I *thought* the problem was fixed by using only Samba, as I hadn't had the issue in a couple of weeks. But here we go again, it just happened. Although the error is different: with NFS it was a kernel crash, with Samba I got this:

     

    smbd[26612]:   Invalid SMB packet: first request: 0x0001
    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log': open error: Transport endpoint is not connected

     

    It happened right in the middle of reading from a Samba share on a Windows VM running on Unraid. Mover was not running and no other shares were active. The Windows share was reading from my cache drive, if that makes a difference. Apparently an "invalid SMB packet" caused this? I don't understand how that would happen.

     

    You are definitely on to something. This exact same thing just happened to me, and my server is very stable; it had been running uninterrupted for almost a year when I suddenly hit this issue, which is why I am here in this thread.

     

    I first thought it was related to rsync running, but it seems my rsync backup that runs daily finished just seconds before this error occurred. What you are explaining here is actually something I was doing when this occurred: I was doing some work over SMB with a share on the cache disk on unraid, via remote desktop to a VM that is also running on unraid.

     

    I will monitor the situation and hope for the best.

     

    EDIT:

    For Limetech's info, my setup has:

    NFS: disabled
    Tunable (support Hard Links): 0

    So I guess there are more ways of crashing shfs that are not directly related to these settings.

    Edited by je82
    Link to comment
    37 minutes ago, tucansam said:

    What's the fix?

    As far as I know, at the moment if this happens you have to reboot to get things back to a working state.

    Link to comment

    That's what I've been doing; unfortunately, I am not sitting at my server 24/7 waiting for it to happen.

     

    I see some of the preventative measures discussed above. Is anything consistently working?

    Link to comment

    Happened to me just now, so chiming in to let you know it's still happening. Will check if the auto-reboot works out.
    I have a Dell R730, so letting it restart will create some noise due to the fans 😅

    Link to comment

    This just happened to me in version 6.12.4.

    I found a workaround that doesn't require a reboot, which is nice, but you have to stop your Docker daemon (disable Docker in the settings).

    Solution:
    1. Disable Docker.
    2. Run the command "fusermount -u /mnt/user".
    3. You will now be able to stop the array.
    4. Enable Docker.
    5. Start the array, and everything will be back to normal.
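
    The steps above could be scripted roughly as follows. This is only a sketch based on my workaround, not an official tool: the rc.docker path is an assumption about Unraid's Slackware-style service scripts, the binary is fusermount3 on newer releases, and stopping/starting the array still has to happen from the webGUI.

    ```shell
    #!/bin/sh
    # Sketch of the no-reboot recovery. Assumptions: /etc/rc.d/rc.docker is
    # the Docker service script, and the array is cycled from the webGUI.
    # Set DRY_RUN=1 to print the commands instead of running them.
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # The binary is fusermount on 6.12.4 but fusermount3 on 6.12.6.
    FM=$(command -v fusermount3 || command -v fusermount || echo fusermount)

    run /etc/rc.d/rc.docker stop   # assumed path to the Docker rc script
    run "$FM" -u /mnt/user         # detach the dead FUSE mount
    echo "Now stop and restart the array from the webGUI, then re-enable Docker."
    ```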

    • Upvote 2
    Link to comment

    UnRaidOS > 6.12.4

    Also reporting the concern, and wanted to share:

    Seems (for me) to be related to docker when a container is restarted.

    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.

    This would kill all my shares, and the restarted Docker container in question would not start. Upon reboot the issue still persisted.
    My resolution until posting about this concern was to restore the flash backup to get things operational again.

     

    Reading more into it with user posts here, it was recommended to disable NFS shares, which I have done today; I found one share had it turned on.

    I have NOT changed the following setting YET:
     

    Settings > Global Shares Settings -> Tunable (support Hard Links): no

    which was also recommended. 

    Going to see if disabling NFS shares helps with this concern. Perhaps this bug will be fixed in an upcoming release?
    Planning to move to 6.12.6 based on this:

    'This release includes bug fixes and an important patch release of OpenZFS. All users are encouraged to upgrade.'

    Perhaps that will help with this bug? I am not 100% sure; I am going on what the wonderful community here has posted about it and crossing my fingers.

    Thanks for all the feedback as always everyone!

    Edited by bombz
    Link to comment
    On 11/13/2023 at 4:38 PM, grenskul said:

    ... run the command "fusermount -u /mnt/user" ...

    Thank you!
    I confirm, it works.

    But in my case the command was "fusermount3" (I have 6.12.6); maybe restarting the OS would be faster :)

     

    I'm so sick of this problem. If I'm not home, all users suffer and wait for my return. In my opinion, this is unacceptable for a NAS. I'm seriously thinking about changing unRAID for something stable, or splitting my setup into two parts: one unRAID as just the file server and the other unRAID for add-ons, Docker, etc. But I'm not sure it will work consistently in this case.

     

    I'm very disappointed with version 6.12.x

     

    Edited by XiMA4
    Link to comment
    2 hours ago, XiMA4 said:

    Or split my setup into two parts, one unRAID just the file server and the other unRAID for add-ons, docker, etc. 

     

     

    For things to work on unRAID you need to have an array started. I suppose you don't need to share any files on it, but some array needs to start, because that's how unRAID gets paid. Honestly, if I were considering going this route, I'd nix unRAID and install some flavor of Ubuntu and do things differently, rather than paying for unRAID twice because of a problem with unRAID. Plus there's the increased hardware, electric, and maintenance cost.

     

    My problem with this error stopped entirely after I disabled XMP, which basically put my BIOS settings back to default. 6.11 was up for 270 days before I upgraded to 6.12.6 just recently. Two days after installing 6.12 the server randomly and uncleanly rebooted itself; fingers crossed it was just a one-off event.

    Link to comment
    41 minutes ago, grants169 said:

     

    Plus increased hardware, electric, and maintenance cost.

     

    My unRAID is running on ESXi, so that's not a problem; I'll just have to take the time to reconfigure.

     

    Before 6.12.x I had a very stable server: I restarted only when I updated the OS or hardware, and extremely rarely when there was a power outage. Then I bought a UPS and upgraded to 6.12. :(

     

    I guess you are right; it would be smarter to use a separate file server (OMV, for example) and keep unRAID for the second part.

    There are various recommendations to fix the problems (I have another one besides this one), but many of them boil down to disabling something. Of course, if you disable NFS, and SMB along with it, and you disable XMP, the problem will be solved. But who needs a NAS where, for the sake of stability, you have to degrade performance or even give up functionality?

     

    I really like unRAID, but it turned out that stability is more important.

     

    Link to comment




  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.