• [6.8.3] shfs error results in lost /mnt/user


    JorgeB
    • Minor

    There are several reports in the forums of this shfs error causing /mnt/user to go away:

     

    May 14 14:06:42 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

     

    Rebooting will fix it, until it happens again. I remember seeing at least 5 or 6 different users with the same issue in the last couple of months; it was reported here that it's possibly this issue:

     

    https://github.com/libfuse/libfuse/issues/128

     

    Attached diags from latest occurrence.
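
    For anyone unsure whether they've hit this state: once shfs dies, any access to /mnt/user fails with "Transport endpoint is not connected", so a trivial check from a script is possible. This is only a sketch, not an official tool; the path is the standard user-share mount and can be overridden for testing.

    ```shell
    #!/bin/sh
    # Quick health check for the user share mount (a sketch; pass another
    # directory as $1 to test). When shfs has crashed, any access to
    # /mnt/user returns "Transport endpoint is not connected".
    MNT="${1:-/mnt/user}"

    if ls "$MNT" >/dev/null 2>&1; then
        echo "OK: $MNT is reachable"
    else
        echo "FAIL: $MNT is unreachable (shfs may have crashed)"
    fi
    ```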

     

     

     

    tower-diagnostics-20200514-1444.zip

    • Upvote 3



    User Feedback

    Recommended Comments



    Throwing my hat in here too..

     

    I don't use Tdarr, and I stopped using NFS, and I *thought* the problem was fixed by using only Samba, as I hadn't had the issue in a couple of weeks. But here we go again, it just happened. Although the error is different: with NFS it was a kernel crash, with Samba I got this:

     

    smbd[26612]:   Invalid SMB packet: first request: 0x0001
    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log': open error: Transport endpoint is not connected

     

    It happened right in the middle of reading from a Samba share on a Windows VM running on Unraid. Mover was not running and no other shares were active. The Windows share was reading from my cache drive, if that makes a difference. Apparently an "invalid SMB packet" caused this? I don't understand how that would happen.

    • Like 1
    Link to comment

    I've triggered it a few times with SMB file operations from my Mac client where the folder/file structure was stale.

     

    Since then I force a refresh by e.g. navigating down a directory then back up before every SMB move or copy. Tedious but so far it hasn't failed.

    Link to comment
    1 hour ago, CS01-HS said:

    I've triggered it a few times with SMB file operations from my Mac client where the folder/file structure was stale.

     

    Since then I force a refresh by e.g. navigating down a directory then back up before every SMB move or copy. Tedious but so far it hasn't failed.

     

    When it happened to me, Windows was installing software from the Samba share. The contents of the folder it was installing from were static; there was nothing to go stale. I've always assumed unRAID was designed with NAS as a core competency, and it's failing at that task, which has me questioning my choices. It's rock solid for the most part, but I can't stand random, unknown, and unpredictable crashes with no explanation or potential fix in the pipeline. I'm willing to do whatever it takes to make this problem go away, but so far all I get is "?????????" from the community, from unRAID, and from the directory listing of /mnt/user itself.

    • Like 1
    • Upvote 1
    Link to comment

    Just had some serious problems with this, sadly forgot to save diagnostics.
    I've used Tdarr for years, and never had an issue, and this wasn't related to Tdarr for me either.

     

    For me it was the Docker container for a project called Kaizoku that was doing it. I moved my files off the user share and directly to cache, pointed Kaizoku to that instead, and the issue was completely gone.
    I would point it back to confirm it definitively, but after days of testing the operation that made /mnt/user die, I am quite confident.

    Link to comment

    Happened again, my first time with rc3. A failed creation of a new folder on a share from my Mac (possibly a duplicate name) produced this in the log (I use a syslog server):

     

    Apr 19 08:25:28 NAS emhttpd: read SMART /dev/sdd
    Apr 19 08:25:57 NAS emhttpd: read SMART /dev/sde
    Apr 19 08:34:54 NAS shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    Apr 19 08:34:54 NAS rsyslogd: file '/mnt/user/system/logs/syslog-nas.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    Apr 19 08:34:54 NAS rsyslogd: file '/mnt/user/system/logs/syslog-nas.log': open error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2433 ]
    Apr 19 08:34:54 NAS emhttpd: error: get_filesystem_status, 7380: Transport endpoint is not connected (107): scandir Transport endpoint is not connected

     

    Shares inaccessible. Had to stop everything and reboot.

     

    I have to treat every file operation over SMB as though it might take down the array. That's a serious inconvenience.
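
    Since the assertion line always precedes the shares going away, one stopgap is to watch the syslog for it. The sketch below is an assumption-laden example, not an official tool: the log path is whatever your syslog server writes, and the commented notify call is the webGUI notification script path as I understand it.

    ```shell
    #!/bin/sh
    # Sketch: scan a syslog file for the shfs assertion so a cron job can
    # alert before users find /mnt/user gone. Log path is an assumption;
    # pass your own as $1.
    LOG="${1:-/var/log/syslog}"

    if grep -q "unlink_node: Assertion .node->nlookup > 1. failed" "$LOG"; then
        echo "shfs crash detected in $LOG"
        # e.g. trigger a webGUI notification (script path assumed):
        # /usr/local/emhttp/webGui/scripts/notify -i alert -s "shfs crashed"
    else
        echo "no shfs crash found in $LOG"
    fi
    ```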

    • Like 1
    Link to comment

    6.12 offers a partial solution, with cache-only shares bypassing shfs (if you can restructure your workflow to use them).

    Link to comment
    7 hours ago, evan326 said:

    So it looks like this still hasn't been sorted? I'm having this issue occur more and more. 

     

    Out of curiosity, are you overclocking and/or have you enabled XMP? Out of desperation for a fix, I reset my BIOS to defaults (no OC or XMP) and currently have an uptime of a little over 3 months, where previously 2 weeks was about the maximum before running into this error. No clue if it's related or a coincidence, but I'll take it for now. Still on 6.11.5.

    Link to comment
    8 hours ago, evan326 said:

    So it looks like this still hasn't been sorted? I'm having this issue occur more and more. 

    Something must be making the shfs system crash. Have you made sure that there are no RAM-related issues and no overclocking/XMP profiles on the RAM?

    Link to comment
    On 7/24/2023 at 9:44 AM, itimpi said:

    Something must be making the shfs system crash.

     

    In my case, the half a dozen times it happened were all triggered by SMB operations. It never happened in any other case, and since the new implementation where cache-only shares bypass shfs (and the majority of my SMB use is cache-only shares), it hasn't happened again.

    Link to comment
    On 7/24/2023 at 8:55 AM, grants169 said:

     

    Out of curiosity, are you overclocking and/or have you enabled XMP? Out of desperation for a fix, I reset my BIOS to defaults (no OC or XMP) and currently have an uptime of a little over 3 months, where previously 2 weeks was about the maximum before running into this error. No clue if it's related or a coincidence, but I'll take it for now. Still on 6.11.5.

    No OC or XMP. I've run memtest for three days in the past.
    I can see the shfs system is crashing; that's what I'm looking for help with.

    Link to comment

    I have a theory about what is causing the crash, which I posted here:
     


    If the theory holds up, it also tells us how to avoid triggering the issue.

    Link to comment
    On 4/6/2023 at 7:06 AM, grants169 said:

    Throwing my hat in here too..

     

    I don't use Tdarr, and I stopped using NFS, and I *thought* the problem was fixed by using only Samba, as I hadn't had the issue in a couple of weeks. But here we go again, it just happened. Although the error is different: with NFS it was a kernel crash, with Samba I got this:

     

    smbd[26612]:   Invalid SMB packet: first request: 0x0001
    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Transport endpoint is not connected [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
    rsyslogd: file '/mnt/user/temp/syslog-192.168.15.21.log': open error: Transport endpoint is not connected

     

    It happened right in the middle of reading from a Samba share on a Windows VM running on Unraid. Mover was not running and no other shares were active. The Windows share was reading from my cache drive, if that makes a difference. Apparently an "invalid SMB packet" caused this? I don't understand how that would happen.

     

    You are definitely on to something. This exact same thing just happened to me, and my server is very stable; it had been running uninterrupted for almost a year when I suddenly hit this issue, which is why I am here in this thread.

     

    I first thought it was related to rsync running, but it seems my rsync backup that runs daily finished just seconds before this error occurred. What you are explaining here is actually something I was doing when this occurred: I was doing some work over SMB with a share on the cache disk on unraid, via remote desktop to a VM that is also running on unraid.

     

    I will monitor the situation and hope for the best.

     

    EDIT:

    For Limetech's info, my setup has:

    NFS: disabled
    Tunable (support Hard Links): 0

    So I guess there are more ways of crashing shfs that are not directly related to these settings.

    Edited by je82
    Link to comment
    37 minutes ago, tucansam said:

    What's the fix?

    As far as I know, at the moment if this happens you have to reboot to get things back to a working state.

    Link to comment

    That's what I've been doing; unfortunately, I am not sitting at my server 24/7 waiting for it to happen.

     

    I see some of the preventative measures discussed above. Is anything consistently working?

    Link to comment

    Happened to me just now, so chiming in to let you know it's still happening. Will check if the auto-reboot works out.
    I have a Dell R730, so letting it restart will create some noise due to the fans 😅

    Link to comment

    This just happened to me in version 6.12.4.

    I found a workaround that doesn't require a reboot, which is nice, but you have to stop your Docker daemon (disable Docker in the settings).

    Solution:
    1. Disable Docker.
    2. Run the command "fusermount -u /mnt/user".
    3. You will now be able to stop the array.
    4. Enable Docker.
    5. Start the array, and everything will be back to normal.
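
    The steps above could be scripted roughly as follows. This is only a sketch based on my workaround, not an official tool: the rc.docker path is an assumption about Unraid's Slackware-style service scripts, the binary is fusermount3 on newer releases, and stopping/starting the array still has to happen from the webGUI.

    ```shell
    #!/bin/sh
    # Sketch of the no-reboot recovery. Assumptions: /etc/rc.d/rc.docker is
    # the Docker service script, and the array is cycled from the webGUI.
    # Set DRY_RUN=1 to print the commands instead of running them.
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # The binary is fusermount on 6.12.4 but fusermount3 on 6.12.6.
    FM=$(command -v fusermount3 || command -v fusermount || echo fusermount)

    run /etc/rc.d/rc.docker stop   # assumed path to the Docker rc script
    run "$FM" -u /mnt/user         # detach the dead FUSE mount
    echo "Now stop and restart the array from the webGUI, then re-enable Docker."
    ```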

    • Upvote 2
    Link to comment

    UnRaidOS > 6.12.4

    Also reporting the concern, and wanted to share:

    Seems (for me) to be related to docker when a container is restarted.

    shfs: shfs: ../lib/fuse.c:1450: unlink_node: Assertion `node->nlookup > 1' failed.

    This would kill all my shares, and the restarted Docker container in question would not start. Upon reboot the issue still persisted.
    My resolution until posting about this concern was to restore the flash backup to get things operational again.

     

    Reading more into it with user posts here, it was recommended to disable NFS shares, which I have done today; I found one share had it turned on.

    I have NOT changed the following setting YET:
     

    Settings > Global Shares Settings -> Tunable (support Hard Links): no

    which was also recommended. 

    Going to see if disabling NFS shares helps with this concern. Perhaps this bug will be fixed in an upcoming release?
    Planning to move to 6.12.6 based on this:

    'This release includes bug fixes and an important patch release of OpenZFS. All users are encouraged to upgrade.'

    Perhaps that will help with this bug? I am not 100% sure; I am going on what the wonderful community here has posted about it and crossing my fingers.

    Thanks for all the feedback as always everyone!

    Edited by bombz
    Link to comment
    On 11/13/2023 at 4:38 PM, grenskul said:

    ... run the command "fusermount -u /mnt/user" ...

    Thank you!
    I confirm, it works.

    But in my case the command was "fusermount3" (I have 6.12.6); maybe restarting the OS would be faster :)

     

    I'm so sick of this problem. If I'm not home, all users suffer and wait for my return. In my opinion, this is unacceptable for a NAS. I'm seriously thinking about changing unRAID for something stable, or splitting my setup into two parts: one unRAID as just the file server and the other unRAID for add-ons, Docker, etc. But I'm not sure it will work consistently in this case.

     

    I'm very disappointed with version 6.12.x

     

    Edited by XiMA4
    Link to comment
    2 hours ago, XiMA4 said:

    Or split my setup into two parts, one unRAID just the file server and the other unRAID for add-ons, docker, etc. 

     

     

    For things to work on unRAID you need to have an array started. I suppose you don't need to share any files on it, but some array needs to start, because that's how unRAID gets paid. Honestly, if I were considering going this route, I'd nix unRAID and install some flavor of Ubuntu and do things differently, rather than paying for unRAID twice because of a problem with unRAID. Plus there's the increased hardware, electric, and maintenance cost.

     

    My problem with this error stopped entirely after I disabled XMP, which basically put my BIOS settings back to default. 6.11 was up for 270 days before I upgraded to 6.12.6 just recently. Two days after installing 6.12 the server randomly and uncleanly rebooted itself; fingers crossed it was just a one-off event.

    Link to comment
    41 minutes ago, grants169 said:

     

    Plus increased hardware, electric, and maintenance cost.

     

    My unRAID is running on ESXi, so that's not a problem; I'll just have to take the time to reconfigure.

     

    Before 6.12.x I had a very stable server: I restarted only when I updated the OS or hardware, and extremely rarely when there was a power outage. Then I bought a UPS and upgraded to 6.12. :(

     

    I guess you are right; it would be smarter to use a separate file server (OMV, for example) and keep unRAID for the second part.

    There are various recommendations to fix the problems (I have another one besides this one), but many of them boil down to disabling something. Of course, if you disable NFS, and SMB along with it, and you disable XMP, the problem will be solved. But who needs a NAS where, for the sake of stability, you have to degrade performance or even give up functionality?

     

    I really like unRAID, but it turned out that stability is more important.

     

    Link to comment




  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.