• /mnt/user disappeared


    Can0n
    • Status: Retest, Priority: Urgent

    I just got an error on my Plex docker saying "please make sure the drive is attached." I found that /mnt/user is not showing in the docker container, but when I do an ls -l in the CLI I see some weird permissions. Screenshots and diagnostics attached.

    cli.png

    docker.png

    thor-diagnostics-20180920-1011.zip





    Recommended Comments



    @limetech

     

    I've done some research on the forums and discovered that the fuse_remember parameter can be modified through the web GUI under Settings/NFS.

     

    I tested the following values: -1, 600, 0, and the default value of 330.

     

    Both 330 and 0 resulted in stale file handles on the client, while -1 and 600 did not.

     

    The nfsd crashed on Unraid, resulting in a crash of the shfs process for /mnt/user, when using -1, 600, and 330. I have yet to experience an nfsd crash using 0, but I will continue to monitor it. No output was observed on the console window during any of the crashes.
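
    For anyone following along, here is a quick sanity check of the active value. This is a sketch under the assumption that shfs passes fuse_remember through as a standard FUSE remember= mount option; if it doesn't, the ps output won't show it.

    # Inspect the running shfs process arguments for a remember= option
    ps -ef | grep '[s]hfs'
    # Check how /mnt/user is registered in the mount table
    grep 'shfs /mnt/user' /proc/mounts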

     

    Questions:

    Is it possible to increase the logging or turn on debugging for the shfs process? It crashes right around the time nfsd does, but I see no logs about why it crashed, even though its output is being piped to logger.
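
    In the meantime I'm considering a crude watchdog, sketched below, to at least timestamp the moment /mnt/user vanishes and grab diagnostics while the failure is fresh. The five-second interval and the log tag are arbitrary choices, and the diagnostics command is assumed to be the stock Unraid collector.

    #!/bin/bash
    # Poll /mnt/user; leave the loop the moment it stops being a mount point.
    while mountpoint -q /mnt/user; do
        sleep 5
    done
    # Record the event in syslog, then collect diagnostics immediately.
    logger -t shfs-watch "/mnt/user is no longer mounted"
    diagnostics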

     

    This wiki manual (https://wiki.unraid.net/UnRAID_Manual_6#NFS_.28Network_File_System.29) says that NFSv4 support is included in Unraid 6. Using rpcinfo, I verified that the nfsd running on Unraid 6.6.0 is NFSv3. Is the article incorrect, or is there a specific way to enable NFSv4 on Unraid 6.6.0?
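
    For reference, this is the check I mean; the output shown is illustrative of a v3-only server rather than an exact capture from my machine.

    # List registered RPC services and filter for nfs; the second column
    # is the protocol version the server has registered.
    rpcinfo -p localhost | grep nfs
    #   100003    3   tcp   2049  nfs    <- version 3 only, no v4 row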


    For the past 7 hours, I've been running smoothly with fuse_remember set to 0. The nfsd hasn't crashed and neither has the shfs process which means my /mnt/user mount point hasn't disappeared. 

     

    I was initially experiencing "stale file handle" error messages on my client but I rebuilt my application and everything has been running perfectly for the past 2.5 hours.

     

    Crossing my fingers.

    Edited by edgedog
    8 hours ago, edgedog said:

    Is it possible to increase the logging or turn on debugging for the shfs process? It crashes right around the time nfsd does, but I see no logs about why it crashed, even though its output is being piped to logger.

    Yes, it may come to that, but the log fills up fast.  Please retest with 6.6.1, since it includes an update to FUSE which solves a crash, though I don't know if this is the root cause.

     

    8 hours ago, edgedog said:

    This wiki manual (https://wiki.unraid.net/UnRAID_Manual_6#NFS_.28Network_File_System.29) says that NFSv4 support is included in Unraid 6

    The wiki is incorrect.  At this time Unraid OS only supports NFSv3.


    I've tested 6.6.1 successfully for the past 3 hours with fuse_remember set to 0 without crashes of nfsd or shfs. Unfortunately my client experienced periodic stale file handle error messages. I also experienced these stale file handle error messages in 6.6.0.

    I'm now going to test with fuse_remember set to 600 (a value which seems to allow my client to not experience stale file handle messages). I'll report back in several hours. Thanks!

    2 minutes ago, edgedog said:

    I've tested 6.6.1 successfully for the past 3 hours with fuse_remember set to 0 without crashes of nfsd or shfs. Unfortunately my client experienced periodic stale file handle error messages. I also experienced these stale file handle error messages in 6.6.0.

    I'm now going to test with fuse_remember set to 600 (a value which seems to allow my client to not experience stale file handle messages). I'll report back in several hours. Thanks!

    Thanks for your report.  In practice you will probably not be able to run with NFS and fuse_remember set to 0 because there will inevitably be 'stale file handle' errors.
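
    For anyone unfamiliar with the symptom, the client-side failure typically surfaces as ESTALE, along these lines (the mount path is a placeholder):

    # Any access through a forgotten NFS file handle fails with ESTALE:
    $ ls /mnt/unraid/share
    ls: cannot access '/mnt/unraid/share': Stale file handle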


    Unfortunately, 30 minutes after mounting the NFS volume, nfsd crashed with the same error, and the shfs process stopped and removed my /mnt/user mounted filesystem. This was on version 6.6.1 with fuse_remember set to 600. I experienced no 'stale file handle' errors, nor any other message on the client indicating a failure, until after the mounted NFS volume was lost.

     

    Just submitted the bug report via the web GUI; it should include the diagnostics.zip.

    Edited by edgedog

    I have had similar issues after upgrading to 6.6.0: /mnt/user disappeared several times. Below is the log from last night. It seems to have something to do with the nfsd process; maybe it will help the team with troubleshooting.

     

    I have reverted back to 6.5.3 in the meantime since I need a stable system to work with.

     

    Sep 28 00:26:36 vulcan kernel: WARNING: CPU: 18 PID: 25036 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
    Sep 28 00:26:36 vulcan kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT xt_nat ebtable_filter ebtables veth ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat dm_crypt algif_skcipher af_alg dm_mod dax nfsd lockd grace sunrpc md_mod ipmi_devintf bonding netxen_nic tg3 sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd ipmi_ssif glue_helper intel_cstate i2c_core intel_uncore intel_rapl_perf ahci libahci megaraid_sas wmi acpi_power_meter pcc_cpufreq button ipmi_si [last unloaded: netxen_nic]
    Sep 28 00:26:36 vulcan kernel: CPU: 18 PID: 25036 Comm: nfsd Not tainted 4.18.8-unRAID #1
    Sep 28 00:26:36 vulcan kernel: Hardware name: Dell Inc. PowerEdge R520/051XDX, BIOS 2.5.1 02/08/2018
    Sep 28 00:26:36 vulcan kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
    Sep 28 00:26:36 vulcan kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b 5a 17 a0 c6 05 9c 06 01 00 01 e8 8a 2c ee e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25 
    Sep 28 00:26:36 vulcan kernel: RSP: 0018:ffffc90003b9bdc0 EFLAGS: 00010282
    Sep 28 00:26:36 vulcan kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
    Sep 28 00:26:36 vulcan kernel: RDX: 0000000000000000 RSI: ffff88041fa56470 RDI: ffff88041fa56470
    Sep 28 00:26:36 vulcan kernel: RBP: ffffc90003b9be10 R08: 0000000000000003 R09: ffff88082ff91c00
    Sep 28 00:26:36 vulcan kernel: R10: 0000000000000679 R11: 0000000000021668 R12: ffff88081ac82808
    Sep 28 00:26:36 vulcan kernel: R13: 00000000abf82000 R14: ffff88081ac82968 R15: 0000000000000100
    Sep 28 00:26:36 vulcan kernel: FS:  0000000000000000(0000) GS:ffff88041fa40000(0000) knlGS:0000000000000000
    Sep 28 00:26:36 vulcan kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Sep 28 00:26:36 vulcan kernel: CR2: 00007f5daaef9010 CR3: 0000000001e0a004 CR4: 00000000000626e0
    Sep 28 00:26:36 vulcan kernel: Call Trace:
    Sep 28 00:26:36 vulcan kernel: nfsd_open+0x15e/0x17c [nfsd]
    Sep 28 00:26:36 vulcan kernel: nfsd_read+0x45/0xec [nfsd]
    Sep 28 00:26:36 vulcan kernel: nfsd3_proc_read+0x95/0xda [nfsd]
    Sep 28 00:26:36 vulcan kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
    Sep 28 00:26:36 vulcan kernel: svc_process+0x4b5/0x666 [sunrpc]
    Sep 28 00:26:36 vulcan kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
    Sep 28 00:26:36 vulcan kernel: nfsd+0xeb/0x142 [nfsd]
    Sep 28 00:26:36 vulcan kernel: kthread+0x10b/0x113
    Sep 28 00:26:36 vulcan kernel: ? kthread_flush_work_fn+0x9/0x9
    Sep 28 00:26:36 vulcan kernel: ret_from_fork+0x35/0x40
    Sep 28 00:26:36 vulcan kernel: ---[ end trace 59856c68d508e506 ]---
    Sep 28 00:32:55 vulcan rpc.mountd[25039]: Cannot export /mnt/user/media, possibly unsupported filesystem or fsid= required
    Sep 28 00:40:30 vulcan kernel: perf: interrupt took too long (4923 > 4918), lowering kernel.perf_event_max_sample_rate to 40000
    Sep 28 01:00:16 vulcan crond[2440]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
    Sep 28 02:00:16 vulcan crond[2440]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
    Sep 28 03:00:16 vulcan crond[2440]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
    Sep 28 03:25:02 vulcan rpc.mountd[25039]: Cannot export /mnt/user/Downloads, possibly unsupported filesystem or fsid= required

     


    I've been running 6.6.1 for 48 hours now with the default fuse setting. I haven't had the issue yet, but I only used NFS a few times last night. I'll keep an eye on it and post diagnostics if it occurs again.

    On 9/29/2018 at 5:49 AM, nekromantik said:

    I've been running 6.6.1 for 48 hours now with the default fuse setting. I haven't had the issue yet, but I only used NFS a few times last night. I'll keep an eye on it and post diagnostics if it occurs again.

    Honestly, try copying a fairly large file over NFS; it WILL crash (reproduction sketch below).

    You can open a terminal, cd /mnt, then ls -l, and see that the user folder permissions are corrupted until you reboot. Since it happens to me every 2-4 hours (with one exception of 11 hours), I backdated to 6.5.3.
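
    A minimal reproduction sketch of the above; the client mount path and the 10 GiB size are placeholders, so adjust for your setup.

    # On the NFS client: stream a large file onto the mounted Unraid share.
    dd if=/dev/zero of=/mnt/unraid/share/bigfile.bin bs=1M count=10240

    # On the Unraid server afterwards: check whether /mnt/user survived.
    # After a crash, ls -l shows corrupted permissions until reboot.
    cd /mnt && ls -l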


    Another downgrade here due to disappearing shares.  Was fine before 6.6.0, still busted on 6.6.1.

     

    <snip>

     

    Oct  3 07:33:59 Tower rpcbind[24878]: connect from 192.168.1.196 to getport/addr(mountd)
    Oct  3 07:33:59 Tower rpc.mountd[3364]: authenticated mount request from 192.168.1.196:302 for /mnt/user/isos (/mnt/user/isos)
    Oct  3 07:33:59 Tower rpcbind[24879]: connect from 192.168.1.196 to getport/addr(nfs)
    Oct  3 07:33:59 Tower rpc.mountd[3364]: authenticated mount request from 192.168.1.196:303 for /mnt/user/vmware (/mnt/user/vmware)
    Oct  3 07:50:12 Tower rpcbind[28591]: connect from 192.168.1.138 to getport/addr(555555555)
    Oct  3 08:07:56 Tower rpcbind[3246]: connect from 192.168.1.138 to getport/addr(555555555)
    Oct  3 08:20:12 Tower rpcbind[30660]: connect from 192.168.1.138 to getport/addr(555555555)
    Oct  3 08:20:39 Tower rpcbind[31620]: connect from 192.168.1.191 to getport/addr(555555555)
    Oct  3 08:33:35 Tower kernel: ------------[ cut here ]------------
    Oct  3 08:33:35 Tower kernel: nfsd: non-standard errno: -103
    Oct  3 08:33:35 Tower kernel: WARNING: CPU: 15 PID: 3361 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
    Oct  3 08:33:35 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod bonding bnx2 intel_powerclamp coretemp kvm_intel kvm mpt3sas crc32c_intel sr_mod ipmi_ssif i2c_core ata_piix cdrom intel_cstate wmi raid_class intel_uncore aacraid acpi_power_meter i7core_edac scsi_transport_sas button pcc_cpufreq ipmi_si acpi_cpufreq [last unloaded: bnx2]
    Oct  3 08:33:35 Tower kernel: CPU: 15 PID: 3361 Comm: nfsd Tainted: G        W I       4.18.10-unRAID #2
    Oct  3 08:33:35 Tower kernel: Hardware name: Dell Inc. PowerEdge R710/00W9X3, BIOS 6.4.0 07/23/2013
    Oct  3 08:33:35 Tower kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
    Oct  3 08:33:35 Tower kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b aa 11 a0 c6 05 9c 06 01 00 01 e8 3b dd f3 e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25
    Oct  3 08:33:35 Tower kernel: RSP: 0018:ffffc90001fabdc0 EFLAGS: 00010282
    Oct  3 08:33:35 Tower kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
    Oct  3 08:33:35 Tower kernel: RDX: 0000000000000000 RSI: ffff880227dd6470 RDI: ffff880227dd6470
    Oct  3 08:33:35 Tower kernel: RBP: ffffc90001fabe10 R08: 0000000000000003 R09: ffffffff821f8900
    Oct  3 08:33:35 Tower kernel: R10: 000000000001ce3d R11: 0000000000000ec4 R12: ffff880211078c08
    Oct  3 08:33:35 Tower kernel: R13: 000000004eca0000 R14: ffff880211078d68 R15: 0000000000000100
    Oct  3 08:33:35 Tower kernel: FS:  0000000000000000(0000) GS:ffff880227dc0000(0000) knlGS:0000000000000000
    Oct  3 08:33:35 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct  3 08:33:35 Tower kernel: CR2: 000000c420649010 CR3: 0000000001e0a000 CR4: 00000000000006e0
    Oct  3 08:33:35 Tower kernel: Call Trace:
    Oct  3 08:33:35 Tower kernel: nfsd_open+0x15e/0x17c [nfsd]
    Oct  3 08:33:35 Tower kernel: nfsd_read+0x45/0xec [nfsd]
    Oct  3 08:33:35 Tower kernel: nfsd3_proc_read+0x95/0xda [nfsd]
    Oct  3 08:33:35 Tower kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
    Oct  3 08:33:35 Tower kernel: svc_process+0x4b5/0x666 [sunrpc]
    Oct  3 08:33:35 Tower kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
    Oct  3 08:33:35 Tower kernel: nfsd+0xeb/0x142 [nfsd]
    Oct  3 08:33:35 Tower kernel: kthread+0x10b/0x113
    Oct  3 08:33:35 Tower kernel: ? kthread_flush_work_fn+0x9/0x9
    Oct  3 08:33:35 Tower kernel: ret_from_fork+0x35/0x40
    Oct  3 08:33:35 Tower kernel: ---[ end trace 333da73c724f6cb1 ]---
    Oct  3 08:34:23 Tower rpc.mountd[3364]: Cannot export /mnt/user/vmware, possibly unsupported filesystem or fsid= required
    Oct  3 08:34:23 Tower rpc.mountd[3364]: Cannot export /mnt/user/isos, possibly unsupported filesystem or fsid= required
    Oct  3 08:41:28 Tower rpc.mountd[3364]: Cannot export /mnt/user/movies, possibly unsupported filesystem or fsid= required

    </snip>


    Throwing another one in the ring here...

     

    Unraid 6.6.1 here. I had been running 6.5.3 without problems (that's a lie, but the problems were unrelated) and am now seeing the same thing: the /mnt/user share disappearing over NFS and, as a result, VMs and Dockers dying very quickly. It has happened twice since upgrading to 6.6.1.

     

    Attached are diagnostics during a 'crash'.

     

    I haven't tried manually remounting the shares, but a reboot has solved it in both cases. I've set the fuse_remember parameter to -1 because I had a problem with stale file handles in one of the VMs (which has happened a few times, too).

    unraid-diagnostics-20181008-1813.zip

    7 hours ago, MarkUK said:

    (The share I was copying to is not exported at all over NFS or AFP)

    The crash was still NFS-related, so it's the same issue; likely just having NFS exports enabled can cause the problem.
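
    A quick way to confirm whether any shares are actively exported is to list them on the server; exportfs is part of the standard nfs-utils tooling, so this should work on a stock install.

    # Show currently active NFS exports and their options
    exportfs -v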


    The kernel was updated, but nothing obviously patched would explain this.  We did add more instrumentation to FUSE; we would appreciate a retest and a repost of diagnostics.zip upon failure.


    Wanted to add that I haven't seen this behaviour since my last post (21 days ago). Hopefully resolved, at least from my end! :) Cheers

    6 hours ago, MarkUK said:

    Wanted to add that I haven't seen this behaviour since my last post (21 days ago). Hopefully resolved, at least from my end! :) Cheers

    Thank you for the update!





