edgedog

Members
Posts: 10

  1. Unfortunately, 30 minutes after mounting the NFS volume, nfsd crashed with the same error, and the shfs process stopped and removed my /mnt/user mounted filesystem. This was on version 6.6.1 with fuse_remember set to 600. I experienced no 'stale file handle' errors nor any other message on the client indicating a failure until the mounted NFS volume was lost. I just submitted the bug report via the web GUI; it should include the diagnostics.zip.
  2. I've tested 6.6.1 successfully for the past 3 hours with fuse_remember set to 0, without crashes of nfsd or shfs. Unfortunately, my client experienced periodic 'stale file handle' errors, the same errors I saw in 6.6.0. I'm now going to test with fuse_remember set to 600 (a value that seems to keep my client from seeing stale file handle messages) and will report back in several hours. Thanks!
  3. For the past 7 hours, I've been running smoothly with fuse_remember set to 0. nfsd hasn't crashed, and neither has the shfs process, which means my /mnt/user mount point hasn't disappeared. I was initially seeing "stale file handle" errors on my client, but I rebuilt my application and everything has run perfectly for the past 2.5 hours. Crossing my fingers.
  4. @limetech I've done some research on the forums and discovered that the fuse_remember parameter can be modified through the web GUI under Settings/NFS. I tested four values: -1, 600, 0, and the default of 330. Both 330 and 0 produced stale file handles on the client, while -1 and 600 did not. However, nfsd crashed on unraid (taking down the shfs process for /mnt/user) when using -1, 600, and 330; I have yet to see an nfsd crash with 0, but I'll continue to monitor it. No output appeared on the console during any of the crashes. Questions: Is it possible to increase the logging or turn on debugging for the shfs process? It crashes right around the time nfsd does, yet I see no logs about why, even though its output is piped to logger. Also, the wiki manual (https://wiki.unraid.net/UnRAID_Manual_6#NFS_.28Network_File_System.29) says NFSv4 support is included in unraid 6, but I verified with rpcinfo that the nfsd running on unraid 6.6.0 speaks only NFSv3 (see the rpcinfo sketch after this list). Is the article incorrect, or is there a specific way to enable NFSv4 on unraid 6.6.0?
  5. Yes sir. I submitted my non-anonymized diagnostics.zip through the unraid GUI's feedback/bug-report feature on 9/20/2018, a little after 11am UTC. I haven't heard from anyone about that submission, so that was probably the wrong way to submit it; I'm sorry for my ignorance. If there's a better way to get you the info, please let me know. Thanks for the information about how nfs and shfs work. At the time of the diagnostics.zip, I was booted in safe mode and my unraid VM had 16GB of RAM allocated, with 13GB of that available for use. I subsequently increased the VM RAM to 40GB for test purposes and continued to experience the crashes, so I don't believe there's a lack of memory unless nfsd or shfs is unable to acquire available memory for some reason. But I'm definitely willing to test your theory by modifying the remember parameter of the shfs process. Where is the file that sets that parameter? I've scoured the filesystem but have been unable to find it (the ps sketch after this list at least shows the value the running process was started with). Thanks a bunch for responding!
  6. @limetech @bonienl @jonp Is this issue on your radar and being worked on? Is there anything we can provide to help you do so? Thanks! This is definitely a recurring issue, and unfortunately my application doesn't tolerate SMB shares. The problem appears to be triggered by use of NFS shares: 6.6.0 runs fairly stably until I mount an NFS share, and anywhere from 10 minutes to 3 hours later my /mnt/user folder disappears, which creates a cascade of chaos. All the shares disappear, which in turn breaks the NFS connection and any other application using the shares, including the docker containers. I believe there's some sort of memory issue between the shfs process running on the unraid server and nfsd. I'm unfamiliar with the implementation of shfs that unraid runs and can't find any online documentation to help me troubleshoot further. The process that actually provides the /mnt/user mount point is:

        /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 -o noatime,big_writes,allow_other -o remember=330 |& logger

     That process fails for some reason when nfsd crashes with the following error:

        Sep 20 02:40:01 systemname rpcbind[121456]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 02:45:01 systemname rpcbind[124301]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 02:48:46 systemname kernel: ------------[ cut here ]------------
        Sep 20 02:48:46 systemname kernel: nfsd: non-standard errno: -107
        Sep 20 02:48:46 systemname kernel: WARNING: CPU: 1 PID: 3577 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
        Sep 20 02:48:46 systemname kernel: Modules linked in: veth xt_nat macvlan ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod sb_edac kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd isci libsas glue_helper e1000e intel_agp intel_gtt i2c_piix4 ahci intel_rapl_perf vmxnet3 scsi_transport_sas i2c_core ata_piix libahci agpgart button
        Sep 20 02:48:46 systemname kernel: CPU: 1 PID: 3577 Comm: nfsd Not tainted 4.18.8-unRAID #1
        Sep 20 02:48:46 systemname kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
        Sep 20 02:48:46 systemname kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
        Sep 20 02:48:46 systemname kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b ea 27 a0 c6 05 9c 06 01 00 01 e8 8a 9c dd e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25
        Sep 20 02:48:46 systemname kernel: RSP: 0018:ffffc90002253db8 EFLAGS: 00010286
        Sep 20 02:48:46 systemname kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
        Sep 20 02:48:46 systemname kernel: RDX: 0000000000000000 RSI: ffff88042d656470 RDI: ffff88042d656470
        Sep 20 02:48:46 systemname kernel: RBP: ffffc90002253e08 R08: 0000000000000003 R09: ffff88043ff05700
        Sep 20 02:48:46 systemname kernel: R10: 0000000000000671 R11: 000000000002273c R12: ffff880428387808
        Sep 20 02:48:46 systemname kernel: R13: ffff8804086e2a58 R14: 0000000000000001 R15: ffffffffa027e2a0
        Sep 20 02:48:46 systemname kernel: FS: 0000000000000000(0000) GS:ffff88042d640000(0000) knlGS:0000000000000000
        Sep 20 02:48:46 systemname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Sep 20 02:48:46 systemname kernel: CR2: 000000c4200d6000 CR3: 0000000001e0a005 CR4: 00000000000606e0
        Sep 20 02:48:46 systemname kernel: Call Trace:
        Sep 20 02:48:46 systemname kernel: nfsd_open+0x15e/0x17c [nfsd]
        Sep 20 02:48:46 systemname kernel: nfsd_write+0x4c/0xaa [nfsd]
        Sep 20 02:48:46 systemname kernel: nfsd3_proc_write+0xad/0xdb [nfsd]
        Sep 20 02:48:46 systemname kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
        Sep 20 02:48:46 systemname kernel: svc_process+0x4b5/0x666 [sunrpc]
        Sep 20 02:48:46 systemname kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
        Sep 20 02:48:46 systemname kernel: nfsd+0xeb/0x142 [nfsd]
        Sep 20 02:48:46 systemname kernel: kthread+0x10b/0x113
        Sep 20 02:48:46 systemname kernel: ? kthread_flush_work_fn+0x9/0x9
        Sep 20 02:48:46 systemname kernel: ret_from_fork+0x35/0x40
        Sep 20 02:48:46 systemname kernel: ---[ end trace 51a513aa08ead34a ]---
  7. I believe your other issues may stem from the failure of the /mnt/user mount point. As I said earlier, you can unmount the dead share with:

        fusermount -uz /mnt/user

     and remount it with:

        /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 -o noatime,big_writes,allow_other -o remember=330 |& logger

     I don't understand why it's failing; hopefully someone is able to troubleshoot and resolve that issue. I've had to reboot my system 5 times today and will probably roll back to 6.5.3 until this is fixed.
  8. FYI, to fix the issue without rebooting the server, you can unmount the /mnt/user mount point with:

        fusermount -uz /mnt/user

     and remount it with:

        /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 -o noatime,big_writes,allow_other -o remember=330 |& logger

     I'm still unsure why this process fails. (A sketch that automates this workaround appears after this list.)
  9. I'm having the same issue. I booted in safe mode; the mount point does not fail as long as I don't mount and use the NFS share. Once I've mounted the share, it fails after about 1-3 hours of use. The process that fails is:

        /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 -o noatime,big_writes,allow_other -o remember=330 |& logger

     The only error I see in syslog is:

        Sep 20 18:01:03 systemname root: mover: finished
        Sep 20 18:05:02 systemname rpcbind[18092]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 18:10:02 systemname rpcbind[20809]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 18:15:02 systemname rpcbind[23574]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 18:20:01 systemname rpcbind[26396]: connect from 10.10.10.18 to getport/addr(nlockmgr)
        Sep 20 18:21:33 systemname kernel: ------------[ cut here ]------------
        Sep 20 18:21:33 systemname kernel: nfsd: non-standard errno: -107
        Sep 20 18:21:33 systemname kernel: WARNING: CPU: 0 PID: 3578 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
        Sep 20 18:21:33 systemname kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT ebtable_filter ebtables ip6table_filter ip6_tables veth xt_nat macvlan ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod sb_edac crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd isci libsas glue_helper vmxnet3 intel_rapl_perf e1000e intel_agp scsi_transport_sas i2c_piix4 i2c_core ahci intel_gtt libahci agpgart ata_piix button [last unloaded: kvm]
        Sep 20 18:21:33 systemname kernel: CPU: 0 PID: 3578 Comm: nfsd Not tainted 4.18.8-unRAID #1
        Sep 20 18:21:33 systemname kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
        Sep 20 18:21:33 systemname kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
        Sep 20 18:21:33 systemname kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b ea 24 a0 c6 05 9c 06 01 00 01 e8 8a 9c e0 e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25
        Sep 20 18:21:33 systemname kernel: RSP: 0018:ffffc90002283db8 EFLAGS: 00010286
        Sep 20 18:21:33 systemname kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
        Sep 20 18:21:33 systemname kernel: RDX: 0000000000000000 RSI: ffff88042d616470 RDI: ffff88042d616470
        Sep 20 18:21:33 systemname kernel: RBP: ffffc90002283e08 R08: 0000000000000003 R09: ffff88043ff05e00
        Sep 20 18:21:33 systemname kernel: R10: 000000000000068e R11: 0000000000022eb4 R12: ffff8804286dac08
        Sep 20 18:21:33 systemname kernel: R13: ffff8804085bea58 R14: 0000000000000001 R15: ffffffffa024e2a0
        Sep 20 18:21:33 systemname kernel: FS: 0000000000000000(0000) GS:ffff88042d600000(0000) knlGS:0000000000000000
        Sep 20 18:21:33 systemname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Sep 20 18:21:33 systemname kernel: CR2: 000014c2bf3ecf98 CR3: 0000000001e0a005 CR4: 00000000000606f0
        Sep 20 18:21:33 systemname kernel: Call Trace:
        Sep 20 18:21:33 systemname kernel: nfsd_open+0x15e/0x17c [nfsd]
        Sep 20 18:21:33 systemname kernel: nfsd_write+0x4c/0xaa [nfsd]
        Sep 20 18:21:33 systemname kernel: nfsd3_proc_write+0xad/0xdb [nfsd]
        Sep 20 18:21:33 systemname kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
        Sep 20 18:21:33 systemname kernel: svc_process+0x4b5/0x666 [sunrpc]
        Sep 20 18:21:33 systemname kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
        Sep 20 18:21:33 systemname kernel: nfsd+0xeb/0x142 [nfsd]
        Sep 20 18:21:33 systemname kernel: kthread+0x10b/0x113
        Sep 20 18:21:33 systemname kernel: ? kthread_flush_work_fn+0x9/0x9
        Sep 20 18:21:33 systemname kernel: ret_from_fork+0x35/0x40
        Sep 20 18:21:33 systemname kernel: ---[ end trace e750f2a3f27398a6 ]---
        Sep 20 18:21:34 systemname kernel: docker0: port 5(vethb58ec71) entered disabled state
        Sep 20 18:21:34 systemname kernel: vethde93cc2: renamed from eth0
  10. The CPU bar for docker containers can sometimes overlap the text that follows the bars.
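
Regarding the NFSv4 question in post 4: below is a minimal sketch of checking which NFS protocol versions a server registers with rpcbind. This is not from the original posts; "tower" is a placeholder hostname for the unraid box, and the sample output is only illustrative.

    # Query the server's rpcbind registrations and keep the nfs rows.
    rpcinfo -p tower | grep nfs
    # An NFSv3-only server prints rows like:
    #    100003    3   udp   2049  nfs
    #    100003    3   tcp   2049  nfs
    # An NFSv4-capable server would also list version-4 rows.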
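On post 5's question about where the remember parameter lives: no standalone config file is identified in this thread, but the effective value can at least be read off the running process's command line. A sketch, assuming shfs was started as shown in the posts above:

    # Show the full command line of the running shfs process; the
    # bracketed 's' keeps grep from matching its own process entry.
    ps -ef | grep '[s]hfs /mnt/user'
    # Expected to print something like:
    # root ... /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 -o noatime,big_writes,allow_other -o remember=330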
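Finally, a note on the traces and the workaround in posts 7 and 8. Errno -107 in the nfsd warning is ENOTCONN ("Transport endpoint is not connected"), which is consistent with nfsd writing into a FUSE mount whose userspace shfs process has already died. Until the root cause is fixed, the manual remount can be wrapped in a simple watchdog. This is a hypothetical sketch, not an official fix: the shfs command line is copied verbatim from the posts above, and the 60-second poll interval and reliance on mountpoint(1) are assumptions; adjust the -disks bitmap and remember= value to match your own system.

    #!/bin/bash
    # Hypothetical watchdog: if the /mnt/user FUSE mount disappears,
    # lazily unmount the dead mount point and rerun the shfs command
    # from posts 7 and 8, logging what happened via logger.
    while sleep 60; do
        if ! mountpoint -q /mnt/user; then
            logger "shfs watchdog: /mnt/user is gone, remounting"
            fusermount -uz /mnt/user
            /usr/local/sbin/shfs /mnt/user -disks 63 2048000000 \
                -o noatime,big_writes,allow_other -o remember=330 |& logger
        fi
    done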