shfs[12108]: segfault

cyruspy · May 17, 2020

Hello,

I'm moving data from an old Linux NAS to unRAID 6.8.3. On the third day of the transfer I got this error:

[Sat May 16 19:48:37 2020] shfs[12108]: segfault at 10 ip 000014bb1a525624 sp 000014badb5bac50 error 4 in libfuse3.so.3.9.0[14bb1a521000+18000]
[Sat May 16 19:48:37 2020] Code: 7d 68 4c 89 ff e8 ec c7 ff ff 8b 85 00 01 00 00 85 c0 0f 85 4e 01 00 00 4c 89 ee 48 89 ef e8 83 d7 ff ff 4c 89 ff 48 8b 40 20 <4c> 8b 68 10 e8 43 c1 ff ff 45 31 c0 48 8d 4c 24 18 31 d2 4c 89 ee
[Sat May 16 19:48:37 2020] ------------[ cut here ]------------
[Sat May 16 19:48:37 2020] nfsd: non-standard errno: -103
[Sat May 16 19:48:37 2020] WARNING: CPU: 7 PID: 29651 at fs/nfsd/nfsproc.c:820 nfserrno+0x47/0x4f [nfsd]
[Sat May 16 19:48:37 2020] Modules linked in: nfsd lockd grace sunrpc xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod bonding pcc_cpufreq virtio_net net_failover i2c_i801 i2c_core mpt3sas ahci failover libahci intel_agp raid_class intel_gtt scsi_transport_sas virtio_scsi virtio_console agpgart button [last unloaded: md_mod]
[Sat May 16 19:48:37 2020] CPU: 7 PID: 29651 Comm: nfsd Not tainted 4.19.107-Unraid #1
[Sat May 16 19:48:37 2020] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[Sat May 16 19:48:37 2020] RIP: 0010:nfserrno+0x47/0x4f [nfsd]
[Sat May 16 19:48:37 2020] Code: ff c0 48 83 f8 22 75 e1 80 3d 9a 06 01 00 00 41 bc 00 00 00 05 75 15 48 c7 c7 e3 c9 25 a0 c6 05 84 06 01 00 01 e8 eb f1 df e0 <0f> 0b 44 89 e0 41 5c c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b
[Sat May 16 19:48:37 2020] RSP: 0018:ffffc90000c73d20 EFLAGS: 00010282
[Sat May 16 19:48:37 2020] RAX: 0000000000000000 RBX: ffff888176594c08 RCX: 0000000000000007
[Sat May 16 19:48:37 2020] RDX: 00000000000004bd RSI: 0000000000000002 RDI: ffff88817bbd64f0
[Sat May 16 19:48:37 2020] RBP: ffffc90000c73de0 R08: 0000000000000003 R09: 0000000000016700
[Sat May 16 19:48:37 2020] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000005000000
[Sat May 16 19:48:37 2020] R13: 0000000000000011 R14: ffff888176594c08 R15: ffff88800b1a8600
[Sat May 16 19:48:37 2020] FS:  0000000000000000(0000) GS:ffff88817bbc0000(0000) knlGS:0000000000000000
[Sat May 16 19:48:37 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat May 16 19:48:37 2020] CR2: 0000000000000010 CR3: 0000000176728000 CR4: 00000000000006e0
[Sat May 16 19:48:37 2020] Call Trace:
[Sat May 16 19:48:37 2020]  fill_pre_wcc+0x6b/0x155 [nfsd]
[Sat May 16 19:48:37 2020]  nfsd_create+0xd8/0x182 [nfsd]
[Sat May 16 19:48:37 2020]  nfsd3_proc_mkdir+0x9b/0xdf [nfsd]
[Sat May 16 19:48:37 2020]  nfsd_dispatch+0xb2/0x163 [nfsd]
[Sat May 16 19:48:37 2020]  svc_process+0x4fd/0x6b7 [sunrpc]
[Sat May 16 19:48:37 2020]  nfsd+0xea/0x141 [nfsd]
[Sat May 16 19:48:37 2020]  ? nfsd_destroy+0x48/0x48 [nfsd]
[Sat May 16 19:48:37 2020]  kthread+0x10c/0x114
[Sat May 16 19:48:37 2020]  ? kthread_park+0x89/0x89
[Sat May 16 19:48:37 2020]  ret_from_fork+0x35/0x40
[Sat May 16 19:48:37 2020] ---[ end trace 84f742d85a5c98c6 ]---

Since then, clients cannot access the FS anymore. I can probably stop/start the disks, but I'm concerned that there's something fishy hidden underneath. Any hints?

The disks don't report any errors at this point (no apparent physical failure):

nas03-diagnostics-20200516-2013.zip

Edited May 17, 2020 by cyruspy

JorgeB · May 17, 2020

You'll likely need to restart the server. The segfault appears to be NFS related, so disable NFS if it's not in use.

cyruspy · May 17, 2020

NFS is my main access protocol. Restarted the VM and it's working again.

A remote Nextcloud server was accessing a share when it crashed. I've disabled the cache for that share, at least for the moment (maybe a data mover thing?).

mooky · September 30, 2022

Did you ever figure out a fix for this?

I've been running up against this too.

NFS completely crashes. My only options are to reboot the server or, as a "better" solution, to just stop and restart the array.

If I had to guess, I'd say it's related to caching as well. I can reproduce the problem at will by copying a few dozen files (a few GB) and then trying to delete the files/directories with my Linux desktop file manager (Pop!_OS "Files", which I believe is also known as Nautilus). The result is a hard crash of NFS. No shares show up after that, in either GUI or CLI mode. Even if I log into my unRAID box directly, the shares are borked, again in both the GUI/web interface and the CLI.

[1818073.719919] shfs[11335]: segfault at 10 ip 0000151ab370c4f1 sp 0000151a73ffeba0 error 4 in libfuse3.so.3.10.5[151ab3708000+19000]
[1818073.719940] Code: e8 44 c9 ff ff 8b 85 00 01 00 00 85 c0 0f 85 b6 01 00 00 4c 89 f6 48 89 ef 31 db e8 19 dc ff ff 4c 89 ef 45 31 ed 48 8b 40 20 <4c> 8b 70 10 e8 86 c2 ff ff 48 8d 4c 24 18 45 31 c0 31 d2 4c 89 f6
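
For anyone comparing the two reports, here is a minimal Python sketch that pulls the fields out of the `segfault at ...` lines quoted in this thread. It assumes the usual x86 layout of that message and the usual x86 page-fault error bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The names `parse_segfault`, `decode_error_code`, `FIRST_REPORT` and `SECOND_REPORT` are made up for this example. The computed offset is relative to the start of the mapping the kernel printed in brackets, which is not necessarily the library's load base, so treat it as a way to compare crashes rather than as a symbol address.

```python
import re

# Matches the x86 user-space segfault line that the kernel writes to dmesg.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

# The two segfault lines quoted in this thread.
FIRST_REPORT = (
    "[Sat May 16 19:48:37 2020] shfs[12108]: segfault at 10 "
    "ip 000014bb1a525624 sp 000014badb5bac50 error 4 "
    "in libfuse3.so.3.9.0[14bb1a521000+18000]"
)
SECOND_REPORT = (
    "[1818073.719919] shfs[11335]: segfault at 10 "
    "ip 0000151ab370c4f1 sp 0000151a73ffeba0 error 4 "
    "in libfuse3.so.3.10.5[151ab3708000+19000]"
)


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "error": decode_error_code(int(m["err"], 16)),  # printed in hex
        "object": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction from the start of the mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    for line in (FIRST_REPORT, SECOND_REPORT):
        info = parse_segfault(line)
        print(info["comm"], info["object"], hex(info["fault_addr"]),
              hex(info["offset"]), info["error"])
```

Both lines decode to a user-mode read of a page that is not present, at address 0x10, inside libfuse3 (offsets 0x4624 and 0x44f1 into the printed mappings). That pattern is consistent with reading a field at offset 0x10 through a NULL pointer, and the fact that it looks the same on libfuse3 3.9.0 and 3.10.5 suggests, though does not prove, that both posts hit the same bug.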
