• [6.6.0] NFS Kernel crash


    ajeffco
    • Solved Urgent

    Hello,

     

    I'm running Unraid 6.6.0 stable on a server that mostly serves NFS shares, and NFS appears to be crashing. Below is the first sign of a problem in the log file. From that point on, all clients lock up with "nfs: server tower not responding, timed out". A coworker who runs Unraid has had the same issue. We initially thought only NFS was affected, but all CIFS and rsync shares also become unavailable when this happens. At that point Unraid is completely unusable for file operations from any client!
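When a client reports "server not responding", any process that touches the mount blocks, including a naive health check. Below is a minimal Python sketch (not something Unraid ships) of a client-side probe that calls `os.stat()` on the mount point in a daemon thread and gives up after a timeout, so the check itself cannot hang. The function name `path_responds`, the mount path `/mnt/tower` and the 5-second timeout are hypothetical placeholders.

```python
import os
import threading


def path_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.stat(path) completes successfully within `timeout` seconds.

    The stat runs in a daemon thread: on a hung NFS mount the thread may stay
    blocked, but the caller is not, and a daemon thread does not keep the
    interpreter alive at exit.
    """
    result = {}

    def worker() -> None:
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            # stat returned an error (e.g. ENOENT or ESTALE) instead of hanging;
            # the path is still not usable, so report failure.
            result["ok"] = False

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return False  # still blocked after the timeout: likely a hung mount
    return result.get("ok", False)


if __name__ == "__main__":
    mount = "/mnt/tower"  # hypothetical NFS mount point on a client
    print(f"{mount}: {'OK' if path_responds(mount) else 'NOT RESPONDING'}")
```

A thread is used instead of a subprocess so a blocked stat cannot stall the caller; the trade-off is that a thread stuck on a hung mount is leaked until the process exits. Client-side attribute caching may also let a stat succeed briefly after the server stops answering.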

     

    This appears to have been reported already in [ 6.6.0-RC4 ] NFS CRASHES. I'm submitting a new report because this is 6.6.0 stable.

     

    HOW TO REPRODUCE: Reboot and just wait. My coworker has hit this a few times; this is the first time it has happened to me.

     

    Sep 26 03:48:41 tower kernel: ------------[ cut here ]------------
    Sep 26 03:48:41 tower kernel: nfsd: non-standard errno: -103
    Sep 26 03:48:41 tower kernel: WARNING: CPU: 2 PID: 12478 at fs/nfsd/nfsproc.c:817 nfserrno+0x44/0x4a [nfsd]
    Sep 26 03:48:41 tower kernel: Modules linked in: md_mod nfsd lockd grace sunrpc bonding mlx4_en mlx4_core igb sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp ast ttm kvm_intel drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd agpgart glue_helper intel_cstate intel_uncore ipmi_ssif intel_rapl_perf syscopyarea mpt3sas i2c_i801 i2c_algo_bit i2c_core ahci sysfillrect pcc_cpufreq libahci sysimgblt fb_sys_fops raid_class scsi_transport_sas wmi acpi_power_meter ipmi_si acpi_pad button [last unloaded: md_mod]
    Sep 26 03:48:41 tower kernel: CPU: 2 PID: 12478 Comm: nfsd Not tainted 4.18.8-unRAID #1
    Sep 26 03:48:41 tower kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.0a 02/08/2018
    Sep 26 03:48:41 tower kernel: RIP: 0010:nfserrno+0x44/0x4a [nfsd]
    Sep 26 03:48:41 tower kernel: Code: c0 48 83 f8 22 75 e2 80 3d b3 06 01 00 00 bb 00 00 00 05 75 17 89 fe 48 c7 c7 3b 9a 18 a0 c6 05 9c 06 01 00 01 e8 8a ec ec e0 <0f> 0b 89 d8 5b c3 48 83 ec 18 31 c9 ba ff 07 00 00 65 48 8b 04 25 
    Sep 26 03:48:41 tower kernel: RSP: 0018:ffffc9000c743db8 EFLAGS: 00010286
    Sep 26 03:48:41 tower kernel: RAX: 0000000000000000 RBX: 0000000005000000 RCX: 0000000000000007
    Sep 26 03:48:41 tower kernel: RDX: 0000000000000000 RSI: ffff88087fc96470 RDI: ffff88087fc96470
    Sep 26 03:48:41 tower kernel: RBP: ffffc9000c743e08 R08: 0000000000000003 R09: ffffffff82202400
    Sep 26 03:48:41 tower kernel: R10: 000000000000087f R11: 000000000000a9e4 R12: ffff8802b01ea808
    Sep 26 03:48:41 tower kernel: R13: ffff8807febb2a58 R14: 0000000000000002 R15: ffffffffa01892a0
    Sep 26 03:48:41 tower kernel: FS:  0000000000000000(0000) GS:ffff88087fc80000(0000) knlGS:0000000000000000
    Sep 26 03:48:41 tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Sep 26 03:48:41 tower kernel: CR2: 00001501e0097000 CR3: 0000000001e0a005 CR4: 00000000001606e0
    Sep 26 03:48:41 tower kernel: Call Trace:
    Sep 26 03:48:41 tower kernel: nfsd_open+0x15e/0x17c [nfsd]
    Sep 26 03:48:41 tower kernel: nfsd_write+0x4c/0xaa [nfsd]
    Sep 26 03:48:41 tower kernel: nfsd3_proc_write+0xad/0xdb [nfsd]
    Sep 26 03:48:41 tower kernel: nfsd_dispatch+0xb4/0x169 [nfsd]
    Sep 26 03:48:41 tower kernel: svc_process+0x4b5/0x666 [sunrpc]
    Sep 26 03:48:41 tower kernel: ? nfsd_destroy+0x48/0x48 [nfsd]
    Sep 26 03:48:41 tower kernel: nfsd+0xeb/0x142 [nfsd]
    Sep 26 03:48:41 tower kernel: kthread+0x10b/0x113
    Sep 26 03:48:41 tower kernel: ? kthread_flush_work_fn+0x9/0x9
    Sep 26 03:48:41 tower kernel: ret_from_fork+0x35/0x40
    Sep 26 03:48:41 tower kernel: ---[ end trace 0df913a547279c0d ]---
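The key line in the trace is `nfsd: non-standard errno: -103`: nfsd received an error code it has no NFS status mapping for. On Linux, errno 103 is ECONNABORTED. The following minimal Python sketch pulls such values out of a syslog excerpt and names them using the standard `errno` and `os` modules; the helper name `find_nfsd_errnos` is hypothetical, and the symbolic names reflect the platform the script runs on (the numbering in this log is Linux's).

```python
import errno
import os
import re

# Matches the nfsd warning seen in the log, e.g.
# "Sep 26 03:48:41 tower kernel: nfsd: non-standard errno: -103"
NFSD_ERRNO_RE = re.compile(r"nfsd: non-standard errno: -(\d+)")


def find_nfsd_errnos(lines):
    """Yield (errno_number, symbolic_name, message) for each nfsd errno warning."""
    for line in lines:
        m = NFSD_ERRNO_RE.search(line)
        if m:
            num = int(m.group(1))
            name = errno.errorcode.get(num, "UNKNOWN")
            yield num, name, os.strerror(num)


if __name__ == "__main__":
    sample = [
        "Sep 26 03:48:41 tower kernel: ------------[ cut here ]------------",
        "Sep 26 03:48:41 tower kernel: nfsd: non-standard errno: -103",
    ]
    for num, name, msg in find_nfsd_errnos(sample):
        print(num, name, msg)
```

To scan a full log instead of the sample, pass any iterable of lines, such as an open file object; where the syslog lives on your system is up to you to confirm.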

    tower-diagnostics-20180926-0904.zip




    User Feedback

    Recommended Comments



    I just tried 6.6.3 and I still have problems mounting NFS shares. When I manually mounted the "cctv" NFS share after boot, it worked: I could browse the files and everything seemed fine. But when I then mounted the second NFS share, "nextcloud", things went bad: I could no longer list "/mnt/disks/", and the restart hung for so long that I had to press the server's reset button.

     

    I'm attaching two diagnostics zip files: one taken after booting into 6.6.3 but before the two NFS mounts, and one taken after the mounts. The second one should show the problem.

     

    Thanks,

    Jorge

    unraid-diagnostics-6.6.3-after-NFS-mount.zip

    unraid-diagnostics-6.6.3-before-NFS-mount.zip

    Edited by jmonteiro
    Replying to jmonteiro's comment above:

    This is not the same issue as the subject of this topic. Maybe something to do with UD (the Unassigned Devices plugin)?

     

    Please repost to General Support or maybe the UD plugin support topic.


    I also had a problem with NFS on 6.6.3. In a network trace I see NFS3ERR_NOENT and NFS3ERR_STALE replies. After switching back to 6.6.1 there are no more NFS3ERR_STALE errors, only a few NFS3ERR_NOENT. I assume I need to stick with 6.6.1 for a while.
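NFS3ERR_NOENT and NFS3ERR_STALE are status codes from the NFSv3 specification (RFC 1813): NOENT means the requested file or directory name does not exist, while STALE means the file handle the client sent no longer refers to a valid object on the server (for example, the file was removed or access to it was revoked). Some tools show only the numeric status, so here is a minimal Python lookup table for a subset of the RFC 1813 `nfsstat3` values, plus a tally helper for comparing traces such as 6.6.1 vs 6.6.3. The helper names and the choice of subset are ours, not part of any NFS tool.

```python
from collections import Counter

# Subset of the nfsstat3 values defined in RFC 1813 (NFS Version 3).
NFSSTAT3 = {
    0: "NFS3_OK",
    1: "NFS3ERR_PERM",
    2: "NFS3ERR_NOENT",
    5: "NFS3ERR_IO",
    13: "NFS3ERR_ACCES",
    17: "NFS3ERR_EXIST",
    20: "NFS3ERR_NOTDIR",
    21: "NFS3ERR_ISDIR",
    22: "NFS3ERR_INVAL",
    28: "NFS3ERR_NOSPC",
    30: "NFS3ERR_ROFS",
    70: "NFS3ERR_STALE",
    10001: "NFS3ERR_BADHANDLE",
    10006: "NFS3ERR_SERVERFAULT",
    10008: "NFS3ERR_JUKEBOX",
}


def nfsstat3_name(code: int) -> str:
    """Map a numeric NFSv3 status to its symbolic name (subset only)."""
    return NFSSTAT3.get(code, f"UNKNOWN({code})")


def summarize(codes) -> Counter:
    """Count how often each status name appears in a list of numeric codes."""
    return Counter(nfsstat3_name(c) for c in codes)
```

Running `summarize()` over the status codes exported from each trace gives a quick side-by-side of how many STALE replies each release produced.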



  • Status Definitions

    Open = Under consideration.
    Solved = The issue has been resolved.
    Solved version = The issue has been resolved in the indicated release version.
    Closed = Feedback or opinion better posted on our forum for discussion. Also used for reports we cannot reproduce or that need more information. In this case just add a comment and we will review it again.
    Retest = Please retest in the latest release.

    Priority Definitions

    Minor = Something not working correctly.
    Urgent = Server crash, data loss, or other showstopper.
    Annoyance = Doesn't affect functionality but should be fixed.
    Other = Announcement or other non-issue.