• NFS is nearly useless in Unraid 6.8.0


    FlamongOle
    • Solved Minor

    Every time I move a file from one location to another (e.g. from the shared scratch location (NVMe) to a cached location (mechanical disk)) within NFS-exported shares, I constantly get stale file handles and the client drops out. This can be critical, as some VMs use the same mounted shares because I generally have bad luck with write permissions over 9p in VMs. But it's quite annoying for my regular workstation as well, since I suddenly lose access due to a stale handle after the Mover has been running.

     

    I don't know if this is only related to cache shares (it looks that way), or if it happens whenever things change on one of the disks in general. The problem did not occur under 6.7.x or earlier.

     

    I have tried fuse_remember values of the standard 330 (which I used successfully in earlier versions), 900 just to try a higher number, and also -1 since I have plenty of memory, though I'm not sure I want that kind of cache to last that long. Honestly, I can't find any proper explanation of what this setting actually does anyway.

     

    It makes Unraid almost entirely unusable for me, and I can't figure out why this suddenly happens. I hope it's something wrong on my end, but I haven't really changed anything in the last 4-5 years (even before my Unraid time).

     

    odin-diagnostics-20200101-2012.zip




    User Feedback

    Recommended Comments



    Just to add: when the error occurs I need to "sudo umount" the share (which is user-mountable on the client side), and then remount it as a regular user to regain access.
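
    For anyone hitting the same thing, the client-side recovery looks roughly like this (the mount point is just an example from my setup, and the plain "mount" works because the share has the "user" option in fstab):

    # recovery on the client after the handle goes stale (paths are examples)
    sudo umount /mnt/scratch     # drop the stale mount; add -l if it refuses because it is busy
    mount /mnt/scratch           # remount as a regular user via the matching fstab entry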

    Link to comment

    It looks like you have several issues.

    Disk sdf has a cable or interface problem:

      7 Seek_Error_Rate         POSR--   073   060   030    -    73416400066
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2551 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2551 CDB: opcode=0x88 88 00 00 00 00 00 7d 80 09 00 00 00 00 08 00 00
    Dec 31 08:00:10 Odin kernel: print_req_error: I/O error, dev sdf, sector 2105542912
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2552 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2552 Sense Key : 0x2 [current] 
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2552 ASC=0x4 ASCQ=0x2 
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2552 CDB: opcode=0x88 88 00 00 00 00 00 7d 80 09 08 00 00 00 08 00 00
    Dec 31 08:00:10 Odin kernel: print_req_error: I/O error, dev sdf, sector 2105542920
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2553 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2553 Sense Key : 0x2 [current] 
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2553 ASC=0x4 ASCQ=0x2 
    Dec 31 08:00:10 Odin kernel: sd 10:0:0:0: [sdf] tag#2553 CDB: opcode=0x88 88 00 00 00 00 00 7d 80 09 10 00 00 00 08 00 00
    Dec 31 08:00:10 Odin kernel: print_req_error: I/O error, dev sdf, sector 2105542928

    Your remote NFS share has something going on, but I don't know what:

    Dec 31 16:29:19 Odin rpcbind[17421]: connect from 192.168.0.60 to null()
    Dec 31 16:29:19 Odin rpcbind[17422]: connect from 192.168.0.60 to getport/addr(nfs)
    Dec 31 16:29:19 Odin rpcbind[17423]: connect from 192.168.0.60 to null()
    Dec 31 16:29:19 Odin rpcbind[17424]: connect from 192.168.0.60 to getport/addr(nfs)
    Dec 31 16:29:20 Odin rpcbind[17484]: connect from 192.168.0.60 to null()

    Why don't you use UD to mount your remote NFS shares?

    Link to comment

    Disks "sdg" and "sdf" are irrelevant, as they are only used for local online backup and are not part of the NFS shares.

     

    The .60 machine is a Windows 10 box with an NFS client, but I don't know whether it is affected since it isn't used much anyway. Look at .0.40 / .5.40 for the relevant connection and NFS shares.

     

    I use both UD and Unraid NFS exports, mounted with my own mount options to match my permission setup.

    Link to comment

    I didn't see any remote shares mounted with UD.  You do realize UD can mount the remote NFS shares?  I guess my concern is DIY mount options.  Let UD do it.

     

    I've not seen any issues like you describe with remote NFS mounted shares.  Are you using Jumbo frames?

    Link to comment

    I'm not mounting any NFS shares to Unraid; I am using Unraid as a server only. UD only has NFS sharing enabled on one share, "Data".

    Link to comment

    I use 9000-byte jumbo frames for my 10GbE connection only. Both cards support an MTU of 9000 or even above and have worked without issues before. It's a direct connection from the Unraid server to the client.

    Link to comment

    Client side, only two examples given (the others are the same). First = Unraid, second = UD:

    # mount
    192.168.5.10:/mnt/user/scratch on /mnt/scratch type nfs (rw,nosuid,nodev,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,mountaddr=192.168.5.10,mountvers=3,mountport=38043,mountproto=tcp,local_lock=none,addr=192.168.5.10,user=ole)
    192.168.5.10:/mnt/disks/Data/private/ole on /mnt/private type nfs (rw,nosuid,nodev,noexec,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,mountaddr=192.168.5.10,mountvers=3,mountport=38043,mountproto=tcp,local_lock=none,addr=192.168.5.10,user=ole)

    Server side:

    # exportfs
    /mnt/user/scratch
                    192.168.5.40
    /mnt/disks/Data
                    192.168.0.0/20

     

    Link to comment

    I will try my 1GbE connection with a 1500 MTU instead, just to check. I recently replaced one 10GbE card with another brand for a newer PCIe connection.

    Link to comment

    Alright, it doesn't matter whether I use the standard 1GbE connection or 10GbE with jumbo frames. I still get a stale file handle after the Mover has run, after creating just a simple empty "test" file on a cached share.

    Link to comment

    Jumbo frames have been shown to be an issue. Set the MTUs to their defaults. Don't use jumbo frames until you have things working, then try jumbo frames again.
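
    If it helps while testing, something like this shows the current MTU and drops it back to the default (the interface name is a placeholder, substitute your own):

    # check the current MTU on both server and client (interface name is an example)
    ip link show eth0 | grep mtu
    # temporarily set it back to the default of 1500 while troubleshooting
    ip link set dev eth0 mtu 1500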

     

    You should post this as an issue with your setup and not a 6.8 bug. It's not an NFS issue, it's a network issue.

    Link to comment

    The 1GbE uses the default MTU, and there was no change. However, it worked for over a year without problems with jumbo frames on the 10GbE.

     

    I doubt this is a network problem as it happens on two entirely different network connections and subnets.

    Link to comment

    You can't have Jumbo frames anywhere on your network.  They might have worked in the past, but things change in Linux.

     

    Remote mounts depend on a solid, reliable network connection. No one else has reported this problem yet.

    Link to comment

    I must add that the 9000 MTU 10GbE NICs ran entirely on their own local network, directly connected between the two NICs, and should not conflict with the regular network at all. But it was also tested with a 1500 MTU like all the other cards.

     

    The network connection here is quite solid overall, and I have never had it drop out.

     

    Also, someone mentioned the same problem with 6.8.0-rc5 in the forums here as well. They "fixed" it by disabling the cache, which isn't a real fix either.

    Link to comment

    Please close this report and open a topic under UD.

     

    Tested NFS with Unraid 6.8.0 and all is working fine.

     

    Also, please see the priority definitions (this does not fall in the urgent category).

     

    Edited by bonienl
    Link to comment

    I don't see why this is related to UD at all. UD only shares one device, "Data", and does NOT mount anything from a remote location. Unraid/UD does not have anything mounted from remote devices at all.

     

    So far this breaks functionality for me, so I think it's correct to use "Urgent".

    Link to comment
    1 hour ago, olehj said:

    So far this breaks the functionality for me

    Functionality is working fine in my tests.

     

    Have you tested while your system is running in safe mode (no plugins)?

    Edited by bonienl
    Link to comment

    It turns out it happens with my Kubuntu 19.10 install; dmesg fills up with:

    [  307.567757] NFS: server 192.168.5.10 error: fileid changed
                   fsid 0:59: expected fileid 0xfd00000301de1fda, got 0xfd0500006014b341
    [  307.567965] NFS: server 192.168.5.10 error: fileid changed
                   fsid 0:59: expected fileid 0xfd00000301de1fda, got 0xfd0500006014b341

    ...when I create a file on a share; after refreshing the folder I get a stale file handle.
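
    For reference, this is roughly how I can reproduce it from the client (the mount point and share name are just examples of a cache-enabled share):

    # on the client: create a file on a cache-enabled share and note its fileid
    touch /mnt/data/test
    ls -i /mnt/data/test      # fileid/inode before the Mover runs
    # ...let the Mover run on the server, then refresh:
    ls -i /mnt/data/test      # now "Stale file handle", or a different fileid
    stat /mnt/data/test       # same error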

     

    I can access it through Samba without any problems, and it works with NFS in Windows 10 Pro (even if it spits out a load of entries into the Unraid syslog).

    Link to comment
    1 hour ago, olehj said:

    So far this breaks the functionality for me, I see it's correct to use "Urgent"

    I've mounted remote NFS shares and not had any issues.

     

    While I understand your frustration about this issue, NFS seems to be working fine.  It is probably related to your particular setup.

     

    If it's that big a problem, roll back to 6.7.

    Link to comment

    Alright, I have to ask again, because something really strange is happening here.

     

    The only affected NFS shares are the ones with a "Cache" drive enabled.

     

    Reading/writing directly to:

    • UD devices: no problems.
    • Cache/scratch: no problems.
    • Unraid data with cache: stale handles after a file has been moved (but not always; sometimes the handle goes stale immediately when a file has been created)
    • Unraid data without cache: no problems (deactivated cache for the ones which had problems).

     

    The NFS mount options are identical everywhere (I tried both static mounts (my default) and autofs). I even get the same problem with Samba, but only when it's mounted via fstab on the client side.
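
    For completeness, the static client-side entries look roughly like this in /etc/fstab (mirroring the mount output posted above; the autofs maps use the same options):

    # /etc/fstab on the client - illustrative, options mirror the earlier "mount" output
    192.168.5.10:/mnt/user/scratch            /mnt/scratch  nfs  rw,nosuid,nodev,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,user  0  0
    192.168.5.10:/mnt/disks/Data/private/ole  /mnt/private  nfs  rw,nosuid,nodev,noexec,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,user  0  0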

     

    I simply can't see why this problem would be on my end. Running without a cache drive is -NOT- a solution, just a workaround.

    Edited by olehj
    Added 4th point
    Link to comment
    2 hours ago, olehj said:

    Alright, I have to ask again, because it is something really strange happening here

    There are two NFS-related tunables:

    First on Settings/NFS Settings there is:

    Tunable (fuse_remember): 330

    The value 330 means 5 1/2 minutes.  The Help for this setting explains it.
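
    For background - this is my reading, not something the Help spells out - fuse_remember appears to map to FUSE's generic "remember=N" option, which keeps the kernel's inode/node-ID mappings cached for N seconds (-1 = forever). The sketch below only shows what that option looks like on an ordinary FUSE filesystem, with sshfs as a stand-in; it is not Unraid's actual shfs command line:

    # generic FUSE example only - NOT Unraid's actual shfs invocation
    # remember=330 keeps kernel node IDs cached for 330 seconds; -1 keeps them forever
    sshfs -o remember=330 user@host:/srv/data /mnt/data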

     

    and new in 6.8 on Settings/Global Share Settings there is:

    Tunable (support Hard Links): Yes

    Change it to No and see if the issue still happens. Hard links in user shares will not be supported, but the file handle should stay consistent.

     

    This is happening because, like AFP, NFS is an outdated, archaic protocol that relies on being able to associate any file with a number and look that file up by that number (as opposed to SMB, which is path-based). This makes it nearly impossible to use with any stacking file system.
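
    A rough way to see this on the server side: the same file seen through /mnt/user sits on a different underlying device and inode depending on whether it currently lives on the cache pool or on an array disk, so the fileid the client memorised no longer matches once the Mover relocates it. The paths and share name below are examples only:

    # on the Unraid server, before the Mover runs (file still on the cache pool)
    stat -c 'fileid=%i dev=%d  %n' /mnt/cache/data/test /mnt/user/data/test
    # after the Mover has relocated it to an array disk
    stat -c 'fileid=%i dev=%d  %n' /mnt/disk1/data/test /mnt/user/data/test
    # the client still holds the old fileid, hence "fileid changed" / stale file handle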

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.