crazytony Posted May 12, 2012

> 2) When writing to unRAID, NFS can report a 'stale NFS handle'. I think that rsync does not generate the conditions which provoke this error.

I completely forgot about this. I switched all my connectivity to SMB because NFS was so unreliable in the previous betas. Just testing now, and I'm still seeing the problem, though it's not as severe as it once was. If I create a new file in an existing directory, then doing an 'ls' causes the stale NFS file handle message. Moving back and then forward again corrects the issue. Previously it always gave the stale NFS message.
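Because the failure is intermittent, it can help to script the exact steps described above (new file in an existing directory, then list it) and repeat them. The sketch below is a minimal, hypothetical probe: the function name and the probe file naming are made up for illustration, and it simply reports whether the listing failed with ESTALE, the errno behind "Stale NFS file handle" on Linux.

```python
import errno
import os
import uuid


def probe_new_file_listing(existing_dir):
    """Create a new file in an existing directory, then list that directory.

    Returns None if the listing succeeds, or the OSError if it fails with
    ESTALE ("Stale NFS file handle"). Any other error is re-raised.
    """
    # Hypothetical probe name; the random suffix avoids clobbering real files.
    probe = os.path.join(existing_dir, f".stale-probe-{uuid.uuid4().hex}")
    with open(probe, "w") as fh:
        fh.write("probe\n")
    try:
        os.listdir(existing_dir)  # the equivalent of running 'ls' there
        return None
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return exc
        raise
    finally:
        try:
            os.remove(probe)  # best-effort cleanup of the probe file
        except OSError:
            pass
```

Point it at a directory on the client's NFS mount (for example a hypothetical `probe_new_file_listing("/mnt/tower/Media")`) and call it in a loop, since a single attempt may well succeed.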
bbqninja Posted May 22, 2012

Regarding the "stale NFS file handle" issue: is it perhaps that it needs an fsid per export? Since user shares aren't backed by a real block device but by a crazy virtual filesystem, that would make sense.
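To make the fsid idea concrete: exports(5) has an `fsid=` option that pins the filesystem identifier the server embeds in file handles, which matters for exports that have no stable block device underneath. The sketch below only generates candidate /etc/exports lines; the share names, the `rw,async,no_subtree_check` option set and both helper names are assumptions, and the thread does not say whether unRAID already sets fsid. The important property is that a share's fsid never changes once assigned, because changing it would itself invalidate every handle clients hold.

```python
def assign_fsids(shares, known=None):
    """Return {share: fsid}, keeping any previously assigned numbers unchanged.

    New shares get max(existing) + 1 instead of renumbering everything, so an
    existing share's fsid stays stable. fsid=0 is avoided because exports(5)
    reserves it for the NFSv4 pseudo-root.
    """
    mapping = dict(known or {})
    if 0 in mapping.values():
        raise ValueError("fsid=0 is reserved for the NFSv4 root export")
    next_id = max(mapping.values(), default=0) + 1
    for share in shares:
        if share not in mapping:
            mapping[share] = next_id
            next_id += 1
    return mapping


def export_lines(mapping, root="/mnt/user", options="rw,async,no_subtree_check"):
    """Render one hypothetical /etc/exports line per share, ordered by fsid."""
    return [f"{root}/{share} *({options},fsid={fsid})"
            for share, fsid in sorted(mapping.items(), key=lambda kv: kv[1])]


fsids = assign_fsids(["Movies", "Photos"])
fsids = assign_fsids(["Music", "Movies", "Photos"], known=fsids)  # Music is new
print("\n".join(export_lines(fsids)))
```

In a real setup the `known` mapping would have to be persisted somewhere between reboots; that storage is deliberately left out here.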
PeterB Posted May 24, 2012

Copied from the RC2 thread:

I don't know whether this provides any clues, but here is a sequence of events on my Ubuntu desktop machine after I encountered a 'Stale NFS file handle'.

While I was browsing around my Photos share, Nautilus became unresponsive (but there was no error message). I used telnet to connect to my unRAID server and ran 'ls -l' on both the Photos directory and its 'user' parent:

    root@Tower:~# ls -l /mnt/user/Photos
    total 161248
    drwxrwx--- 1 nobody users 4360 2011-06-20 01:11 100OLYMP1/
    drwxrwx--- 1 nobody users 496 2003-10-26 15:42 100OLYMP2/
    drwxrwx--- 1 nobody users 120 2012-01-27 18:00 101029/
    drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 110324/
    drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 110830/
    drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 111004/
    drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 111007/
    drwxr-xr-x 1 nobody users 336 2011-10-17 20:07 111007YNL/
    drwxrwxr-x 1 nobody users 592 2011-10-25 00:29 111022Baking/
    drwxrwxr-x 1 nobody users 96 2012-01-27 18:00 111106/
    drwxrwxr-x 1 nobody users 136 2012-01-27 14:00 111107_MYF_CrocPark/
    drwxrwxr-x 1 nobody users 104 2012-01-27 14:00 111209_Bukidnon/
    drwxrwxr-x 1 nobody users 136 2012-01-27 18:00 120108_CDO/
    drwxrwxr-x 1 nobody users 48 2012-05-07 18:20 120507/
    drwx------ 1 nobody users 720 2012-01-27 14:00 Ai-Ai\ Graduation/
    drwxr-xr-x 1 nobody users 2224 2012-01-27 14:00 Import/
    drwxr-xr-x 1 nobody users 240 2012-01-27 14:00 Methodist\ Youth\ Surigao/
    drwxr-xr-x 1 nobody users 208 2011-06-20 01:11 Ruby\ in\ UK/
    -rw-r--r-- 1 nobody users 3072000 2012-01-27 16:06 digikam4.db
    -rw-r--r-- 1 nobody users 161874944 2012-01-27 16:06 thumbnails-digikam.db
    root@Tower:~# ls -l /mnt/user
    total 13024031
    drwxr-xr-x 1 nobody users 48 2011-10-17 19:24 111007YNL/
    drwxrwx--- 1 nobody users 424 2010-09-08 21:42 Athlon/
    drwxr-xr-x 1 nobody users 20352 2011-12-08 08:30 Downloaded\ Files/
    -rw-rw---- 1 nobody users 6542697514 2010-02-05 18:20 LoveStory_DVD.mkv
    drwxrwx--- 1 nobody users 128 2012-03-25 08:01 Maildir/
    drwxrwx--- 1 nobody users 6912 2012-05-06 16:14 Movies/
    drwxrwx--- 1 nobody users 384 2012-04-08 18:00 Music/
    -rw-rw---- 1 nobody users 2088899096 2010-02-07 01:20 NOTTING\ HILL.mkv
    -rw-rw---- 1 nobody users 4691562496 2010-06-13 21:05 National\ Treasure\ 2004\ 720p.avi
    drwxrwxr-x 1 nobody users 504 2011-12-27 08:50 Pete's\ N97/
    drwxrwxr-x 1 nobody users 72 2012-05-07 18:20 Photos/
    drwxrwx--- 1 nobody users 296 2011-09-14 08:08 Series/
    drwxrwx--- 1 nobody users 176 2012-04-08 07:55 Squeeze/
    drwxr-xr-x 1 logitechmediaserver users 1392 2012-04-08 16:23 Squeeze-7.7.2/
    drwxr-xr-x 1 root root 480 2012-03-30 12:32 Temporary/
    drwxrwxr-x 1 nobody users 520 2011-11-12 19:58 UMC/
    drwxrwx--- 1 nobody users 600 2010-09-04 20:22 UMC2.07/
    drwxrwx--- 1 nobody users 600 2010-10-16 09:13 UMC2.08.1/
    drwxrwxrwx 1 nobody users 600 2011-06-07 10:42 UMC2.08.1x/
    drwxr-xr-x 1 nobody users 496 2011-09-13 08:41 UMC2.10/
    drwxrwxr-x 1 nobody users 584 2012-02-15 22:54 UMC2.11/
    drwxrwx--- 1 nobody users 696 2011-06-06 08:34 UMCold/
    drwxrwx--- 1 nobody users 504 2012-01-02 09:01 Videos/
    drwxrwxr-x 1 nobody users 168 2011-11-14 10:32 Wii/
    drwxrwx--- 1 nobody users 576 2011-10-18 13:54 Work/
    drwxr-xr-x 1 root root 504 2012-04-08 22:45 XFarG7/
    drwxrwx--- 1 nobody users 72 2010-10-29 02:18 ZA30/
    drwxrwx--- 1 nobody users 864 2010-10-29 01:21 ZK10/
    drwxrwx--- 1 nobody users 120 2010-10-25 15:36 ZVideo/
    drwxrwx--- 1 nobody users 184 2010-10-29 02:14 ZVideo2/
    drwxrwx--- 1 nobody users 72 2010-10-19 23:43 mediaserver/
    drwxrwx--- 1 nobody users 112 2011-02-26 13:12 mp3/
    -rw-r--r-- 1 root root 18020 2012-03-30 12:32 nolimetangere.odt
    -rw-r--r-- 1 root root 342926 2012-03-30 12:30 output.pdf
    -rw-r--r-- 1 nobody users 30312 2012-01-11 00:35 pro.odt
    drwxrwx--- 1 nobody users 80 2011-09-15 07:19 series/
    root@Tower:~#

I then looked at the same directories from Ubuntu:

    peter@desktop:~$ ls -l /net/tower/mnt/user
    ls: cannot access /net/tower/mnt/user/Photos: Stale NFS file handle
    total 0
    drwxr-xr-x 2 root root 0 May 7 18:19 Movies
    drwxr-xr-x 2 root root 0 May 7 18:19 Music
    d????????? ? ? ? ? ? Photos
    drwxr-xr-x 2 root root 0 May 7 18:19 series
    drwxr-xr-x 2 root root 0 May 7 18:19 Series
    drwxr-xr-x 2 root root 0 May 7 18:19 UMC
    drwxr-xr-x 2 root root 0 May 7 18:19 Videos
    peter@desktop:~$ ls -l /net/tower/mnt/user/Photos
    ls: cannot access /net/tower/mnt/user/Photos: Stale NFS file handle
    peter@desktop:~$ sudo umount -f /net/tower/mnt/user/Photos
    [sudo] password for peter:
    peter@desktop:~$ ls -l /net/tower/mnt/user
    total 0
    drwxr-xr-x 2 root root 0 May 7 18:19 Movies
    drwxr-xr-x 2 root root 0 May 7 18:19 Music
    drwxrwxr-x 1 99 users 72 May 7 18:20 Photos
    drwxr-xr-x 2 root root 0 May 7 18:19 series
    drwxr-xr-x 2 root root 0 May 7 18:19 Series
    drwxr-xr-x 2 root root 0 May 7 18:19 UMC
    drwxr-xr-x 2 root root 0 May 7 18:19 Videos
    peter@desktop:~$ ls -l /net/tower/mnt/user
    total 8
    drwxrwx--- 1 99 users 6912 May 6 16:14 Movies
    drwxrwx--- 1 99 users 384 Apr 8 18:00 Music
    drwxrwxr-x 1 99 users 72 May 7 18:20 Photos
    drwxrwx--- 1 99 users 296 Sep 14 2011 series
    drwxr-xr-x 2 root root 0 May 7 18:19 Series
    drwxrwxr-x 1 99 users 520 Nov 12 19:58 UMC
    drwxrwx--- 1 99 users 504 Jan 2 09:01 Videos
    peter@desktop:~$

I use autofs to mount NFS shares automatically, so I don't have to issue the mount command myself. Between the last two 'ls -l /net/tower/mnt/user' commands I had opened the Photos share in Nautilus; note that the ownership of most folders has changed from 'root' to '99'.

Here is the line showing the details of the 'Photos' share from the output of 'mount' on Ubuntu:

    tower:/mnt/user/Photos on /net/tower/mnt/user/Photos type nfs (rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,sloppy,addr=10.2.0.100)

I also believe that the stale file handle problem (when copying to unRAID) still exists.

Edit: Yes, I can confirm that the stale file handle problem still persists.
@Tom Research on the net suggests that a stale file handle occurs when the contents of a directory have changed without the modification time of the directory itself being updated. Is it possible that this might happen in unRAID, perhaps when a cache drive is in use?
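The theory can be illustrated with a deliberately simplified model of a client-side directory cache that is revalidated only when the directory's mtime changes. The class name and the integer mtimes are made up for illustration, and the real Linux NFS client also looks at ctime/change attributes and attribute-cache timeouts, so treat this as a sketch of the idea rather than of the kernel's logic.

```python
class CachedDirListing:
    """Toy client-side directory cache that is revalidated only by mtime."""

    def __init__(self):
        self._cache = {}  # path -> (mtime the listing was fetched at, names)

    def listing(self, path, server_mtime, server_names):
        cached = self._cache.get(path)
        if cached is not None and cached[0] == server_mtime:
            return cached[1]  # mtime unchanged, so the old listing is trusted
        names = frozenset(server_names)
        self._cache[path] = (server_mtime, names)
        return names


client = CachedDirListing()
print(sorted(client.listing("/Photos", 100, {"a.jpg"})))  # ['a.jpg']
# The server gains b.jpg but the directory mtime is NOT bumped:
print(sorted(client.listing("/Photos", 100, {"a.jpg", "b.jpg"})))  # ['a.jpg']
# Once the mtime moves, the client refetches and sees the new file:
print(sorted(client.listing("/Photos", 101, {"a.jpg", "b.jpg"})))  # ['a.jpg', 'b.jpg']
```

The model only shows a listing going out of date; the stale-handle error would surface when the client then acts on a cached entry that the server no longer recognises.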
crazytony Posted May 24, 2012 (Author)

I don't have a cache drive, so it can't be caused by the cache drive, though a cache drive could make it worse. I have a feeling it's a glitch in the way the user mount driver handles file listings (as I understand it, the driver keeps a memory map of which files exist on which drives). Perhaps the driver needs to update the timestamp, or possibly not update it.
limetech Posted May 25, 2012

PeterB- It would be useful to know whether this happens only with user shares. Put another way, do you ever see the stale file handle issue with disk shares?
PeterB Posted May 26, 2012

I have never seen it happen on a disk share, but I spend more time writing large files to user shares! The problem is not reproducible at will (unlike the hanging NFS reads), but I will try some repeated writes to a disk share over the weekend.
dgaschk Posted May 26, 2012

There have been reports of directory timestamps not updating properly in user shares. I just remembered the context: a Plex or XBMC user was having a problem with the scraper ignoring newly added files because a directory timestamp was not being updated. The user share spanned multiple disks.
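One way to check for this on a share that spans several disks is to compare the user-share directory's mtime with the mtimes of its per-disk copies. The sketch below assumes the usual unRAID layout of `/mnt/user/<share>` merged from `/mnt/diskN/<share>` and `/mnt/cache/<share>`; those default paths and the helper name are assumptions, and stat()ing the disk paths will spin those drives up.

```python
import glob
import os


def share_dir_mtimes(rel_dir, user_root="/mnt/user",
                     branch_globs=("/mnt/disk*", "/mnt/cache")):
    """Return (user_mtime, newest_branch_mtime, {branch_path: mtime}).

    rel_dir is a path relative to the share roots, e.g. "Movies".
    newest_branch_mtime is None when no per-disk copy exists.
    """
    user_mtime = os.stat(os.path.join(user_root, rel_dir)).st_mtime
    branches = {}
    for pattern in branch_globs:
        for root in sorted(glob.glob(pattern)):
            candidate = os.path.join(root, rel_dir)
            if os.path.isdir(candidate):
                branches[candidate] = os.stat(candidate).st_mtime
    newest = max(branches.values(), default=None)
    return user_mtime, newest, branches
```

If `share_dir_mtimes("Movies")` reports a user-share mtime older than the newest branch right after a file was added, that would match the behaviour dgaschk describes.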
PeterB Posted May 26, 2012

> There have been reports of directory timestamps not properly updating in user shares. ... The user share spanned multiple disks.

If that is happening, then it would explain exactly why we get stale handles.
boof Posted May 27, 2012

As another datapoint in support of a problem with NFS + user shares: I use NFS heavily against the cache drive's direct export, with no problems at all. I also use NFS, more sparingly, to all my user shares, and these are what give problems. This occurred using Ubuntu as a client, with symptoms similar to what everyone else is describing.

I gave it another go this weekend with a different client, Arch Linux running kernel 3.3.7, and still had problems with stale file handles. The kernel log also showed:

    [ 723.909179] NFS: server 192.168.1.150 error: fileid changed
    [ 723.909180] fsid 0:14: expected fileid 0x10000018, got 0x5eb

This supports the 'files on user shares changing underneath NFS's feet' theory. I don't know whether newer kernels are just less tolerant of this, or why it was never a problem in the past.
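To pull these messages out of a longer dmesg dump, a small parser for the two-line format quoted above is enough. As I understand it, the client prints this when the server reports a different fileid (inode number) for a file handle it already has cached. The function name and the dict layout it returns are purely illustrative.

```python
import re

# Two-line message format as it appears in the kernel log above.
SERVER_RE = re.compile(r"NFS: server (\S+) error: fileid changed")
FSID_RE = re.compile(
    r"fsid (?P<fsid>\S+): expected fileid (?P<expected>0x[0-9a-fA-F]+), "
    r"got (?P<got>0x[0-9a-fA-F]+)"
)


def parse_fileid_changes(log_text):
    """Pair each 'fileid changed' line with the 'expected/got' line after it."""
    events = []
    server = None
    for line in log_text.splitlines():
        m = SERVER_RE.search(line)
        if m:
            server = m.group(1)
            continue
        m = FSID_RE.search(line)
        if m and server is not None:
            events.append({
                "server": server,
                "fsid": m["fsid"],
                "expected": int(m["expected"], 16),
                "got": int(m["got"], 16),
            })
            server = None
    return events


sample = """[ 723.909179] NFS: server 192.168.1.150 error: fileid changed
[ 723.909180] fsid 0:14: expected fileid 0x10000018, got 0x5eb"""
for ev in parse_fileid_changes(sample):
    print(ev["server"], ev["fsid"], hex(ev["expected"]), hex(ev["got"]))
```

Counting these events per mount over time gives a rough measure of how often the server-side identity of files shifts.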
limetech Posted May 27, 2012

> ... Kernel logs also issued: [ 723.909179] NFS: server 192.168.1.150 error: fileid changed [ 723.909180] fsid 0:14: expected fileid 0x10000018, got 0x5eb
> Which supports the 'files on user shares changing underneath nfs' feet' theory. ...

I think I have a fix for this in the next release.
PeterB Posted May 28, 2012

> I think I have a fix for this in next release

Woo-hoo! Really looking forward to this!
PeterB Posted June 7, 2012

I've spent quite a bit of time just copying files to the server as a test. Past experience suggests that, with RC3 or earlier, I might have seen 'stale file handle' three or four times. With RC4, everything appears to be behaving as it should. It's looking extremely promising!

Edit: After some further testing, I am reasonably confident in marking this one as solved... tentatively. If I experience any more 'stale file handles', I will re-open it.
boof Posted June 7, 2012

I'm going to update to rc4 or rc5 over the weekend; I was looking forward to this fix too. It's good to hear you've found it much improved. Thanks for the info, I'm much more confident about doing the update now!
PeterB Posted June 7, 2012

> I'm going to update to rc[4|5] over the weekend - I was looking forward to this fix too. ...

For me, rc4 seems to be pretty well perfect. I may try reverting to the TCP transport for NFS, but I don't see any change which would be expected to fix the client hangs, so I still expect to see them.
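Before and after switching transports, it is worth confirming what a mount actually negotiated. The `mount` line PeterB posted shows a bare `udp` flag; nfs(5) also accepts `tcp` and `proto=...`. The helpers below are hypothetical and assume the classic `<source> on <target> type <fstype> (<options>)` layout of `mount` output; where the autofs map lives, and how it is edited, is not covered here.

```python
def parse_mount_line(line):
    """Split one line of `mount` output into (source, target, fstype, options).

    Options without '=' (e.g. 'hard', 'udp') map to True.
    """
    head, _, opts = line.strip().rpartition(" (")
    source_target, _, fstype = head.rpartition(" type ")
    source, _, target = source_target.partition(" on ")
    options = {}
    for item in opts.rstrip(")").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return source, target, fstype, options


def nfs_transport(options):
    """Best-effort guess at the transport: 'proto=' wins, then bare tcp/udp flags."""
    if "proto" in options:
        return options["proto"]
    for flag in ("tcp", "udp"):
        if options.get(flag) is True:
            return flag
    return None


line = ("tower:/mnt/user/Photos on /net/tower/mnt/user/Photos type nfs "
        "(rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,sloppy,addr=10.2.0.100)")
src, tgt, fstype, opts = parse_mount_line(line)
print(tgt, fstype, opts["vers"], nfs_transport(opts))
```

For the line above this reports an NFSv3 mount over UDP, which matches the options PeterB listed.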
Videodude Posted June 11, 2012

> After some further testing, I am reasonably confident to mark this one as solved ... tentatively. If I experience any more 'stale file handles' I will re-open.

I recently upgraded from 5.0b14 to 5.0rc4, and I am still getting Stale NFS Handle errors when connecting to my WD Live box. This has been reported a number of times in the past, and the only solutions have been either to switch to 4.7 or to switch to SMB connections.

http://lime-technology.com/forum/index.php?topic=17679.0

My current workaround is a little janky: I feed an SMB connection to a Windows 7 box and re-export it as NFS using HaneWin NFS Server. It looks like I still need this workaround with RC4.
PeterB Posted June 11, 2012

> ... I am still having Stale NFS Handle errors when connecting to my WD Live box.

To the WD Live box, or from the WD Live box?
Videodude Posted June 11, 2012

> > ... I am still having Stale NFS Handle errors when connecting to my WD Live box.
> to the WD Live box, or from the WD Live box?

Sorry, you are absolutely correct. I am getting those errors when connecting from the WD Live box to my NFS user share on my unRAID system. I disabled all plugins on my unRAID system this morning, and I am attempting to recreate the issue. This NFS error sometimes happens almost immediately, while other times it takes hours. As soon as I have recreated the error, I will post the relevant unRAID system logs.
Videodude Posted June 11, 2012

I just got the Stale NFS Handle error again. I believe this happened around 11am. The only entries in my system log that correlate with that time are the spindown 4 and spindown 5 events at 10:58am. I should note that when I was using 5.0b14, I disabled disk spin down, and this error still occurred.

Here is the error on my WD Live box:

    # ls -al
    ls: ./zTemp: Stale NFS file handle
    ls: ./Videos: Stale NFS file handle
    drwxr-xr-x 7 root root 140 Dec 31 16:45 .
    drwxr-xr-x 3 root root 60 Dec 31 16:00 ..
    drwxrwxrwx 1 nobody 100 416 Jun 9 2012 .HD_Videos
    d-wxr----t 3 root root 60 Dec 31 16:45 .wd_tv
    drwxr-xr-x 3 root root 60 Dec 31 16:00 USB2

Here is my unRAID system log for the past ~2 hours:

    Jun 11 09:06:27 MrTower emhttp: Start NFS...
    Jun 11 09:06:27 MrTower emhttp: shcmd (51): /etc/rc.d/rc.nfsd start |& logger
    Jun 11 09:06:27 MrTower logger: Starting NFS server daemons:
    Jun 11 09:06:27 MrTower logger: /usr/sbin/exportfs -r
    Jun 11 09:06:27 MrTower logger: /usr/sbin/rpc.nfsd 8
    Jun 11 09:06:27 MrTower logger: /usr/sbin/rpc.mountd
    Jun 11 09:06:27 MrTower mountd[1445]: Kernel does not have pseudo root support.
    Jun 11 09:06:27 MrTower mountd[1445]: NFS v4 mounts will be disabled unless fsid=0
    Jun 11 09:06:27 MrTower mountd[1445]: is specfied in /etc/exports file.
    Jun 11 09:06:27 MrTower emhttp: shcmd (52): /usr/local/sbin/emhttp_event svcs_restarted
    Jun 11 09:06:27 MrTower emhttp_event: svcs_restarted
    Jun 11 09:08:54 MrTower mountd[1446]: authenticated mount request from 172.16.0.92:1001 for /mnt/user/Media/_isVideo/HD/.Playlists/Playlists (/mnt/user/Media)
    Jun 11 09:08:57 MrTower mountd[1446]: authenticated mount request from 172.16.0.92:716 for /mnt/user/Media/_isVideo/HD (/mnt/user/Media)
    Jun 11 09:09:00 MrTower mountd[1446]: authenticated mount request from 172.16.0.92:852 for /mnt/user/Media/_isVideo/zTemp (/mnt/user/Media)
    Jun 11 09:21:58 MrTower kernel: mdcmd (39): spindown 0
    Jun 11 09:23:59 MrTower kernel: mdcmd (40): spindown 1
    Jun 11 09:24:00 MrTower kernel: mdcmd (41): spindown 3
    Jun 11 09:24:02 MrTower kernel: mdcmd (42): spindown 4
    Jun 11 09:24:05 MrTower kernel: mdcmd (43): spindown 5
    Jun 11 09:24:08 MrTower kernel: mdcmd (44): spindown 2
    Jun 11 09:54:45 MrTower in.telnetd[3637]: connect from 172.16.0.150 (172.16.0.150)
    Jun 11 09:54:51 MrTower login[3638]: ROOT LOGIN on '/dev/pts/0' from '172.16.0.150'
    Jun 11 10:07:19 MrTower kernel: mdcmd (45): spindown 4
    Jun 11 10:07:21 MrTower kernel: mdcmd (46): spindown 5
    Jun 11 10:08:14 MrTower kernel: mdcmd (47): spindown 1
    Jun 11 10:08:15 MrTower kernel: mdcmd (48): spindown 3
    Jun 11 10:58:38 MrTower kernel: mdcmd (49): spindown 4
    Jun 11 10:58:50 MrTower kernel: mdcmd (50): spindown 5
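Correlating an error time with the syslog by eye works for a two-hour excerpt, but a small filter helps with longer logs. The sketch below keeps the lines stamped within a few minutes of a suspected time. It assumes the classic 15-character `Mon DD HH:MM:SS` syslog prefix and borrows the year from the target time, since syslog omits it; `events_near` is a hypothetical helper, not an unRAID tool.

```python
from datetime import datetime, timedelta


def events_near(syslog_lines, when, window_minutes=5):
    """Return the syslog lines stamped within +/- window_minutes of `when`."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in syslog_lines:
        try:
            stamp = datetime.strptime(f"{when.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation or non-syslog line; skip it
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits


log = [
    "Jun 11 10:08:15 MrTower kernel: mdcmd (48): spindown 3",
    "Jun 11 10:58:38 MrTower kernel: mdcmd (49): spindown 4",
    "Jun 11 10:58:50 MrTower kernel: mdcmd (50): spindown 5",
]
for line in events_near(log, datetime(2012, 6, 11, 11, 0)):
    print(line)
```

With an 11:00 target and the default five-minute window, only the two 10:58 spindown entries are kept.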
Videodude Posted June 11, 2012

I'm looking at the unRAID Processes log, and I noticed the following:

    root 1437 2 0 09:06 ? 00:00:01 [nfsd]
    root 1438 2 0 09:06 ? 00:00:01 [nfsd]
    root 1439 2 0 09:06 ? 00:00:02 [nfsd]
    root 1440 2 0 09:06 ? 00:00:01 [nfsd]
    root 1441 2 0 09:06 ? 00:00:02 [nfsd]
    root 1442 2 0 09:06 ? 00:00:01 [nfsd]
    root 1443 2 0 09:06 ? 00:00:01 [nfsd]
    root 1444 2 0 09:06 ? 00:00:01 [nfsd]
    root 1446 1 0 09:06 ? 00:00:00 /usr/sbin/rpc.mountd
    nobody 2221 1314 1 09:24 ? 00:01:24 /usr/sbin/smbd -D
    root 6259 2 0 10:55 ? 00:00:00 [kworker/0:0]
    root 6260 2 0 10:55 ? 00:00:00 [flush-9:1]

I'm not certain whether this is important, but I do notice some processes started at 10:55am, which is right around when I think the Stale NFS File Handle error occurred.
PeterB Posted June 12, 2012

Okay, I've just had a number of 'stale file handle' errors, admittedly while running an application (digiKam) which claims that it cannot access data on network shares. However, I discovered that if I disable use of the cache drive for the user share in question, I have no more problems. I now believe that there were two separate stale file handle faults: the symptoms of this one were a little different from the 'other' stale handle problem, which I still believe to have been resolved.
Videodude Posted June 12, 2012

> However, I discover that if I disable the use of the cache drive for the user share in question, then I have no more problems.

The user share on which I get the Stale File Handle errors already has the cache drive disabled.
crazytony Posted June 13, 2012 (Author)

Just had a chance to install rc4. Looks good from my end: same scenario as before (new dir in an existing dir, etc.) plus a couple more, and not a whimper. Next step is to move XBMC back to NFS.
PeterB Posted June 13, 2012

Thinking about this problem, and trying to rationalise it against the "directory changing without a timestamp change" theory, I can see why there might be difficulties. When a user share is split across different physical disks, there are different parent directories with, presumably, different timestamps. Which disk does unRAID fetch the directory details from, to return to the NFS client? The problem is exacerbated when a cache drive comes into play, because there is now an additional directory on yet another physical disk.

When I experienced the problem yesterday, I found an empty 'user share' directory left on the cache drive. It wasn't close to a time when mover would be invoked, and it is a situation which shouldn't occur in normal use; this is what led me to try disabling the cache drive for that user share.
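Spotting a leftover empty share directory like the one described above can be scripted. The sketch below only reports candidates and never deletes anything; the `/mnt/cache` default is an assumption about where the cache drive is mounted, and `empty_share_dirs` is a hypothetical name.

```python
import os


def empty_share_dirs(cache_root="/mnt/cache"):
    """Return the names of top-level directories under cache_root that are empty.

    Read-only: it only reports candidates, it never removes anything.
    """
    empties = []
    with os.scandir(cache_root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            with os.scandir(entry.path) as inner:
                # next() with a default returns None when the directory is empty
                if next(inner, None) is None:
                    empties.append(entry.name)
    return sorted(empties)
```

Running it when mover is not due, and comparing the result with the shares that have the cache enabled, would show whether such directories appear outside normal mover activity.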
Joe L. Posted June 13, 2012

> ... When a user share is split across different physical disks, there are different parent directories with, presumably, different time stamps. Which disk does unRAID fetch the directory details from, to return to the nfs client? ...

In my opinion, it should always return the most recent of the time stamps of the parallel directories involved. That does mean they must be kept in memory for comparison (otherwise you would need to spin up the drives to learn them).
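Joe L.'s proposal can be sketched as an in-memory table of per-branch directory mtimes that is updated on every write and answers mtime queries with the newest value, so no drive has to spin up. The class and method names below are hypothetical, and this is not a description of how unRAID's user share layer is actually implemented.

```python
class MergedDirTimes:
    """Report the newest mtime among a user-share directory's per-disk copies."""

    def __init__(self):
        # share-relative directory -> {branch name: mtime}
        self._times = {}

    def record(self, rel_dir, branch, mtime):
        """Call whenever a branch copy of rel_dir is created or modified."""
        self._times.setdefault(rel_dir, {})[branch] = mtime

    def mtime(self, rel_dir):
        """Newest mtime across all known branches, or None if unknown."""
        branches = self._times.get(rel_dir)
        if not branches:
            return None
        return max(branches.values())


times = MergedDirTimes()
times.record("Photos", "disk1", 1000)
times.record("Photos", "disk2", 1500)
times.record("Photos", "cache", 1200)
print(times.mtime("Photos"))  # 1500
times.record("Photos", "cache", 2000)  # a new file lands on the cache drive
print(times.mtime("Photos"))  # 2000
```

One design question this leaves open: if a branch copy disappears (for example the cache copy after mover runs), a plain maximum could move backwards, so a real implementation would probably want the reported value never to decrease.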