I can *make* all shares go offline (i.e. the issue with ???????? appearing next to /mnt/user)


Ok so hear me out, please ...

I've had unRAID only since 6.x and only use NFS.  No SMB.  I've experienced, many times, the issue where all /mnt/user shares completely disappear (from a mount perspective - the data is still on the disks).  Rebooting solves the issue each time.

drwxrwxrwx  2 root   root   40 Jan 27 14:51 disks/
d?????????  ? ?      ?       ?            ? user/
drwxrwxrwx  1 nobody users 174 Jan 30 12:58 user0/

Note that /mnt/user0 still has the expected permissions and I can browse to all shares WITHIN /mnt/user0.  I can't mount a remote NFS share via /mnt/user0, though (expected behaviour, I assume).

Now here's the kicker: I CAN REPRODUCE THIS i.e. I can CAUSE all shares to go offline.

This has popped up in the forums for quite some time and I think was mentioned alongside a previously incompatible kernel version causing FUSE to crash.  Something of that nature, anyway.  @limetech mentioned it at one point although I can't find the post now (so don't burn me if that is incorrect).

Here's my setup + how I can make all shares go offline.  I'm posting this now as my shares are currently offline.

- I run Duplicati as my backup software on Ubuntu 18.04.  By default it creates 50MB AES-encrypted ZIP archives and, considering I back up every day (incremental, but around 2TB of code + projects, etc.), that means there are a LOT of 50MB files.

- Last time the shares disappeared, I was trying to manually delete a few thousand of those 50MB files.  This was in the Ubuntu desktop, not from SSH or SFTP.  In total there were about 2800 files that I needed to delete.

- I have /mnt/user/backups mounted via NFSv4 in Ubuntu (no SMB here).  Mounting with NFSv3 makes no difference.  I was told recently by @itimpi that /mnt/user covers all array disks including cache, whereas /mnt/user0 covers the array disks only.  That was in this post:

- Today I was doing the exact same thing i.e. trying to delete a few thousand stale backup files from /mnt/user/backups and the same thing happened.  All shares offline.

- For what it's worth, /mnt/user/backups does not use cache.
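
For anyone wanting to try reproducing this without touching real backups, a rough sketch using throwaway files follows. It assumes a scratch directory on the NFS-mounted share (the path in the usage comment is hypothetical), and the helper names are made up for illustration. The files are 1 KiB rather than 50MB, so it only tests whether the number of deletions matters, not their size.

```shell
# Create COUNT small dummy files in DIR (stand-ins for the backup archives).
make_dummy_files() {
    local dir="$1" count="$2" i=1
    mkdir -p "$dir" || return 1
    while [ "$i" -le "$count" ]; do
        head -c 1024 /dev/zero > "$dir/dummy-$i.bin" || return 1
        i=$((i + 1))
    done
}

# Delete the dummy files one at a time and stop at the first failure, so the
# number of deletions that succeeded before things went wrong is visible.
delete_dummy_files() {
    local dir="$1" deleted=0 f
    for f in "$dir"/dummy-*.bin; do
        [ -e "$f" ] || continue
        rm -- "$f" || { echo "failed after $deleted deletions"; return 1; }
        deleted=$((deleted + 1))
    done
    echo "deleted $deleted files"
}

# Hypothetical usage on the NFS client:
#   make_dummy_files /path/to/nfs/mount/scratch 2800
#   delete_dummy_files /path/to/nfs/mount/scratch
```

Only files matching dummy-*.bin inside the given directory are removed, so pointing it at a dedicated scratch directory keeps real data out of the way.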

I can't see how shares disappearing can be client-related, since the ???????? info at the start of this post is captured from an SSH session.  In any case, others have reported the same issue and it's unlikely (sure, it's still possible) that we're all doing the same thing.

The shares go offline pretty much immediately when I start deleting files.  I was running this in an SSH session to verify that:

watch ls -la /mnt
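
`watch` shows the state live but leaves no record of when the mount broke. A minimal sketch of a logging alternative follows, assuming that a failed `stat` on the path is a reasonable proxy for the `d?????????` state; the function names `mount_ok` and `watch_mount` are made up for illustration.

```shell
# Return 0 if the path can be stat'ed, non-zero otherwise.
mount_ok() {
    stat "$1" >/dev/null 2>&1
}

# Poll a path and print a timestamped line the first time it stops responding.
# Arguments: path, interval in seconds (default 1), max checks (0 = forever).
watch_mount() {
    local path="$1" interval="${2:-1}" max="${3:-0}" n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        if ! mount_ok "$path"; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $path is no longer accessible"
            return 1
        fi
        n=$((n + 1))
        sleep "$interval"
    done
    return 0
}

# Example, run in an SSH session on the server before starting the deletes:
#   watch_mount /mnt/user 1
```

The timestamp it prints can then be lined up against the system log to see what was logged at the moment the shares dropped.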

I've had the shares go offline overnight before, but backups run overnight so maybe it's related to the creation/management/listing (or something) of all these thousands of relatively small files.

Could it be a cache issue caused by me using /mnt/user directly?  I'm clutching at straws here.  My cache is 2x 500GB NVMe SSDs, set up as a cache pool.  I say that because /mnt/user0 is still browsable from SSH when the issue is happening.  Remember that the share I'm working with has cache set to No.

I have diagnostics from before and after this happened, as well as diagnostics from right now when the shares are still offline.  I'm happy to share them with LimeTech but not so keen to share them on a public forum (for obvious reasons).  Can I share them with "official" support people?

Hoping someone can comment as a reproducible issue like this must mean something's not right.

Edited by digitalformula
Add note about NFS version
10 minutes ago, digitalformula said:

have diagnostics from before and after this happened, as well as diagnostics from right now when the shares are still offline.  I'm happy to share them with LimeTech but not so keen to share them on a public forum (for obvious reasons).  Can I share them with "official" support people?

Please send diags to [email protected]

thx for the detailed report.

21 minutes ago, digitalformula said:

Many thanks @limetech.  Diags sent.

Got them, thanks.  There is a bug report for this issue, can't find it atm.  I think it started happening with a certain kernel release, and then a few kernel patches later it quit happening.  Since it's come back evidently, probably not a kernel issue.  What's happening is a segfault is occurring within the FUSE library, which is used by our 'shfs' (user share file system).  When that crashes, the shares become inaccessible, as you have noted.  I'm going to move this topic to the prerelease bug reports so we can keep track of it.
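
If the crash is a segfault as described, the kernel normally logs a line containing the process name and the word `segfault`. A small sketch for pulling such lines out of a log file follows, assuming the system log is at /var/log/syslog (adjust if it lives elsewhere) and that the process name appears as `shfs`; `find_shfs_segfaults` is a hypothetical helper name.

```shell
# Print every line of the given log file that mentions both "segfault" and
# "shfs". Returns non-zero if no such line is found.
find_shfs_segfaults() {
    grep 'segfault' "$1" | grep 'shfs'
}

# Example (log path is an assumption):
#   find_shfs_segfaults /var/log/syslog
```

Any matching line should carry a timestamp that can be compared with the moment /mnt/user stopped responding.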

40 minutes ago, limetech said:

Got them, thanks.  There is a bug report for this issue, can't find it atm.  I think it started happening with a certain kernel release, and then a few kernel patches later it quit happening.  Since it's come back evidently, probably not a kernel issue.  What's happening is a segfault is occurring within the FUSE library, which is used by our 'shfs' (user share file system).  When that crashes, the shares become inaccessible, as you have noted.  I'm going to move this topic to the prerelease bug reports so we can keep track of it.

That sounds exactly in line with the description given last time, i.e. a specific kernel version being part of the problem.

It hasn't happened for a while, from memory.  6.6.1 did it for sure (noted in previous post) and now 6.7.0rc2.

Since I'm on rc2, please advise if there's any testing that can be done.  Since I made the decision to update my main system to 6.7.0rc2 (due to this issue happening on the current stable release), I am happy to try stuff, provided it isn't destructive.  If it helps, that is ...

  • 3 weeks later...
On 2/2/2019 at 7:22 PM, limetech said:

Got them, thanks.  There is a bug report for this issue, can't find it atm.  I think it started happening with a certain kernel release, and then a few kernel patches later it quit happening.  Since it's come back evidently, probably not a kernel issue.  What's happening is a segfault is occurring within the FUSE library, which is used by our 'shfs' (user share file system).  When that crashes, the shares become inaccessible, as you have noted.  I'm going to move this topic to the prerelease bug reports so we can keep track of it.

Has anything further come to light re this?
