smbd[6195]: Too many open files, unable to open more! smbd's max open files = 16424


HawkZ
Solved by dlandon


Hi - I'm having an issue where I periodically lose connection to Unraid, and at this point I believe I have it tracked down to my backup software.  I run a backup of the network volumes on Unraid from another PC using CrashPlan, and after 4-6 hours it reaches the max open file limit.

 

The message below (which is also the title of this post) is what I see in the log viewer:

smbd[6195]:   Too many open files, unable to open more!  smbd's max open files = 16424

 

I installed the Open Files plugin and was able to catch what was going on: unmapping a network drive from the PC doing the backup stops the buildup of open files.  Below is an example where I have not quite hit the limit yet but will be there soon (note: the log is from this morning, the screenshot from this evening, and the array was stopped/started in between):

[screenshot: Open Files plugin showing the open file count approaching the limit]

 

I do find it curious that the backup software appears to open multiple sessions/instances of each file (to be clear, only one copy of each file actually exists), but I'm not really here to discuss CrashPlan's method.  I suspect Samba is keeping old file handles open after they are no longer in use, and repeated touches open new instances on top of them.
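
For anyone who wants to watch this happen outside the plugin, the handle count per smbd process can also be checked from the server console with standard tools; a rough sketch:

# count the file descriptors held by each running smbd process
for pid in $(pgrep smbd); do
    echo "smbd pid $pid: $(ls /proc/$pid/fd | wc -l) open file descriptors"
done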

 

I believe I have two issues: Samba not releasing open files quickly enough once they are no longer in use, and potentially needing to increase the open files limit.  The one related thread I can find is over 10 years old, and all indications are that that problem was fixed long ago.  I'm not finding anything current enough to give me confidence in modifying any config files, or in truly identifying this as the issue.  I had zero issues with this on my Drobo (RIP) previously.  The current fix is to stop/start the array, which I later learned effectively restarts Samba.  Using the KILL button in the Open Files plugin also restores access to the shares.

 

Would love any help or insight you could provide.  Thank you!

 

Link to comment

To follow up: interestingly, I'm not getting the log entries any more, but I am unable to access the shares again:

 

[screenshot]

 

Here are the logs, along with me killing the process again just now from within the Open Files plugin to restore access.  Note the current local time is 22:30 as of this writing:

[screenshot: syslog excerpt]

Link to comment

Sorry - I'm new here!  I found how to do that and have it attached here.  While reviewing it, something else potentially relevant came to mind.  During my initial setup and loading of my data into Unraid, I had an issue where every file written to the array generated a log entry (at that point I was not yet backing it up over SMB):

 

 

Based on recommendations here, I entered the following in SMB Extras under Settings/SMB:

[screenshot: SMB Extras settings]
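
(The exact entries are in the screenshot; for context, SMB Extras is just extra smb.conf directives appended under [global], so a purely hypothetical log-quieting snippet would look something like this - not necessarily what I actually entered:)

[global]
   # hypothetical example of the SMB Extras format only
   log level = 0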

 

I am going to disable this now, as it may be masking anything helpful in the logs - although ironically the logs do still show all the smbd entries (like the title of this post).  I can let this run a while to see if anything else pops up in the logs.

 

 

Edited by HawkZ
Link to comment

OK - I'm finding a bit more.  While the limits above would seem to be where they are set, these are the limits on the running smbd process that is maxing out.  Note the "Max open files" value of 16464, which matches where it is capping out.

[screenshot: limits of the running smbd process, showing Max open files = 16464]

 

However, you'll notice the smbd process is not running as root, which is the user I gathered all the limit data from before, so this process, running at user level, is getting a DIFFERENT set of limits:

[screenshot: limits for the smbd process running under the non-root user]
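
For reference, these per-process numbers can be read straight from /proc; roughly like this (the PID is just the one from the log line - substitute your own):

# show which user each smbd process is running as
ps -eo pid,user,comm | grep smbd

# read the open-file limits applied to a specific smbd process
grep "Max open files" /proc/6195/limits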

 

So, where are these set?

 

Also of note, I found the thread below.  While it reportedly fixed the issue there, "kernel change notify = no" seems like it would leave some gaps (potentially requiring a restart of Samba) before anyone could see file changes not made through Samba:

https://www.truenas.com/community/threads/kernel-change-notify-yes.45379/
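
For anyone who does want to try it, that is a normal [global] smb.conf directive, so on Unraid it would presumably go into SMB Extras like so:

[global]
   # setting discussed in the linked TrueNAS thread
   kernel change notify = no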

Link to comment
5 hours ago, dlandon said:

Check this guide on ulimit: https://phoenixnap.com/kb/ulimit-linux-command

 

I think you may want:

ulimit -Sn 40960

to set the soft limit.  When you're at a root prompt, I believe the soft limit is the same as the hard limit.
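
For reference, the current soft and hard limits for the shell you're in can be checked with:

ulimit -Sn    # current soft limit on open files
ulimit -Hn    # current hard limit on open files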

Thanks for the reply!  It looks like I probably need to add the following to /etc/security/limits.conf:

*  soft  nofile 40960

However, when I add that, it does not seem to take effect after restarting Samba, and after rebooting the entire Unraid server the setting in limits.conf does not persist.  How can I apply this soft limit to all users, either to test or to make it persistent?

Link to comment
  • Solution

Put this in your /flash/config/go file or create a User Script to run when the array is first started:

ulimit -Sn 40960

 

You can't edit the /etc/security/limits.conf file because it is not persistent across reboots.  This command will apply the soft limit at the start of the array.
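
As an illustration, a go file with that line added might look roughly like this (the stock go file just starts the management utility; yours may contain more):

#!/bin/bash
# /flash/config/go - runs once at boot

# raise the open-file soft limit before the management utility (and smbd) start
ulimit -Sn 40960

# start the Unraid management utility (present in the stock go file)
/usr/local/sbin/emhttp &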

 

The soft limit should apply to any non-root users.

 

Is there a way to apply the ulimit in the backup system script?  That would apply the ulimit when it runs.

Link to comment

@dlandon Thank you very much for your assistance here.  I installed the User Scripts plugin and set this to run at startup of the array.  Since doing this I have not been able to reproduce the issue, so I suspect it is resolved at this point.  CrashPlan has no settings to control how many files it touches at once or how it does so, so either (a) the initial backup of the re-introduced files on the new Unraid server was generating a lot of file touches (it really should not have been a "new" backup, since everything kept the same folder structure/drive letter, although the file ACLs were undoubtedly slightly different) and the repeated stops/restarts when it hit the maximum have finally let it finish, or (b) the new maximum is simply high enough to let it do what it needs to do.  Not knowing whether my case is unique, or whether there are negative performance aspects to this setting, I'd certainly suggest the increased soft limit be considered as a default in a future update to avoid this issue for other users.

 

Thanks again!

Link to comment
  • 1 month later...

@dlandon For whatever reason, this issue has returned.  I have upped the open file limit to 65535 and am hitting that as well.  Attempts to push the limit higher than this have not been successful.  Since my prior NAS (Drobo) did not have this issue, I am back to my original premise that Unraid is not closing opened SMB files properly.  The Open Files plugin shows 2 instances of each touched file held "open", and the list just sits there and grows over time rather than shrinking.  What are the next steps in troubleshooting here?

Link to comment
1 hour ago, HawkZ said:

@dlandon For whatever reason, this issue has returned.  I have upped the open file limit to 65535 and am hitting that as well. [...]

65535 is a hard limit for samba, so I don't know how you'd get past that.  

Link to comment
37 minutes ago, HawkZ said:

@dlandon Why are multiple open files being created per-touch/read and never being released?  That seems to be the root of the issue:

[screenshot: Open Files plugin showing multiple open entries per file]

Or is your backup program opening them and not closing them?  Why does your backup program run 4-6 hours?  Does it copy every file on each backup?  That doesn't sound like an incremental backup to me.

 

I'd talk to the authors of CrashPlan and ask them how their backup works and if it's possible that it is not closing files.

Link to comment

Well, I am approaching this from both sides and have a ticket open with CrashPlan as well, in case something changed recently with an update, though I don't suspect it did (nor can I find any config settings related to closing open files).  It runs a periodic check for whether a file has changed and backs it up in a continual fashion.  This is the first time I've ever hit the SMB open file limit, or had visibility into what was being left open, but the backup configuration has not changed from when I ran it against the Drobo NAS.  To the PC/backup software the shares just appear as mapped drive letters; the only difference is that they now sit on Unraid rather than the Drobo, which is why my primary suspicion is on the Unraid side.  I am open to it being either component here - just trying to narrow this down.  I'm also looking at anything on the Windows PC side to do with SMB session timeouts.

 

FWIW, I killed the CrashPlan process on the PC and the SMB open file count did not clear until I killed the smbd process in Open Files.  I let it sit for about 30 minutes as a test.
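
As a cross-check of what the Open Files plugin reports, Samba's own smbstatus tool lists the files each SMB session currently has open, so a snapshot before and after killing CrashPlan should show whether the handles are still held server-side:

smbstatus            # connected sessions plus the files they hold open
smbstatus --locks    # just the locked/open file records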

Link to comment

I see some issues that might not be related, but should be worked on:

  • Your network.cfg has some strange settings that look to be from older Unraid versions.  Delete /flash/config/network.cfg and reboot; this will regenerate the default network settings (a quick console sketch follows this list).  Then go to Settings->Network and review and re-apply your settings.  Your settings show that you have NICs bonded, but the NICs are not specified, and the setting that controls bonding is not right.
  • Set your docker custom network to ipvlan and not macvlan.
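
For the console-inclined, the first item amounts to roughly this (the flash drive is mounted at /boot when you're on the server console):

rm /boot/config/network.cfg    # same file that appears as /flash/config/network.cfg over the network share
reboot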

The reason I'm bringing this up is that your log is full of this:

Dec  6 04:41:02 TC kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe2c505e: link becomes ready
Dec  6 04:41:02 TC kernel: docker0: port 1(vethe2c505e) entered blocking state
Dec  6 04:41:02 TC kernel: docker0: port 1(vethe2c505e) entered forwarding state
Dec  6 04:41:02 TC kernel: docker0: port 1(vethe2c505e) entered disabled state
Dec  6 04:41:02 TC kernel: veth99c83b2: renamed from eth0
Dec  6 04:41:03 TC kernel: docker0: port 1(vethe2c505e) entered disabled state
Dec  6 04:41:03 TC kernel: device vethe2c505e left promiscuous mode
Dec  6 04:41:03 TC kernel: docker0: port 1(vethe2c505e) entered disabled state
Dec  6 04:42:02 TC kernel: docker0: port 1(veth9a08ef2) entered blocking state
Dec  6 04:42:02 TC kernel: docker0: port 1(veth9a08ef2) entered disabled state
Dec  6 04:42:02 TC kernel: device veth9a08ef2 entered promiscuous mode
Dec  6 04:42:02 TC kernel: docker0: port 1(veth9a08ef2) entered blocking state
Dec  6 04:42:02 TC kernel: docker0: port 1(veth9a08ef2) entered forwarding state
Dec  6 04:42:02 TC kernel: docker0: port 1(veth9a08ef2) entered disabled state
Dec  6 04:42:04 TC kernel: eth0: renamed from veth56aaa31

 

Link to comment

Right, I saw this in the logs as well.  It was being generated by a VPN Docker container I was tinkering with but couldn't get working.  I have regenerated the network.cfg file and switched Docker to ipvlan instead of macvlan.  In addition, I've turned off all Docker containers other than MariaDB, which is slated to be in use soon; I'm glad to wipe that one too as a troubleshooting measure if you feel it is related.  I've just fired up CrashPlan again and will post another diagnostic if/when it hits the limit, unless you have additional input in the meantime.  Thank you!

Link to comment

CrashPlan backs up files when they change and runs continuously throughout the day.  For local drives it is supposed to trigger a backup on change; for remote volumes it scans periodically throughout the day, inspecting each file and doing a size/date comparison against the last backed-up version to decide whether it has changed and needs to be backed up.  I don't manually initiate a backup, but once it decides to scan again, the SMB open files start building up until we hit the limit.  I do have over 65535 files, but have no idea why the Open Files plugin shows 2 open entries per file.  Watching more closely, files I open through other means show only 1 entry and clear after I close the file; items opened through CrashPlan stay open, even after I kill the CrashPlan program/service.  I did another test: if I reach the max SMB file limit and reboot the client computer, it does clear the max open files condition (expected, but I wanted to confirm that data point).

Having narrowed this down to files touched by CrashPlan, I'm a bit stumped as to why this was never a problem with my Drobo but is with Unraid.  No response from CrashPlan support yet, but given this worked before, I doubt they will have much to say about a different result on Unraid, which they don't officially support.  I really appreciate your assistance here and any further ideas you may have, but other than the knowledge that this worked fine with the Drobo (which was perhaps masking the issue somehow), everything now has me leaning towards this being a CrashPlan issue with not releasing files properly.

Link to comment
