Out of memory error


Solved by matt-uk.


Posting here as per the guidance for this error.

This is a new and powerful server with 32GB of memory, so I am surprised by this. Hopefully it's an easy fix.

I did notice that after a reboot earlier (to add a new SSD for a pool), one of the CPU threads sat at 100% for a while and memory usage was high. I'm not sure what prompted that and haven't seen it before, but the server is less than a day old.

I'm really keen to fix these small issues before the server goes into full service, so I would appreciate any help diagnosing and fixing this.

Diagnostics attached.

Many thanks

Matt

unraid-blue-diagnostics-20220312-1658.zip

3 hours ago, Squid said:

Does it happen in safe mode?

 

Sorry for the delay, I had to wait for a file transfer to finish.

In safe mode all seems fine. When I went back to normal mode I got the OOM error almost immediately. Does this suggest it is plugin related? I don't have many plugins and don't think they are rare or exotic, just pretty standard ones.

Diagnostics attached.

not-safe-unraid-blue-diagnostics-20220312-2308.zip safe-mode-unraid-blue-diagnostics-20220312-2247.zip

Edited by matt-uk: forgot attachments

An update: I have been removing and re-adding plugins etc. The issue is with remote shares added through the Unassigned Devices plugin.

If I add an SMB share from my Synology NAS, I get this error, even if it is not mounted. The same happens if I add an NFS share, even when it is not mounted.

The folders on the NAS are very large (movie folders of approx. 50TB), in case that affects this.

I have noticed, though, that if only one share is added by NFS, I do not get the error immediately, but one (only one) of the CPU cores is stuck at 100% until the error occurs after a couple of minutes.

 

[Screenshot 2022-03-13 at 10:47:20]

unraid-blue-diagnostics-20220313-1049.zip

Edited by matt-uk: update

It appears that avahi-daemon invoked the OOM killer:

Mar 12 23:05:34 UNRAID-BLUE rc.docker: Nginx-Proxy-Manager-Official: started succesfully!
Mar 12 23:06:59 UNRAID-BLUE kernel: avahi-daemon invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Mar 12 23:06:59 UNRAID-BLUE kernel: CPU: 15 PID: 6129 Comm: avahi-daemon Tainted: P           O      5.10.28-Unraid #1
Mar 12 23:06:59 UNRAID-BLUE kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 UD/Z390 UD, BIOS F10 11/05/2021
Mar 12 23:06:59 UNRAID-BLUE kernel: Call Trace:
Mar 12 23:06:59 UNRAID-BLUE kernel: dump_stack+0x6b/0x83
Mar 12 23:06:59 UNRAID-BLUE kernel: dump_header+0x45/0x1e8
Mar 12 23:06:59 UNRAID-BLUE kernel: oom_kill_process+0x7b/0xf6
Mar 12 23:06:59 UNRAID-BLUE kernel: out_of_memory+0x3dd/0x410
Mar 12 23:06:59 UNRAID-BLUE kernel: __alloc_pages_slowpath.constprop.0+0x665/0x74c
Mar 12 23:06:59 UNRAID-BLUE kernel: __alloc_pages_nodemask+0x1a1/0x1fc
Mar 12 23:06:59 UNRAID-BLUE kernel: alloc_pages_vma+0x114/0x130
Mar 12 23:06:59 UNRAID-BLUE kernel: handle_mm_fault+0x9b6/0xec3
Mar 12 23:06:59 UNRAID-BLUE kernel: exc_page_fault+0x259/0x373
Mar 12 23:06:59 UNRAID-BLUE kernel: ? asm_exc_page_fault+0x8/0x30
Mar 12 23:06:59 UNRAID-BLUE kernel: asm_exc_page_fault+0x1e/0x30
Mar 12 23:06:59 UNRAID-BLUE kernel: RIP: 0033:0x1550e0e6798c
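
To check whether avahi-daemon was the only process that triggered the OOM killer, the syslog can be scanned for "invoked oom-killer" lines. A minimal sketch, assuming the syslog line format shown in the excerpt above (month, day, time, hostname, then `kernel:`); the function name `find_oom_invocations` is hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Assumed line format, e.g.:
# "Mar 12 23:06:59 UNRAID-BLUE kernel: avahi-daemon invoked oom-killer: gfp_mask=..."
OOM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<proc>\S+) invoked oom-killer:"
)


def find_oom_invocations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, process name) for every 'invoked oom-killer' kernel line."""
    hits = []
    for line in lines:
        m = OOM_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("proc")))
    return hits


if __name__ == "__main__":
    sample = [
        "Mar 12 23:05:34 UNRAID-BLUE rc.docker: Nginx-Proxy-Manager-Official: started succesfully!",
        "Mar 12 23:06:59 UNRAID-BLUE kernel: avahi-daemon invoked oom-killer: "
        "gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0",
    ]
    print(find_oom_invocations(sample))  # [('Mar 12 23:06:59', 'avahi-daemon')]
```

Running it over the full syslog from a diagnostics zip would list every OOM-killer invocation with its timestamp, which can then be lined up against the UD mount messages.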

 

The UD remote share did not start mounting until a few seconds after this:

Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mounting Remote Share '//DISKSTATION/movies'...
Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mount SMB share '//DISKSTATION/movies' using SMB default protocol.
Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mount SMB command: /sbin/mount -t 'cifs' -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,credentials='/tmp/unassigned.devices/credentials_movies' '//DISKSTATION/movies' '/mnt/remotes/movies-ds'
Mar 12 23:07:03 UNRAID-BLUE kernel: Key type cifs.idmap registered
Mar 12 23:07:03 UNRAID-BLUE kernel: CIFS: Attempting to mount //DISKSTATION/movies
Mar 12 23:07:03 UNRAID-BLUE kernel: CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
Mar 12 23:07:09 UNRAID-BLUE kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Mar 12 23:07:09 UNRAID-BLUE kernel: CIFS: VFS: cifs_mount failed w/return code = -113

and the mount failed with an error connecting to the socket.
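
The `-113` in `cifs_mount failed w/return code = -113` is a negated Linux errno value; on Linux, 113 is `EHOSTUNREACH` ("No route to host"), which is consistent with the socket error. A small sketch for decoding such codes with the Python standard library (the helper name `describe_mount_rc` is hypothetical, and errno numbers are platform specific, so this assumes Linux):

```python
import errno
import os


def describe_mount_rc(rc: int) -> str:
    """Translate a negative kernel return code (e.g. -113) into its errno name and message."""
    code = abs(rc)  # the kernel reports errors as negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe_mount_rc(-113))  # on Linux with glibc: EHOSTUNREACH: No route to host
```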

 

I'm not sure what this means:

Mar 12 23:05:34 UNRAID-BLUE kernel: kmem.limit_in_bytes is deprecated and will be removed. Please report your usecase to [email protected] if you depend on this functionality.

 

I would suggest removing all plugins and starting with CA and bare UD. Then add the others back until you find the offending plugin. I have suspicions about the following:

  • disklocation-master.plg
  • gpustat.plg
  • NerdPack.plg
  • nvidia-driver.plg

I'm not blaming any plugins, just eliminate them to start off.

5 minutes ago, dlandon said:

 

I would suggest removing all plugins and starting with CA and bare UD. Then add the others back until you find the offending plugin. I have suspicions about the following:

  • disklocation-master.plg
  • gpustat.plg
  • NerdPack.plg
  • nvidia-driver.plg

I'm not blaming any plugins, just eliminate them to start off.

 

 

Thanks. In the last diagnostics (attached again here), these plugins had all been removed.

I can reproduce this reliably: if I reboot with no shares added in UD, I get no errors.

If I reboot with a share added, I then get the OOM error. Everything else remains the same, and the only plugins are CA, UD, UD+, UD Preclear and Fix Common Problems.

I can remove UD+, UD Preclear and Fix Common Problems, but the difference between the errors happening or not does seem to be whether I have a share added.

 

 

 

unraid-blue-diagnostics-20220313-1049.zip


@dlandon A thought: there's a 1MB limit on /mnt/disks if there are no shares mounted. Does that same 1MB limit apply if there is a share mounted? (I.e., what happens if you write to a mount point that doesn't exist: is it capped at 1MB?) That doesn't explain, though, how the original OOM happened prior to array start.

8 minutes ago, dlandon said:

What is the remote share you are trying to use?  Show a screen shot of the UD page.

 

Screenshot below and another diagnostics file.

I removed the remaining plugins, so now I only have UD and CA.

I added an SMB share and immediately got an OOM error. I removed the share, added an NFS share and rebooted. One CPU core was stuck at 100%, then dropped to normal levels, and the log showed an OOM error.

Before this I ran the server for 1 hour with no share and had no errors.

I am connecting to a share on my NAS; the share is maybe approx. 30TB and the NAS is about 100TB in total.

 

[Screenshot 2022-03-13 at 12:18:46 of the UD page]
unraid-blue-diagnostics-20220313-1222.zip

19 minutes ago, Squid said:

@dlandon A thought: there's a 1MB limit on /mnt/disks if there are no shares mounted. Does that same 1MB limit apply if there is a share mounted? (I.e., what happens if you write to a mount point that doesn't exist: is it capped at 1MB?) That doesn't explain, though, how the original OOM happened prior to array start.

There aren't any writes to the /mnt/remotes/ folder because the remote share has not even been mounted. What I suspect is going on is the ping status check that updates the remote server's online status. If you remember, @Squid, that's the routine you and I had to troubleshoot when your Windows computer would not mount a remote share. I'm not an expert on all the networking stuff, but I think that routine gets stuck in a loop and kills avahi.

 

25 minutes ago, matt-uk said:

I added an SMB share and immediately got an OOM error.

Remove the share and add it using the IP address instead of the name 'DISKSTATION' and see if the same thing happens.
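
Entering the IP address takes name resolution out of the picture entirely, which is why it makes a good test. A minimal sketch of that idea, not UD's actual code; the helper name `server_address` is hypothetical, and port 445 is assumed as the SMB port:

```python
import ipaddress
import socket


def server_address(server: str) -> str:
    """Return an IP address for `server`.

    If `server` is already an IP literal, it is returned unchanged and no name
    lookup happens at all. Otherwise the name is resolved once through the
    system resolver (getaddrinfo), whose sources depend on host configuration.
    """
    try:
        return str(ipaddress.ip_address(server))
    except ValueError:
        pass  # not an IP literal: fall through to name resolution
    infos = socket.getaddrinfo(server, 445, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]  # sockaddr tuple of the first result -> address string
```

With `'192.168.1.20'` (an example address, not the NAS's real one) the function returns immediately; with `'DISKSTATION'` the result, and how long it takes, depends on how the system resolver is configured.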

Solution
46 minutes ago, dlandon said:

 

Remove the share and add it using the IP address instead of the name 'DISKSTATION' and see if the same thing happens.

 

That sorted it! So something funky is happening with DNS. I have had issues in the past where the NAS wasn't showing up in the network view in Windows and had to be added manually.

This certainly gives me a workaround, thank you, but I wonder whether the actual issue is Unraid related or down to my network, router or something else, in which case it would of course be out of scope for this forum.

Should I mark this as solved, or do we want to investigate the root cause and fixes further?

13 minutes ago, matt-uk said:

Should I mark this as solved, or do we want to investigate the root cause and fixes further?

I'd like to do a bit more research. Having UD cause OOM issues is not a good thing. Even if the remote share won't work, UD should be robust enough to handle it.

 

Post the output of this command:

cat /etc/hosts

 

This will expose some IP addresses.  You can PM that to me and not post it on the forum.
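
What is worth checking in that output is whether DISKSTATION has a static entry. A minimal sketch that parses hosts-file text in the usual layout (an address followed by one or more names, with `#` starting a comment); the function name `parse_hosts` is hypothetical and the sample contents are made up for illustration, not this server's file:

```python
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map each hostname/alias (lower-cased) to the addresses listed for it."""
    table: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


# Hypothetical sample, not the real /etc/hosts from this server.
sample = """127.0.0.1 localhost
# comment line
192.168.1.10 UNRAID-BLUE unraid-blue.local
"""
hosts = parse_hosts(sample)
print(hosts.get("diskstation"))  # None: no static entry for DISKSTATION in this sample
```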

15 minutes ago, dlandon said:

I'd like to do a bit more research. Having UD cause OOM issues is not a good thing. Even if the remote share won't work, UD should be robust enough to handle it.

 

Especially as it manages to look up the server, and in the first instance I wasn't manually adding the server name.

 

Quote

Post the output of this command:

cat /etc/hosts

 

This will expose some IP addresses. You can PM that to me and not post it on the forum.

 

Will do.

2 minutes ago, matt-uk said:

Especially as it manages to look up the server, and in the first instance I wasn't manually adding the server name.

UD does that by pinging servers to see whether they respond on the SMB port. Because of security changes to Samba on both Windows and Linux, it has become more difficult to do name lookups unless you enable NetBIOS, which is now considered too insecure to use. Do you have NetBIOS enabled?
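
A reachability check of this kind is only safe if every failure path ends quickly. A hypothetical sketch of a bounded check, assuming it amounts to a TCP connect to the SMB port (445); this is not UD's implementation, and the names `smb_port_open` and `SMB_PORT` are illustrative:

```python
import socket

SMB_PORT = 445  # assumption: SMB over TCP on its standard port


def smb_port_open(host: str, timeout: float = 2.0, port: int = SMB_PORT) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Refused connections, unreachable hosts, timeouts and unresolvable names all
    come back as False instead of being retried. The timeout applies to the
    connect; the name lookup inside create_connection is not bounded by it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes socket.gaierror, ConnectionRefusedError, TimeoutError
        return False
```

Because the `timeout` does not cover the name lookup done inside `create_connection`, passing an IP address instead of a name removes the one step that is not time-bounded.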

 

Another thing I want you to look at is the 'Local TLD:' setting in Settings->Management Access. Let me know what you have set.

1 hour ago, matt-uk said:

Will do.

Thanks for the info in a PM.  Now let's do a few more things.

 

Go to your Settings->Management Access and show me how you have "Local TLD:" set.

 

Also execute this command at the CLI:

arp -a DISKSTATION
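
For reference, net-tools `arp -a` prints one line per entry in the form `name (address) at MAC [hwtype] on interface`. A minimal parsing sketch under that assumption; the sample line is made up, and `parse_arp`/`ArpEntry` are hypothetical names:

```python
import re
from typing import List, NamedTuple


class ArpEntry(NamedTuple):
    name: str
    ip: str
    mac: str
    iface: str


# Assumed net-tools layout, e.g. "DISKSTATION (192.168.1.10) at aa:bb:cc:dd:ee:ff [ether] on br0";
# optional flags such as PERM between "[ether]" and "on" are tolerated.
ARP_RE = re.compile(
    r"^(?P<name>\S+) \((?P<ip>[^)]+)\) at (?P<mac>[0-9a-fA-F:]{17})"
    r"\s+\[\w+\](?:\s+\w+)*?\s+on\s+(?P<iface>\S+)"
)


def parse_arp(output: str) -> List[ArpEntry]:
    """Parse `arp -a` output; <incomplete> entries and error messages are skipped."""
    entries = []
    for line in output.splitlines():
        m = ARP_RE.match(line.strip())
        if m:
            entries.append(ArpEntry(**m.groupdict()))
    return entries


sample = "DISKSTATION (192.168.1.10) at aa:bb:cc:dd:ee:ff [ether] on br0"
print(parse_arp(sample)[0].ip)  # 192.168.1.10
```

An empty result means there is no complete ARP entry for that host: the output was an error message or an `<incomplete>` entry.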

 

I think I might see something I can do.

 

20 hours ago, matt-uk said:

@dlandon

 

I have sent you the output from the CLI via PM.

 

Local TLD is set as local.

 

[Screenshot 2022-03-13 at 16:10:32 of the Local TLD setting]

I just came across something that I'd like you to try for me:

/usr/bin/nmblookup DISKSTATION | /bin/head -n1 | /bin/awk '{print $1}'

Execute this command in a console and let me know the outcome. Be careful: this might cause the OOM.
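
For anyone scripting this, the `head -n1 | awk '{print $1}'` part simply takes the first whitespace-separated field of the first output line. A sketch of the same step in Python, with the lookup wrapped in a hard timeout so a hung lookup cannot run indefinitely; the function names are hypothetical, and a timeout bounds run time only, not memory use:

```python
import subprocess
from typing import Optional


def first_field_of_first_line(text: str) -> Optional[str]:
    """Mimic `head -n1 | awk '{print $1}'`; returns None where awk would print an empty line."""
    lines = text.splitlines()
    if not lines:
        return None
    fields = lines[0].split()
    return fields[0] if fields else None


def nmb_first_field(name: str, timeout: float = 10.0) -> Optional[str]:
    """Run nmblookup for `name`, killing it if it runs longer than `timeout` seconds."""
    try:
        result = subprocess.run(
            ["/usr/bin/nmblookup", name],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None  # hung lookup (process was killed) or nmblookup not installed
    return first_field_of_first_line(result.stdout)
```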
