matt-uk Posted March 12, 2022
Posting here as per the guidance for this error. This is a powerful new server with 32GB of memory, so I am surprised by this - hopefully an easy fix. I did notice that after a reboot earlier, done to add a new SSD to use for a pool, one of the CPU threads sat at 100% for a while and memory usage was high. I am not sure what prompted that and haven't seen it before, but the server is less than a day old. I am really keen to fix these small issues before the server goes into full use, so I would appreciate any help diagnosing and fixing this. Diagnostics attached. Many thanks, Matt
unraid-blue-diagnostics-20220312-1658.zip
Squid Posted March 12, 2022
Strange. The OOM happened before the system was even completely up and running. Reboot and wait to see if this ever happens again.
matt-uk Posted March 12, 2022 Author
52 minutes ago, Squid said: "Strange. The OOM happened before the system was even completely up and running. Reboot and wait to see if this ever happens again"
Thanks for looking. I have rebooted and am still getting the same OOM error. Diagnostics attached again. Any thoughts?
unraid-blue-diagnostics-20220312-1942.zip
Squid Posted March 12, 2022
Does it happen in safe mode?
matt-uk Posted March 12, 2022 Author (edited)
3 hours ago, Squid said: "Does it happen in safe mode?"
Sorry for the delay, I had to wait for a file transfer to finish. In safe mode all seems fine; I went back to normal mode and almost immediately got the OOM. Does this suggest it is plugin related? I don't have many plugins and I don't think they are rare or exotic, just pretty standard. Diagnostics attached.
not-safe-unraid-blue-diagnostics-20220312-2308.zip
safe-mode-unraid-blue-diagnostics-20220312-2247.zip
Edited March 12, 2022 by matt-uk: forgot attachments
matt-uk Posted March 13, 2022 Author (edited)
An update - I have been playing around with removing and adding plugins etc. The issue is with remote shares added through the Unassigned Devices plugin. If I add an SMB share from my Synology NAS, I get this error, even if it is not mounted. The same happens if I add an NFS share, even when it is not mounted. The folders on the NAS are very large (movie folders of approx 50TB), if that would have an impact? I have noticed, though, that if only one share is added by NFS I do not get the error immediately; instead one (and only one) of the CPU cores is stuck at 100% until the error occurs after a couple of minutes.
unraid-blue-diagnostics-20220313-1049.zip
Edited March 13, 2022 by matt-uk: update
dlandon Posted March 13, 2022
It appears that avahi-daemon triggered the OOM:
Mar 12 23:05:34 UNRAID-BLUE rc.docker: Nginx-Proxy-Manager-Official: started succesfully!
Mar 12 23:06:59 UNRAID-BLUE kernel: avahi-daemon invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Mar 12 23:06:59 UNRAID-BLUE kernel: CPU: 15 PID: 6129 Comm: avahi-daemon Tainted: P O 5.10.28-Unraid #1
Mar 12 23:06:59 UNRAID-BLUE kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 UD/Z390 UD, BIOS F10 11/05/2021
Mar 12 23:06:59 UNRAID-BLUE kernel: Call Trace:
Mar 12 23:06:59 UNRAID-BLUE kernel: dump_stack+0x6b/0x83
Mar 12 23:06:59 UNRAID-BLUE kernel: dump_header+0x45/0x1e8
Mar 12 23:06:59 UNRAID-BLUE kernel: oom_kill_process+0x7b/0xf6
Mar 12 23:06:59 UNRAID-BLUE kernel: out_of_memory+0x3dd/0x410
Mar 12 23:06:59 UNRAID-BLUE kernel: __alloc_pages_slowpath.constprop.0+0x665/0x74c
Mar 12 23:06:59 UNRAID-BLUE kernel: __alloc_pages_nodemask+0x1a1/0x1fc
Mar 12 23:06:59 UNRAID-BLUE kernel: alloc_pages_vma+0x114/0x130
Mar 12 23:06:59 UNRAID-BLUE kernel: handle_mm_fault+0x9b6/0xec3
Mar 12 23:06:59 UNRAID-BLUE kernel: exc_page_fault+0x259/0x373
Mar 12 23:06:59 UNRAID-BLUE kernel: ? asm_exc_page_fault+0x8/0x30
Mar 12 23:06:59 UNRAID-BLUE kernel: asm_exc_page_fault+0x1e/0x30
Mar 12 23:06:59 UNRAID-BLUE kernel: RIP: 0033:0x1550e0e6798c
UD Remote shares did not mount until some time after this:
Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mounting Remote Share '//DISKSTATION/movies'...
Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mount SMB share '//DISKSTATION/movies' using SMB default protocol.
Mar 12 23:07:03 UNRAID-BLUE unassigned.devices: Mount SMB command: /sbin/mount -t 'cifs' -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,credentials='/tmp/unassigned.devices/credentials_movies' '//DISKSTATION/movies' '/mnt/remotes/movies-ds'
Mar 12 23:07:03 UNRAID-BLUE kernel: Key type cifs.idmap registered
Mar 12 23:07:03 UNRAID-BLUE kernel: CIFS: Attempting to mount //DISKSTATION/movies
Mar 12 23:07:03 UNRAID-BLUE kernel: CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
Mar 12 23:07:09 UNRAID-BLUE kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Mar 12 23:07:09 UNRAID-BLUE kernel: CIFS: VFS: cifs_mount failed w/return code = -113
and the mount failed with an error connecting to the socket.
Not sure what this means:
Mar 12 23:05:34 UNRAID-BLUE kernel: kmem.limit_in_bytes is deprecated and will be removed. Please report your usecase to [email protected] if you depend on this functionality.
I would suggest removing all plugins and starting with CA and bare UD. Then add others until you find the offending plugin. I have suspicions about the following:
disklocation-master.plg
gpustat.plg
NerdPack.plg
nvidia-driver.plg
I'm not blaming any plugins, just eliminate them to start off.
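For anyone doing the same log check on their own diagnostics, here is a minimal sketch of pulling the OOM-killer invocations out of a syslog-style file. The function name `oom_invocations` is made up for illustration, and the default path `/var/log/syslog` is an assumption that may not match your setup. Note that the process named on the "invoked oom-killer" line is the one whose allocation failed, which is not necessarily the process that used up the memory.

```shell
# Hypothetical helper: list each OOM-killer invocation in a syslog-style file.
# Prints the timestamp (month, day, time) and the process whose allocation
# triggered the killer. Assumes the "Mon DD HH:MM:SS host kernel: proc ..." layout
# seen in the log excerpt above.
oom_invocations() {
    # $1: path to the log file (default is an assumption; adjust as needed)
    awk '/invoked oom-killer/ { print $1, $2, $3, $6 }' "${1:-/var/log/syslog}"
}

# Usage (commented out so nothing runs on paste):
# oom_invocations /var/log/syslog
```

This only reports who hit the limit and when; the memory dump that follows each invocation in the log is still the place to look for what was actually holding the memory.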
matt-uk Posted March 13, 2022 Author
5 minutes ago, dlandon said: "I would suggest removing all plugins and start with CA and bare UD. Then add others until you find the offending plugin. I have suspicions about the following: disklocation-master.plg gpustat.plg NerdPack.plg nvidia-driver.plg I'm not blaming any plugins, just eliminate them to start off."
Thanks. In the last diagnostics (attached again here) these plugins had all been removed. I can reproduce this reliably: if I reboot with no shares added in UD, I get no errors. If I reboot with a share added, I get the OOM error. Everything else remains the same, and the only plugins are CA, UD, UD+, UD Preclear and Fix Common Problems. I can remove UD+, UD Preclear and Fix Common Problems, but the difference between the errors happening and not happening does seem to be whether I have a share added.
unraid-blue-diagnostics-20220313-1049.zip
dlandon Posted March 13, 2022
What is the remote share you are trying to use? Show a screenshot of the UD page.
Squid Posted March 13, 2022
@dlandon A thought: There's a 1MB limit on /mnt/disks if there are no shares mounted. Does that same 1MB limit apply if there is a share mounted (i.e. what happens if you write to a mount point that doesn't exist? Does it cap it at the 1MB?) That doesn't explain how the original OOM happened prior to array start, though.
matt-uk Posted March 13, 2022 Author
8 minutes ago, dlandon said: "What is the remote share you are trying to use? Show a screen shot of the UD page."
Screenshot below and another diagnostic file. I removed the remaining plugins, so now I only have UD and CA. I added an SMB share and immediately got an OOM error. I removed the share and added an NFS share, then rebooted. One CPU core was stuck at 100%, then dropped to normal levels, and the log showed an OOM error. Before this I ran the server for 1 hour with no share and had no errors. I am connecting to a share on my NAS; the share is approx 30TB and the NAS is about 100TB in total.
unraid-blue-diagnostics-20220313-1222.zip
dlandon Posted March 13, 2022
19 minutes ago, Squid said: "@dlandon A thought: There's a 1MB limit on /mnt/disks if there's no shares mounted. Does that same 1MB limit apply if there is a share mounted (ie: what happens if you write to a mount point that doesn't exist? - does it cap it at the 1MB?) Doesn't explain though how the original OOM happened prior to array start though."
There aren't any writes to the /mnt/remotes/ folder because the remote share has not even been mounted. What I suspect is going on is the ping status check that updates the remote server's online status. If you remember, @Squid, that's the routine you and I had to troubleshoot when your Windows computer would not mount a remote share. I'm not an expert at all the networking stuff, but I think that routine is stuck in a loop and kills avahi.
25 minutes ago, matt-uk said: "I added a SMB share and immediately got an OOM error."
Remove the share and add it using the IP address instead of the name 'DISKSTATION' and see if the same thing happens.
matt-uk Posted March 13, 2022 Author (Solution)
46 minutes ago, dlandon said: "Remove the share and add it using the IP address instead of the name 'DISKSTATION' and see if the same thing happens."
That sorted it! So something funky is happening with name resolution. I have had issues in the past where the NAS wasn't showing up in the network view in Windows and had to be added manually. This certainly gives me a workaround, thank you, but I wonder whether the actual issue is Unraid related or down to my network/router/something else, in which case it would of course be out of the scope of this forum. Should I mark this as solved, or do we want to investigate the root cause and fixes more?
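Since switching from the name to the IP address made the error go away, one way to check whether a name resolves promptly before relying on it for a remote share is sketched below. This is not what UD does internally; it is only an illustration. The function name `resolve_with_timeout` and the 5 second default are made up, and `getent hosts` only consults the system resolver (hosts file, DNS and whatever else is configured), so it may not follow the same lookup path that a NetBIOS-style name like DISKSTATION needs.

```shell
# Hypothetical helper: resolve a host name through the system resolver, giving
# up after a fixed number of seconds instead of hanging.
# Prints the first address found; returns non-zero if nothing resolved in time.
resolve_with_timeout() {
    # $1: host name to look up; $2: seconds to wait (default 5, an arbitrary choice)
    ip=$(timeout "${2:-5}" getent hosts "$1" | awk 'NR == 1 { print $1 }')
    [ -n "$ip" ] && printf '%s\n' "$ip"
}

# Usage (commented out so nothing runs on paste):
# resolve_with_timeout DISKSTATION 5 || echo "lookup failed or timed out; try the IP address"
```

If the name does not resolve here but the IP address works, adding the share by IP address, as suggested above, sidesteps the lookup entirely.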
dlandon Posted March 13, 2022
13 minutes ago, matt-uk said: "Should I mark as solved or do we want to investigate the route cause and fixes more?"
I'd like to do a bit more research. Having UD cause oom issues is not a good thing. Even if the remote share won't work, UD should be robust enough to handle it. Post the output of this command:
cat /etc/hosts
This will expose some IP addresses. You can PM that to me and not post it on the forum.
matt-uk Posted March 13, 2022 Author
15 minutes ago, dlandon said: "I'd like to do a bit more research. Having UD cause oom issues is not a good thing. Even if the remote share won't work, UD should be robust enough to handle it."
Especially as it manages to look up the server, and in the first instance I wasn't manually adding the server name.
dlandon said: "Post the output of this command: cat /etc/hosts This will expose some IP addresses. You can PM that to me and not post it on the forum"
Will do.
dlandon Posted March 13, 2022
2 minutes ago, matt-uk said: "Especially as it manages to lookup the server and in the first instance, I wasn't manually adding the server name."
UD does that by pinging servers to see if the server supports the SMB port. Because of security changes in Windows and Linux Samba, it has become more difficult to do name lookups unless you enable NetBIOS, which is now considered too insecure to use. Do you have NetBIOS enabled? Another thing I want you to look at is the 'Local TLD:' setting in Settings->Management Access. Let me know what you have set.
dlandon Posted March 13, 2022
I use the 'arp' command and it is known to have some issues - like hangs. Unfortunately, I think this is out of my control.
dlandon Posted March 13, 2022
1 hour ago, matt-uk said: "Will do."
Thanks for the info in a PM. Now let's do a few more things. Go to your Settings->Management Access and show me how you have "Local TLD:" set. Also execute this command at the CLI:
arp -a DISKSTATION
I think I might see something I can do.
matt-uk Posted March 13, 2022 Author
@dlandon I have sent you the output from the CLI via PM. Local TLD is set to local.
dlandon Posted March 13, 2022
Thank you. I'm out of ideas.
dlandon Posted March 14, 2022
20 hours ago, matt-uk said: "@dlandon have sent you output from CLI via PM. Local TLD is set as local."
I just came across something that I'd like you to try for me:
/usr/bin/nmblookup DISKSTATION | /bin/head -n1 | /bin/awk '{print $1}'
Execute this command in a console and let me know the outcome. Be careful, this might cause the oom.
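Given the warning that the lookup might itself trigger the OOM, a slightly more defensive way to run the same pipeline is sketched here. It is an illustration only, not part of UD. The helper name `first_ipv4` and the 10 second limit are made up. The helper keeps the first field of the first output line only when it looks like a dotted-quad IPv4 address, so a failure message from the lookup is not mistaken for an address. Note that `timeout` bounds how long the lookup runs, not how much memory it uses.

```shell
# Hypothetical helper: read lookup output on stdin and print the first field of
# the first line only if it looks like a dotted-quad IPv4 address.
# Prints nothing when the first line is an error message or is empty.
first_ipv4() {
    head -n1 | awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1 }'
}

# Usage (commented out so nothing runs on paste; 10 seconds is an arbitrary limit):
# timeout 10 /usr/bin/nmblookup DISKSTATION | first_ipv4
```

An empty result then means the name was not found or the lookup was cut off, which is useful to report back either way.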