  • Unraid system lockups on/after startup utilising 100% memory


    Xploit61
    • Closed Urgent

    Hey everyone,

    I would really appreciate your help here. Since 6.12 rc2 my server appears to be hard locking, with the RAM fully utilised just before it does so.

    Watching the dashboard, memory usage climbs in stages up to 100% and then the system hard locks (dockers, file sharing, the web GUI, etc. stop responding) and requires a restart, after which it can happen again. Sometimes it manages to get through by killing a process (at least from what I can see in the log) and then continues to work.

    I have upgraded the system hardware using an old PC (coming from 6.11.5, which worked OK), and it now has a 3900X, 32 GB of RAM, etc. Under normal usage, memory usage varies from 22% to 48%.

    I have included the diagnostics file and am looking for any advice or guidance, please.

    Many thanks in advance for your help.
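To capture how memory climbs "in stages" without relying on the dashboard, the usage figure can be sampled from `/proc/meminfo`. This is a minimal sketch, assuming usage is defined as `MemTotal - MemAvailable`; the dashboard may calculate its percentage differently, and the function name `used_percent` is made up for illustration.

```python
def used_percent(meminfo_text: str) -> float:
    """Return the percentage of RAM in use, from /proc/meminfo-style text.

    Assumption: "used" means MemTotal minus MemAvailable (both in kB).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # value in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def sample_once(path: str = "/proc/meminfo") -> float:
    """Read the given meminfo file once and return the used percentage."""
    with open(path) as fh:
        return used_percent(fh.read())
```

Calling `sample_once()` every few seconds from a terminal session and writing the result to a file on persistent storage would leave a record of the ramp-up that survives a hard lock.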

    xtc-unraid-diagnostics-20230423-2302.zip




    User Feedback

    Recommended Comments

    Hey JorgeB,


    Thanks so much for your reply. I tried what you said, but memory usage still climbed to 100%. I left it and it seemed to kick back into action again. I have attached the latest log, which may indicate the cause. Fix Common Problems now shows that I need to change from macvlan to ipvlan, which it never did before. I'm not sure if that's because I have now upgraded to rc3?

    xtc-unraid-diagnostics-20230426-2103.zip


    Try booting in safe mode with VM/docker services disabled, then start enabling one by one.

     

    There are also macvlan call traces, so you should switch to ipvlan.


    Hey JorgeB,

     

    Thanks again for your help, and apologies for the late reply as I was away.

    I have implemented ipvlan and that seems to have resolved the issues, but I tried disabling Docker and VMs and the server still proceeded to freeze. I have also upgraded to RC5 now, and what I am noticing from having the logs open after startup is the following:

     

    • Memory usage starts to increase and it displays the following:

     

    Quote

    May 10 20:25:06 XTC-UNRAID unassigned.devices: Warning: shell_exec(/sbin/arp -a 'XTC-DISKSTATION' 2>&1) took longer than 15s!

    May 10 20:25:10 XTC-UNRAID unassigned.devices: Warning: shell_exec(/sbin/arp -a 'XTC-DISKSTATION' 2>&1) took longer than 15s!

    May 10 20:25:16 XTC-UNRAID unassigned.devices: Warning: shell_exec(/sbin/arp -an 'XTC-DISKSTATION.local' 2>&1) took longer than 10s!

    May 10 20:25:20 XTC-UNRAID unassigned.devices: Warning: shell_exec(/sbin/arp -an 'XTC-DISKSTATION.local' 2>&1) took longer than 10s!

    May 10 20:25:57 XTC-UNRAID ool www[22041]: /usr/local/emhttp/plugins/dynamix/scripts/emcmd 'cmdStatus=Apply'

    May 10 20:25:57 XTC-UNRAID emhttpd: Starting services...

    May 10 20:25:57 XTC-UNRAID emhttpd: shcmd (1617): /etc/rc.d/rc.samba restart

    May 10 20:25:57 XTC-UNRAID wsdd2[16148]: 'Terminated' signal received.

    May 10 20:25:57 XTC-UNRAID wsdd2[16148]: terminating.

    May 10 20:25:59 XTC-UNRAID root: Starting Samba: /usr/sbin/smbd -D

    May 10 20:25:59 XTC-UNRAID root: /usr/sbin/nmbd -D

    May 10 20:25:59 XTC-UNRAID root: /usr/sbin/wsdd2 -d

    May 10 20:25:59 XTC-UNRAID wsdd2[25654]: starting.

    May 10 20:25:59 XTC-UNRAID root: /usr/sbin/winbindd -D

    May 10 20:25:59 XTC-UNRAID emhttpd: shcmd (1621): /etc/rc.d/rc.avahidaemon restart

    May 10 20:25:59 XTC-UNRAID root: Stopping Avahi mDNS/DNS-SD Daemon: stopped

    May 10 20:26:05 XTC-UNRAID root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D

    May 10 20:26:05 XTC-UNRAID root: Daemon already running on PID 9525

    May 10 20:26:05 XTC-UNRAID emhttpd: shcmd (1621): exit status: 255

    May 10 20:26:05 XTC-UNRAID emhttpd: shcmd (1622): /etc/rc.d/rc.avahidnsconfd restart

    May 10 20:26:05 XTC-UNRAID root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped

    May 10 20:26:05 XTC-UNRAID avahi-dnsconfd[9534]: Got SIGTERM, quitting.

    May 10 20:26:05 XTC-UNRAID root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon: /usr/sbin/avahi-dnsconfd -D


     

    • At this point it freezes. I had disabled Docker and VMs and it still froze. It then shows the following:

     

    Quote

    May 10 20:27:05 XTC-UNRAID kernel: SVM: TSC scaling supported

    May 10 20:27:05 XTC-UNRAID kernel: kvm: Nested Virtualization enabled

    May 10 20:27:05 XTC-UNRAID kernel: SVM: kvm: Nested Paging enabled

    May 10 20:27:05 XTC-UNRAID kernel: SEV supported: 509 ASIDs

    May 10 20:27:05 XTC-UNRAID kernel: SVM: Virtual VMLOAD VMSAVE supported

    May 10 20:27:05 XTC-UNRAID kernel: SVM: Virtual GIF supported

    May 10 20:27:05 XTC-UNRAID kernel: SVM: LBR virtualization supported

    May 10 20:28:24 XTC-UNRAID ntpd[1804]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

    May 10 20:28:25 XTC-UNRAID php-fpm[8034]: [WARNING] [pool www] child 18650 exited on signal 9 (SIGKILL) after 207.818280 seconds from start

    May 10 20:31:28 XTC-UNRAID kernel: crond invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0

    May 10 20:31:28 XTC-UNRAID kernel: CPU: 6 PID: 1829 Comm: crond Tainted: P O 6.1.27-Unraid #1

    May 10 20:31:28 XTC-UNRAID kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F36 12/26/2022

    May 10 20:31:28 XTC-UNRAID kernel: Call Trace:

    May 10 20:31:28 XTC-UNRAID kernel:

    May 10 20:31:28 XTC-UNRAID kernel: dump_stack_lvl+0x44/0x5c

    May 10 20:31:28 XTC-UNRAID kernel: dump_header+0x4a/0x211

    May 10 20:31:28 XTC-UNRAID kernel: oom_kill_process+0x80/0x111

    May 10 20:31:28 XTC-UNRAID kernel: out_of_memory+0x3b3/0x3e5

    May 10 20:31:28 XTC-UNRAID kernel: __alloc_pages_slowpath.constprop.0+0x6f5/0x8f8

    May 10 20:31:28 XTC-UNRAID kernel: __alloc_pages+0x132/0x1e8

    May 10 20:31:28 XTC-UNRAID kernel: folio_alloc+0x14/0x35

    May 10 20:31:28 XTC-UNRAID kernel: __filemap_get_folio+0x185/0x213

    May 10 20:31:28 XTC-UNRAID kernel: ? preempt_latency_start+0x1e/0x46

    May 10 20:31:28 XTC-UNRAID kernel: filemap_fault+0x317/0x52f

    May 10 20:31:28 XTC-UNRAID kernel: __do_fault+0x2d/0x6b

    May 10 20:31:28 XTC-UNRAID kernel: __handle_mm_fault+0xa22/0xcf9

    May 10 20:31:28 XTC-UNRAID kernel: handle_mm_fault+0x13d/0x20f

    May 10 20:31:28 XTC-UNRAID kernel: do_user_addr_fault+0x36a/0x530

    May 10 20:31:28 XTC-UNRAID kernel: exc_page_fault+0xfb/0x11d

    May 10 20:31:28 XTC-UNRAID kernel: asm_exc_page_fault+0x22/0x30

    May 10 20:31:28 XTC-UNRAID kernel: RIP: 0033:0x402142

    May 10 20:31:28 XTC-UNRAID kernel: Code: Unable to access opcode bytes at 0x402118.

    May 10 20:31:28 XTC-UNRAID kernel: RSP: 002b:00007ffd086652e0 EFLAGS: 00010246


     

    • Memory usage managed to stabilise and I see this info in the logs:

     

    Quote

    May 10 20:31:28 XTC-UNRAID kernel: [   1562]     0  1562      931      325    45056        0             0 find
    May 10 20:31:28 XTC-UNRAID kernel: [   1563]     0  1563      647      210    45056        0             0 wc
    May 10 20:31:28 XTC-UNRAID kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=avahi-daemon,pid=9525,uid=61
    May 10 20:31:28 XTC-UNRAID kernel: Out of memory: Killed process 9525 (avahi-daemon) total-vm:30146240kB, anon-rss:30141228kB, file-rss:0kB, shmem-rss:1904kB, UID:61 pgtables:59024kB oom_score_adj:0
    May 10 20:31:29 XTC-UNRAID php-fpm[8034]: [WARNING] [pool www] child 28431 exited on signal 9 (SIGKILL) after 130.525607 seconds from start
    May 10 20:31:29 XTC-UNRAID nginx: 2023/05/10 20:31:29 [error] 8094#8094: *2910 connect() to unix:/var/run/syslog.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.165, server: , request: "GET /webterminal/syslog/token HTTP/1.1", upstream: "http://unix:/var/run/syslog.sock:/token", host: "192.168.1.209:8081", referrer: "http://192.168.1.209:8081/webterminal/syslog/"
    May 10 20:31:29 XTC-UNRAID kernel: oom_reaper: reaped process 9525 (avahi-daemon), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    May 10 20:31:29 XTC-UNRAID avahi-dnsconfd[29518]: read(): Connection reset by peer
    May 10 20:31:29 XTC-UNRAID nginx: 2023/05/10 20:31:29 [error] 8094#8094: *2926 connect() to unix:/var/run/syslog.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.165, server: , request: "GET /webterminal/syslog/ws HTTP/1.1", upstream: "http://unix:/var/run/syslog.sock:/ws", host: "192.168.1.209:8081"
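The OOM lines in these logs separate two roles: the process that happened to trigger the OOM killer (`crond`) and the process that was actually killed (`avahi-daemon`). The sketch below pulls both out of a syslog and converts the victim's `anon-rss` to GiB. It is a rough illustration, assuming the line format matches the excerpts quoted in this thread; the regular expressions and the function names `parse_oom_kill` and `parse_oom_invoker` are made up here and are not part of any Unraid tool.

```python
import re

# Matches lines like: "Out of memory: Killed process 9525 (avahi-daemon) total-vm:...kB, anon-rss:...kB"
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

# Matches lines like: "kernel: crond invoked oom-killer: ..."
INVOKE_RE = re.compile(r"kernel: (?P<name>\S+) invoked oom-killer")


def parse_oom_kill(line: str):
    """Return details of the killed process, or None if the line does not match."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    anon_rss_kb = int(m["anon_rss"])
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_kb": int(m["total_vm"]),
        "anon_rss_kb": anon_rss_kb,
        "anon_rss_gib": anon_rss_kb / (1024 * 1024),  # kB -> GiB
    }


def parse_oom_invoker(line: str):
    """Return the name of the process that triggered the OOM killer, or None."""
    m = INVOKE_RE.search(line)
    return m["name"] if m else None
```

Applied to the kill line quoted above, `anon_rss_gib` comes out at roughly 28.7, meaning `avahi-daemon` alone was holding most of the 32 GB at the time it was killed. That identifies where the memory went, not why it grew.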



    Not sure if this helps in any way. Is booting into safe mode still suggested, even though Docker/VMs were off when this happened?

    Really appreciate your help, thank you again.

    Edited by Xploit61
    formatting
    12 hours ago, Xploit61 said:

    At this point it freezes, but I disabled docker and vms and it still froze.

    That suggests a hardware issue. Since it's a Ryzen-based server, take a look here; it may help.


    Thank you so much JorgeB, let me investigate. I will close this out for now; I just wanted to make sure the issue wasn't being caused by the RC or anything similar (the provided diagnostics may help with that).

    Appreciate it.





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.
