AquaWolf, posted February 16, 2022

Hello there,

I previously had some issues with Docker. These are now solved, but my Unraid server is still unstable after one or two days of use. Shortly before the last log line, my system log is full of:

Tower rsyslogd: action 'action-0-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]

and these:

Feb 4 04:36:34 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Feb 4 04:36:34 Tower kernel: rcu: #01112-....: (8 GPs behind) idle=1f6/1/0x4000000000000002 softirq=17084346/17084346 fqs=6963800
Feb 4 04:36:34 Tower kernel: #011(detected by 0, t=29760827 jiffies, g=60378281, q=18075856)
Feb 4 04:36:34 Tower kernel: Sending NMI from CPU 0 to CPUs 12:
Feb 4 04:36:34 Tower kernel: NMI backtrace for cpu 12
Feb 4 04:36:34 Tower kernel: CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 5.10.28-Unraid #1
Feb 4 04:36:34 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 SLI PLUS (MS-7B09), BIOS A.70 11/14/2018
Feb 4 04:36:34 Tower kernel: RIP: 0010:nf_ct_key_equal+0x4/0x5d [nf_conntrack]
Feb 4 04:36:34 Tower kernel: Code: 48 33 56 1c 48 09 d1 75 19 8b 57 24 8b 46 24 81 e2 ff ff ff 00 25 ff ff ff 00 39 c2 0f 94 c0 0f b6 c0 83 e0 01 c3 49 89 f9 55 <48> 89 f7 48 89 d5 49 8d 71 10 49 89 cb e8 9b ff ff ff 45 31 d2 84
Feb 4 04:36:34 Tower kernel: RSP: 0018:ffffc900069f4978 EFLAGS: 00000206
Feb 4 04:36:34 Tower kernel: RAX: 00000001119963e5 RBX: ffff888a12629448 RCX: ffffffff8210b440
Feb 4 04:36:34 Tower kernel: RDX: ffff888a1262a6cc RSI: ffffc900069f49e8 RDI: ffff888a12629448
Feb 4 04:36:34 Tower kernel: RBP: ffff888a1262a6c0 R08: ffff888a12629400 R09: ffff888a12629448
Feb 4 04:36:34 Tower kernel: R10: 0000000000000001 R11: ffffffff8210b440 R12: ffffffff8210b440
Feb 4 04:36:34 Tower kernel: R13: ffffc900069f49e8 R14: ffff888a1262a6cc R15: ffff888a12629400
Feb 4 04:36:34 Tower kernel: FS: 0000000000000000(0000) GS:ffff888c66d00000(0000) knlGS:0000000000000000
Feb 4 04:36:34 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 04:36:34 Tower kernel: CR2: 000000c000867000 CR3: 000000012ca00000 CR4: 00000000003506e0
Feb 4 04:36:34 Tower kernel: Call Trace:
Feb 4 04:36:34 Tower kernel: <IRQ>
Feb 4 04:36:34 Tower kernel: nf_conntrack_tuple_taken+0xb9/0x144 [nf_conntrack]
Feb 4 04:36:34 Tower kernel: nf_nat_used_tuple+0x2e/0x49 [nf_nat]
Feb 4 04:36:34 Tower kernel: nf_nat_setup_info+0x332/0x6aa [nf_nat]
Feb 4 04:36:34 Tower kernel: ? ipt_do_table+0x4bb/0x5c0 [ip_tables]
Feb 4 04:36:34 Tower kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
Feb 4 04:36:34 Tower kernel: __nf_nat_alloc_null_binding+0x5f/0x76 [nf_nat]
Feb 4 04:36:34 Tower kernel: nf_nat_inet_fn+0x91/0x183 [nf_nat]
Feb 4 04:36:34 Tower kernel: ? br_handle_frame_finish+0x351/0x351
Feb 4 04:36:34 Tower kernel: nf_nat_ipv4_pre_routing+0x1e/0x4a [nf_nat]
Feb 4 04:36:34 Tower kernel: nf_hook_slow+0x39/0x8e
Feb 4 04:36:34 Tower kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
Feb 4 04:36:34 Tower kernel: NF_HOOK+0xb7/0xf7 [br_netfilter]
Feb 4 04:36:34 Tower kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
Feb 4 04:36:34 Tower kernel: br_nf_pre_routing+0x229/0x239 [br_netfilter]
Feb 4 04:36:34 Tower kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
Feb 4 04:36:34 Tower kernel: br_handle_frame+0x25e/0x2a6
Feb 4 04:36:34 Tower kernel: ? br_pass_frame_up+0xda/0xda
Feb 4 04:36:34 Tower kernel: __netif_receive_skb_core+0x335/0x4e7
Feb 4 04:36:34 Tower kernel: __netif_receive_skb_list_core+0x78/0x104
Feb 4 04:36:34 Tower kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
Feb 4 04:36:34 Tower kernel: ? dev_gro_receive+0x55d/0x578
Feb 4 04:36:34 Tower kernel: gro_normal_list+0x1d/0x39
Feb 4 04:36:34 Tower kernel: napi_complete_done+0x79/0x104
Feb 4 04:36:34 Tower kernel: bnx2x_poll+0x100c/0x1285 [bnx2x]
Feb 4 04:36:34 Tower kernel: ? resched_cpu+0x14/0x58
Feb 4 04:36:34 Tower kernel: ? enqueue_task_fair+0x101/0x156
Feb 4 04:36:34 Tower kernel: net_rx_action+0xf4/0x29d
Feb 4 04:36:34 Tower kernel: __do_softirq+0xc4/0x1c2
Feb 4 04:36:34 Tower kernel: asm_call_irq_on_stack+0x12/0x20
Feb 4 04:36:34 Tower kernel: </IRQ>
Feb 4 04:36:34 Tower kernel: do_softirq_own_stack+0x2c/0x39
Feb 4 04:36:34 Tower kernel: __irq_exit_rcu+0x45/0x80
Feb 4 04:36:34 Tower kernel: common_interrupt+0x119/0x12e
Feb 4 04:36:34 Tower kernel: asm_common_interrupt+0x1e/0x40
Feb 4 04:36:34 Tower kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Feb 4 04:36:34 Tower kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Feb 4 04:36:34 Tower kernel: RSP: 0018:ffffc90006443ea0 EFLAGS: 00000246
Feb 4 04:36:34 Tower kernel: RAX: ffff888c66d22380 RBX: 0000000000000002 RCX: 000000000000001f
Feb 4 04:36:34 Tower kernel: RDX: 0000000000000000 RSI: 0000000021af2900 RDI: 0000000000000000
Feb 4 04:36:34 Tower kernel: RBP: ffff88869fa0dc00 R08: 0000f1be712f7038 R09: 0000000000000000
Feb 4 04:36:34 Tower kernel: R10: 0000000000002e7b R11: 071c71c71c71c71c R12: 0000f1be712f7038
Feb 4 04:36:34 Tower kernel: R13: ffffffff820c8c40 R14: 0000000000000002 R15: 0000000000000000
Feb 4 04:36:34 Tower kernel: cpuidle_enter_state+0x101/0x1c4
Feb 4 04:36:34 Tower kernel: cpuidle_enter+0x25/0x31
Feb 4 04:36:34 Tower kernel: do_idle+0x1a6/0x214
Feb 4 04:36:34 Tower kernel: cpu_startup_entry+0x18/0x1a
Feb 4 04:36:34 Tower kernel: secondary_startup_64_no_verify+0xb0/0xbb
Feb 4 04:36:34 Tower rsyslogd: file '/var/log/syslog'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: No space left on device [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: action 'action-0-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: file '/var/log/syslog'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: No space left on device [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: action 'action-0-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: file '/var/log/syslog'[9] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: No space left on device [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: action 'action-0-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Feb 4 04:36:34 Tower rsyslogd: rsyslogd[internal_messages]: 560 messages lost due to rate-limiting (500 allowed within 5 seconds)
Feb 7 06:17:29 Tower nginx: 2022/02/07 06:17:29 [error] 7456#7456: MEMSTORE:00: can't create shared message for channel /disks
Feb 7 06:17:30 Tower nginx: 2022/02/07 06:17:30 [crit] 7456#7456: ngx_slab_alloc() failed: no memory
Feb 7 06:17:30 Tower nginx: 2022/02/07 06:17:30 [error] 7456#7456: shpool alloc failed
Feb 7 06:17:30 Tower nginx: 2022/02/07 06:17:30 [error] 7456#7456: nchan: Out of shared memory while allocating message of size 7386. Increase nchan_max_reserved_memory.
Feb 7 06:17:30 Tower nginx: 2022/02/07 06:17:30 [error] 7456#7456: *135964 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Feb 7 06:17:30 Tower nginx: 2022/02/07 06:17:30 [error] 7456#7456: MEMSTORE:00: can't create shared message for channel /disks

I have rebooted now. The diagnostics taken after the reboot are attached. I would be really thankful for some help.

Kind regards

tower-diagnostics-20220216-0730.zip
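The rsyslog lines above report "No space left on device" while writing /var/log/syslog, so the filesystem holding the log had filled up, presumably from the flood of repeated call traces. As a minimal sketch, assuming a Python 3 interpreter is available, the following checks how full that filesystem is; the helper names and the 90% threshold are hypothetical choices, not anything Unraid provides.

```python
import os
import shutil
from typing import Tuple

MIB = 1024 * 1024


def log_fs_usage(path: str = "/var/log") -> Tuple[int, int, float]:
    """Return (total_bytes, free_bytes, percent_used) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.total, usage.free, percent_used


def warn_if_nearly_full(path: str = "/var/log", threshold: float = 90.0) -> bool:
    """Print a one-line summary and return True if usage is at or above `threshold` percent."""
    total, free, pct = log_fs_usage(path)
    print(f"{path}: {pct:.1f}% used, {free // MIB} MiB free of {total // MIB} MiB")
    return pct >= threshold


if __name__ == "__main__":
    # Fall back to the current directory on machines without /var/log.
    target = "/var/log" if os.path.isdir("/var/log") else "."
    if warn_if_nearly_full(target):
        print("The log filesystem is nearly full; rsyslog may start dropping messages.")
```

A full log filesystem is a symptom here rather than the cause; it only explains why later messages were lost.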
JorgeB, posted February 16, 2022

See if this applies to you. If it does, upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; Advanced View must be enabled, top right), or see below for more info:

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:

https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
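To check whether this applies, one rough approach is to look for the same markers in a saved syslog: the bug report's title names nf_nat_setup_info with Docker static IPs on macvlan, and the trace in the first post runs through nf_nat_setup_info, nf_conntrack and br_netfilter. This is a minimal sketch; the helper names and the pattern list are hypothetical, taken from the log in the first post, and a match only shows that the same symbols appear, not that the root cause is the same.

```python
import re
from typing import Dict, Iterable

# Hypothetical marker list, copied from the call trace in the first post.
SIGNATURES = {
    "rcu_stall": re.compile(r"rcu_sched detected stalls on CPUs/tasks"),
    "nf_nat_setup_info": re.compile(r"\bnf_nat_setup_info\+0x"),
    "nf_conntrack": re.compile(r"\[nf_conntrack\]"),
    "br_netfilter": re.compile(r"\[br_netfilter\]"),
    "log_full": re.compile(r"No space left on device"),
}


def count_signatures(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines match each marker."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_netfilter_trace(counts: Dict[str, int]) -> bool:
    """True if the NAT setup path and conntrack both show up in the log."""
    return counts["nf_nat_setup_info"] > 0 and counts["nf_conntrack"] > 0
```

Usage, against a saved copy of the log: `with open("/var/log/syslog", errors="replace") as fh: counts = count_signatures(fh)`, then `looks_like_netfilter_trace(counts)`.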
AquaWolf (author), posted February 16, 2022

OK, I'll give that a try; this applies to me. I'll report back this weekend.
AquaWolf (author), posted February 16, 2022

I upgraded to rc2, and now Docker won't start anymore. It seems to be a problem with the Docker image:

Feb 16 13:32:59 Tower emhttpd: shcmd (68): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker.img' /var/lib/docker 100
Feb 16 13:32:59 Tower kernel: loop2: detected capacity change from 0 to 209715200
Feb 16 13:32:59 Tower kernel: BTRFS: device fsid b7558f03-4ad1-4884-b9b7-9fe63d3900b9 devid 1 transid 2830106 /dev/loop2 scanned by mount (4507)
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): flagging fs with big metadata feature
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): using free space tree
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): has skinny extents
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): enabling ssd optimizations
Feb 16 13:32:59 Tower kernel: BTRFS info (device loop2): cleaning free space cache v1
Feb 16 13:32:59 Tower kernel: BTRFS warning (device loop2): checksum verify failed on 28567109632 wanted 0x92b333b4 found 0x5563533e level 0
Feb 16 13:32:59 Tower kernel: BTRFS warning (device loop2): checksum verify failed on 28567109632 wanted 0x92b333b4 found 0x5563533e level 0
Feb 16 13:32:59 Tower kernel: BTRFS: error (device loop2) in btrfs_set_free_space_cache_v1_active:3992: errno=-5 IO failure
Feb 16 13:32:59 Tower kernel: BTRFS error (device loop2): commit super ret -30
Feb 16 13:32:59 Tower root: mount: /var/lib/docker: can't read superblock on /dev/loop2.
Feb 16 13:32:59 Tower kernel: BTRFS error (device loop2): open_ctree failed
Feb 16 13:32:59 Tower root: mount error

Also, my VMs with GPU passthrough are not working anymore. I'm only getting a few lines in the syslog:

Feb 16 13:38:13 Tower kernel: br0: port 2(vnet1) entered blocking state
Feb 16 13:38:13 Tower kernel: br0: port 2(vnet1) entered disabled state
Feb 16 13:38:13 Tower kernel: device vnet1 entered promiscuous mode
Feb 16 13:38:13 Tower kernel: br0: port 2(vnet1) entered blocking state
Feb 16 13:38:13 Tower kernel: br0: port 2(vnet1) entered forwarding state
Feb 16 13:38:15 Tower avahi-daemon[4465]: Joining mDNS multicast group on interface vnet1.IPv6 with address fe80::fc54:ff:fe30:3c6e.
Feb 16 13:38:15 Tower avahi-daemon[4465]: New relevant interface vnet1.IPv6 for mDNS.
Feb 16 13:38:15 Tower avahi-daemon[4465]: Registering new address record for fe80::fc54:ff:fe30:3c6e on vnet1.*.
Feb 16 13:38:19 Tower kernel: vfio-pci 0000:43:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Feb 16 13:38:19 Tower kernel: vfio-pci 0000:43:00.0: No more image in the PCI ROM
Feb 16 13:38:19 Tower kernel: vfio-pci 0000:41:00.0: vfio_ecap_init: hiding ecap 0x1e@0x110
Feb 16 13:38:19 Tower kernel: vfio-pci 0000:41:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
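The mount failure above comes from the btrfs filesystem inside docker.img: the `bdev /dev/loop2 errs:` line reports a nonzero `corrupt` counter, followed by checksum failures and `open_ctree failed`, which fits the advice to delete and re-create the image. btrfs keeps these as running per-device error counters, so a nonzero value means errors were recorded at some point, not necessarily during this mount. A minimal sketch for extracting those counters from a log line, assuming exactly the line format shown above (the function names are hypothetical):

```python
import re
from typing import Dict, Optional, Tuple

# Matches e.g. "BTRFS info (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0"
ERRS_RE = re.compile(
    r"BTRFS info \(device (?P<dev>[^)]+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (device, counters) if the line is a btrfs error-counter report, else None."""
    m = ERRS_RE.search(line)
    if not m:
        return None
    return m.group("dev"), {k: int(m.group(k)) for k in FIELDS}


def has_errors(counters: Dict[str, int]) -> bool:
    """True if any of the write/read/flush/corruption/generation counters is nonzero."""
    return any(v > 0 for v in counters.values())
```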
ChatNoir, posted February 16, 2022

Please attach your full diagnostics to your next post.
JorgeB, posted February 16, 2022

9 minutes ago, AquaWolf said: "seems to be a Problem with the docker image"

Yes, just delete and re-create the Docker image. I can't help with the VM issues.
AquaWolf (author), posted February 16, 2022

Yep, I recreated the Docker image and now it's working (I have to restore all my containers from the Apps tab now). Here are the diagnostics:

tower-diagnostics-20220216-1401.zip
itimpi, posted February 16, 2022

40 minutes ago, AquaWolf said: "have to restore all my containers from Apps Tab now"

I hope you mean Apps -> Previous Apps, to avoid having to reconfigure the containers?
AquaWolf (author), posted February 16, 2022

4 hours ago, itimpi said: "I hope you mean Apps -> Previous Apps to avoid having to reconfigure the containers?"

Yep 👍

Does anybody have an idea why the GPU passthrough VMs are not working anymore?

Edited February 16, 2022 by AquaWolf
AquaWolf (author), posted February 17, 2022

Regarding GPU passthrough: I noticed that after removing the Graphics ROM BIOS, the VM boots correctly. When I try to pick the BIOS file from the GUI, it only shows a blank selection. Could this somehow be related?
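The thread does not establish that the ROM file is the cause, but since the VM boots once the Graphics ROM BIOS is removed, one quick sanity check is whether the file actually looks like a PCI expansion ROM. In the standard layout the image starts with the bytes 0x55 0xAA, and the 16-bit little-endian value at offset 0x18 points to a data structure that begins with "PCIR". A minimal sketch; the helper names are hypothetical, passing this check does not prove the ROM matches the card, and it says nothing about why the GUI shows a blank selection.

```python
from pathlib import Path


def check_option_rom(data: bytes) -> str:
    """Basic structural check of a PCI expansion ROM image; returns "ok" or a reason."""
    if len(data) < 0x1A:
        return "too short"
    # Offset 0x00: ROM signature bytes 0x55 0xAA.
    if data[:2] != b"\x55\xaa":
        return "missing 55 AA signature at offset 0"
    # Offset 0x18: little-endian pointer to the PCI Data Structure.
    pcir = int.from_bytes(data[0x18:0x1A], "little")
    if data[pcir:pcir + 4] != b"PCIR":
        return "PCI data structure (PCIR) not found"
    return "ok"


def check_rom_file(path: str) -> str:
    """Read a ROM file from disk and run check_option_rom on its contents."""
    return check_option_rom(Path(path).read_bytes())
```

Usage, with a hypothetical file name: `print(check_rom_file("gpu.rom"))`.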