Can't Type in Console


K1ng0011


I am running Unraid 6.8.3 and have been having a lot of crashes lately, about once every few days. When a crash occurs I lose the web GUI, SSH, and the console. I have a monitor hooked up to my server and I want to run the command "tail /var/log/syslog -f" in the console to figure out what is going on. However, either I am doing something wrong or my console is having issues: when I boot the server, the console freezes as soon as I type in my username, and I can no longer type in it until I reboot. The web GUI and everything else continue to work until the server crashes again.

 

Motherboard: MSI Pro Carbon X370

RAM: 16GB DDR4

CPU: Ryzen 5 2600X

GPU: Nvidia 1660

 

 

Edited by K1ng0011
Adding Information

JorgeB, thank you for the link. I applied that BIOS setting last night, along with turning off global C-states, before I posted on the forum. Hopefully that will help resolve my issues. Also, my RAM and processor are not overclocked, and the XMP profile is turned off for the RAM. Since I can't use the console, I started an SSH session from a separate computer with the command "tail /var/log/syslog -f" running.


I think I found an issue. My Unraid server has not crashed yet, but I looked at the output of "tail /var/log/syslog -f" running in an SSH session and I am seeing a lot of BTRFS errors. From what I have read, I have some kind of corruption of the BTRFS filesystem on my 1TB NVMe cache drive. This could be what has been causing my issues, or it could just be a symptom of having to hard power off my server when the whole system locks up. I am backing up my appdata; when that has completed I will format the drive, then remove and recreate the docker image.

 

Dec 15 17:47:51 Tower kernel: BTRFS: error (device nvme0n1p1) in __btrfs_free_extent:6805: errno=-117 unknown
Dec 15 17:47:51 Tower kernel: BTRFS info (device nvme0n1p1): forced readonly
Dec 15 17:47:51 Tower kernel: BTRFS: error (device nvme0n1p1) in btrfs_run_delayed_refs:2935: errno=-117 unknown
Dec 15 17:47:51 Tower kernel: print_req_error: I/O error, dev loop2, sector 0
Dec 15 17:47:51 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
Dec 15 17:47:51 Tower kernel: BTRFS warning (device loop2): chunk 13631488 missing 1 devices, max tolerance is 0 for writeable mount
Dec 15 17:47:51 Tower kernel: BTRFS: error (device loop2) in write_all_supers:3717: errno=-5 IO failure (errors while submitting device barriers.)
Dec 15 17:47:51 Tower kernel: BTRFS error (device nvme0n1p1): pending csums is 16384
Dec 15 17:47:51 Tower kernel: BTRFS info (device loop2): forced readonly
Dec 15 17:47:51 Tower kernel: BTRFS warning (device loop2): Skipping commit of aborted transaction.
Dec 15 17:47:51 Tower kernel: BTRFS: error (device loop2) in cleanup_transaction:1860: errno=-5 IO failure
 

Dec 15 20:08:26 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 603, rd 0, flush 1, corrupt 0, gen 0
Dec 15 20:08:26 Tower kernel: loop: Write error at byte offset 3941613568, length 4096.
Dec 15 20:08:26 Tower kernel: print_req_error: I/O error, dev loop2, sector 7698464
Dec 15 20:08:26 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 604, rd 0, flush 1, corrupt 0, gen 0
Dec 15 20:08:26 Tower kernel: print_req_error: I/O error, dev loop2, sector 7697872
Dec 15 20:08:26 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 605, rd 0, flush 1, corrupt 0, gen 0
Dec 15 20:08:56 Tower kernel: loop: Write error at byte offset 3687092224, length 4096.
Dec 15 20:08:56 Tower kernel: print_req_error: I/O error, dev loop2, sector 7201352
Dec 15 20:08:56 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 606, rd 0, flush 1, corrupt 0, gen 0
Dec 15 20:08:56 Tower kernel: loop: Write error at byte offset 3941310464, length 4096.
Dec 15 20:08:56 Tower kernel: print_req_error: I/O error, dev loop2, sector 7697872
Dec 15 20:08:56 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 607, rd 0, flush 1, corrupt 0, gen 0
Dec 15 20:08:56 Tower kernel: loop: Write error at byte offset 3941613568, length 4096.
Dec 15 20:08:56 Tower kernel: print_req_error: I/O error, dev loop2, sector 7698464
Dec 15 20:08:56 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 608, rd 0, flush 1, corrupt 0, gen 0
 
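For anyone reading similar output, here is a minimal Python sketch (not part of Unraid; the function names are hypothetical) that pulls the per-device error counters out of the "errs:" lines and gives the errno values a readable name. It assumes the Linux errno numbering, where 5 is EIO and 117 is EUCLEAN; the kernel log above simply prints "unknown" for -117.

```python
import re

# Linux errno numbers that appear in the excerpts above (assumed Linux numbering).
ERRNO_NAMES = {5: "EIO (I/O error)", 117: "EUCLEAN (structure needs cleaning)"}

# Matches lines such as:
#   BTRFS error (device loop2): bdev /dev/loop2 errs: wr 603, rd 0, flush 1, corrupt 0, gen 0
COUNTER_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
ERRNO_RE = re.compile(r"errno=-(\d+)")


def parse_counters(line):
    """Return (device, {counter: value}) for a BTRFS 'errs:' line, else None."""
    m = COUNTER_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


def decode_errno(line):
    """Return a readable name for the errno in a BTRFS error line, else None."""
    m = ERRNO_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return ERRNO_NAMES.get(code, f"errno {code}")


# Three lines copied from the log excerpts above.
sample = [
    "Dec 15 17:47:51 Tower kernel: BTRFS: error (device nvme0n1p1) in __btrfs_free_extent:6805: errno=-117 unknown",
    "Dec 15 17:47:51 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0",
    "Dec 15 20:08:56 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 608, rd 0, flush 1, corrupt 0, gen 0",
]

for line in sample:
    parsed = parse_counters(line)
    if parsed:
        print(parsed)
    name = decode_errno(line)
    if name:
        print(name)
```

Read this way, the write-error counter on /dev/loop2 climbs from 0 to 608 between the two excerpts, which fits loop2 failing to write after nvme0n1p1 was forced read-only.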


Well, that did not take as long as I thought. The server did not hard lock up, but I woke up this morning and my dockers had stopped working. I looked at the logs and the only errors I see are call traces. I will post a diagnostic later today. Based on what they are showing, it appears to be related to the NIC and possibly macvlan. Some of my docker containers run on a separate IP address from the host IP. I am not sure whether this is because my separate 10gig NIC has compatibility issues with macvlan. I think there are two options I can try at this point: disable the dockers with custom IPs, or remove my PCIe 10gig NIC and see whether I get the same issue with the custom IP dockers enabled on the onboard 1 gig NIC.


My errors seem to be related to call traces. I am not experienced enough to tell you exactly what they mean, but I do see a lot of network-related events and I think they are related to the macvlan issue some people experience with some hardware. I have referenced another thread below where Hoopster reported similar issues. I have now removed my 10gig PCIe NIC and I am running off my motherboard NIC. We will see if that makes any difference in the call traces. If that does not work, I will disable all dockers with separate IPs. I am trying to narrow down what is causing the issue.

 

10 Gig NIC: ASUS XG-C100C 

 

Call Trace Issues Thread: 

 

 

Dec 19 14:15:10 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Dec 19 14:15:10 Tower kernel: caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
Dec 19 14:15:12 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Dec 19 14:15:12 Tower kernel: caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
Dec 19 17:54:27 Tower webGUI: Successful login user root from 10.45.45.123
Dec 19 23:30:03 Tower webGUI: Successful login user root from 10.45.45.123
Dec 19 23:52:25 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Dec 19 23:52:25 Tower kernel: caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
Dec 20 01:44:14 Tower kernel: WARNING: CPU: 3 PID: 4818 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0x97/0x6b4
Dec 20 01:44:14 Tower kernel: Modules linked in: tun xt_nat macvlan nvidia_uvm(O) xt_CHECKSUM ipt_MASQUERADE ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle iptable_nat nf_nat_ipv4 nf_nat ip6table_filter ip6_tables iptable_filter ip_tables xfs md_mod nct6775 hwmon_vid edac_mce_amd nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel mpt3sas aes_x86_64 crypto_simd cryptd agpgart atlantic syscopyarea sysfillrect glue_helper i2c_piix4 k10temp mxm_wmi nvme i2c_core sysimgblt raid_class wmi_bmof fb_sys_fops ahci scsi_transport_sas libahci nvme_core pcc_cpufreq wmi button acpi_cpufreq [last unloaded: ccp]
Dec 20 01:44:14 Tower kernel: CPU: 3 PID: 4818 Comm: avahi-daemon Tainted: P           O      4.19.107-Unraid #1
Dec 20 01:44:14 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A32/X370 GAMING PRO CARBON (MS-7A32), BIOS 1.L0 01/21/2019
Dec 20 01:44:14 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x6b4
Dec 20 01:44:14 Tower kernel: Code: c1 ed 20 89 2c 24 e8 5f fb ff ff 8b 54 24 04 89 ef 89 c6 41 89 c4 e8 8f f9 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 25 <0f> 0b 44 89 e6 89 ef 45 31 ff e8 5d f1 ff ff be 00 02 00 00 48 c7
Dec 20 01:44:14 Tower kernel: RSP: 0018:ffff88840eac38f0 EFLAGS: 00010202
Dec 20 01:44:14 Tower kernel: RAX: 0000000000000188 RBX: ffff888103a72a00 RCX: 0000000038e3383a
Dec 20 01:44:14 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81e08fc8
Dec 20 01:44:14 Tower kernel: RBP: 0000000000002272 R08: ffff88811265a6b0 R09: 000000008101c937
Dec 20 01:44:14 Tower kernel: R10: 0000000000000158 R11: ffffffff81e91080 R12: 0000000000000071
Dec 20 01:44:14 Tower kernel: R13: ffffffff81e91080 R14: ffff88811265a640 R15: ffff88811265a698
Dec 20 01:44:14 Tower kernel: FS:  000014820b965b80(0000) GS:ffff88840eac0000(0000) knlGS:0000000000000000
Dec 20 01:44:14 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 20 01:44:14 Tower kernel: CR2: 0000154f18532870 CR3: 000000040a900000 CR4: 00000000003406e0
Dec 20 01:44:14 Tower kernel: Call Trace:
Dec 20 01:44:14 Tower kernel: <IRQ>
Dec 20 01:44:14 Tower kernel: ipv4_confirm+0xaf/0xb7
Dec 20 01:44:14 Tower kernel: nf_hook_slow+0x37/0x96
Dec 20 01:44:14 Tower kernel: ip_local_deliver+0xa9/0xd7
Dec 20 01:44:14 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53
Dec 20 01:44:14 Tower kernel: ip_sabotage_in+0x38/0x3e
Dec 20 01:44:14 Tower kernel: nf_hook_slow+0x37/0x96
Dec 20 01:44:14 Tower kernel: ip_rcv+0x8e/0xbe
Dec 20 01:44:14 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2
Dec 20 01:44:14 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69
Dec 20 01:44:14 Tower kernel: netif_receive_skb_internal+0x79/0x94
Dec 20 01:44:14 Tower kernel: br_pass_frame_up+0x123/0x145
Dec 20 01:44:14 Tower kernel: ? br_port_flags_change+0x29/0x29
Dec 20 01:44:14 Tower kernel: br_handle_frame_finish+0x335/0x37a
Dec 20 01:44:14 Tower kernel: ? ipt_do_table+0x5b6/0x603 [ip_tables]
Dec 20 01:44:14 Tower kernel: ? br_pass_frame_up+0x145/0x145
Dec 20 01:44:14 Tower kernel: br_nf_hook_thresh+0xa3/0xc3
Dec 20 01:44:14 Tower kernel: ? br_pass_frame_up+0x145/0x145
Dec 20 01:44:14 Tower kernel: br_nf_pre_routing_finish+0x239/0x260
Dec 20 01:44:14 Tower kernel: ? br_pass_frame_up+0x145/0x145
Dec 20 01:44:14 Tower kernel: ? nf_nat_ipv4_in+0x1d/0x64 [nf_nat_ipv4]
Dec 20 01:44:14 Tower kernel: br_nf_pre_routing+0x2fc/0x321
Dec 20 01:44:14 Tower kernel: ? br_nf_forward_ip+0x352/0x352
Dec 20 01:44:14 Tower kernel: nf_hook_slow+0x37/0x96
Dec 20 01:44:14 Tower kernel: br_handle_frame+0x290/0x2d3
Dec 20 01:44:14 Tower kernel: ? br_pass_frame_up+0x145/0x145
Dec 20 01:44:14 Tower kernel: ? br_handle_local_finish+0xe/0xe
Dec 20 01:44:14 Tower kernel: __netif_receive_skb_core+0x4a9/0x7db
Dec 20 01:44:14 Tower kernel: ? udp_gro_receive+0x4c/0x134
Dec 20 01:44:14 Tower kernel: __netif_receive_skb_one_core+0x31/0x69
Dec 20 01:44:14 Tower kernel: netif_receive_skb_internal+0x79/0x94
Dec 20 01:44:14 Tower kernel: napi_gro_receive+0x42/0x76
Dec 20 01:44:14 Tower kernel: aq_ring_rx_clean+0x32e/0x35c [atlantic]
Dec 20 01:44:14 Tower kernel: ? hw_atl_b0_hw_ring_rx_receive+0x129/0x1f5 [atlantic]
Dec 20 01:44:14 Tower kernel: aq_vec_poll+0xee/0x17d [atlantic]
Dec 20 01:44:14 Tower kernel: net_rx_action+0x10b/0x274
Dec 20 01:44:14 Tower kernel: __do_softirq+0xce/0x1e2
Dec 20 01:44:14 Tower kernel: irq_exit+0x5e/0x9d
Dec 20 01:44:14 Tower kernel: do_IRQ+0xaf/0xcd
Dec 20 01:44:14 Tower kernel: common_interrupt+0xf/0xf
Dec 20 01:44:14 Tower kernel: </IRQ>
Dec 20 01:44:14 Tower kernel: RIP: 0010:fput+0x6/0x77
Dec 20 01:44:14 Tower kernel: Code: ff 53 31 ff 48 87 3d 1c 75 0f 01 48 85 ff 74 0d 48 8b 1f e8 66 fe ff ff 48 89 df eb ee 5b c3 e9 5a fe ff ff 53 f0 48 ff 4f 38 <75> 6d 48 89 fb 65 48 8b 3c 25 40 5c 01 00 65 8b 05 b0 72 ec 7e a9
Dec 20 01:44:14 Tower kernel: RSP: 0018:ffffc900020b7ac0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd8
Dec 20 01:44:14 Tower kernel: RAX: 0000000000000282 RBX: ffffc900020b7c80 RCX: ffff88840844cb68
Dec 20 01:44:14 Tower kernel: RDX: dead000000000200 RSI: 0000000000000282 RDI: ffff8883c84bd200
Dec 20 01:44:14 Tower kernel: RBP: ffff8883736d2000 R08: ffff8883c84bd300 R09: ffff8883c84bd300
Dec 20 01:44:14 Tower kernel: R10: ffffffff8115478e R11: ffff88840ba3ec80 R12: 0000000000000001
Dec 20 01:44:14 Tower kernel: R13: ffffc900020b7c80 R14: ffffc900020b7c50 R15: 0000000000000000
Dec 20 01:44:14 Tower kernel: ? generic_pipe_buf_confirm+0x3/0x3
Dec 20 01:44:14 Tower kernel: poll_freewait+0x3e/0x87
Dec 20 01:44:14 Tower kernel: do_sys_poll+0x39f/0x426
Dec 20 01:44:14 Tower kernel: ? udp_rmem_release+0x47/0x10b
Dec 20 01:44:14 Tower kernel: ? _copy_to_user+0x22/0x28
Dec 20 01:44:14 Tower kernel: ? put_cmsg+0xaa/0xf5
Dec 20 01:44:14 Tower kernel: ? __skb_recv_udp+0x16b/0x27d
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: ? compat_poll_select_copy_remaining+0x118/0x118
Dec 20 01:44:14 Tower kernel: __se_sys_poll+0x55/0xd1
Dec 20 01:44:14 Tower kernel: do_syscall_64+0x57/0xf2
Dec 20 01:44:14 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 20 01:44:14 Tower kernel: RIP: 0033:0x14820ba71e63
Dec 20 01:44:14 Tower kernel: Code: 49 8b 45 10 5d 41 5c 41 5d 41 5e c3 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
Dec 20 01:44:14 Tower kernel: RSP: 002b:00007fff908cd478 EFLAGS: 00000246 ORIG_RAX: 0000000000000007
Dec 20 01:44:14 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000426180 RCX: 000014820ba71e63
Dec 20 01:44:14 Tower kernel: RDX: 0000000000000064 RSI: 000000000000000b RDI: 0000000000449f90
Dec 20 01:44:14 Tower kernel: RBP: 000014820b965b00 R08: 0000000000000000 R09: 0000000000000006
Dec 20 01:44:14 Tower kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000426230
Dec 20 01:44:14 Tower kernel: R13: 000000000042ad10 R14: 00000000004229b0 R15: 0000000000000000
Dec 20 01:44:14 Tower kernel: ---[ end trace 32125fa20ad6dba7 ]---
Dec 20 04:00:01 Tower kernel: mdcmd (92): check 
Dec 20 04:00:01 Tower kernel: md: recovery thread: check P Q ...
Dec 20 04:00:24 Tower root: /var/lib/docker: 20.6 GiB (22055636992 bytes) trimmed on /dev/loop2
Dec 20 04:00:24 Tower root: /mnt/cache: 766.1 GiB (822554726400 bytes) trimmed on /dev/nvme0n1p1
Dec 20 04:40:01 Tower apcupsd[4765]: apcupsd exiting, signal 15
Dec 20 04:40:01 Tower apcupsd[4765]: apcupsd shutdown succeeded
Dec 20 04:40:04 Tower apcupsd[24328]: apcupsd 3.14.14 (31 May 2016) slackware startup succeeded
Dec 20 04:40:04 Tower apcupsd[24328]: NIS server startup succeeded
Dec 20 04:51:46 Tower crond[2019]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Dec 20 08:52:29 Tower dhcpcd[6763]: br0: failed to renew DHCP, rebinding
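As a rough aid for reading a trace like the one above, here is a small Python sketch (hypothetical helper, not an Unraid tool) that extracts where the WARNING was raised and counts stack frames by kernel module. Frames ending in a bracketed name such as [atlantic] come from a loadable module, and frames prefixed with "?" are unreliable guesses by the stack unwinder, so the sketch skips them. A module showing up here does not prove its driver is at fault; it only shows which code was on the stack when the warning fired.

```python
import re
from collections import Counter

# e.g. "WARNING: CPU: 3 PID: 4818 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0x97/0x6b4"
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<src>\S+):(?P<line>\d+) (?P<func>\w+)"
)

# e.g. "kernel: aq_vec_poll+0xee/0x17d [atlantic]" or "kernel: ? ipt_do_table+0x5b6/0x603 [ip_tables]"
FRAME_RE = re.compile(
    r"kernel: (?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)


def summarize_trace(lines):
    """Return (warning_location, Counter of module names on reliable frames)."""
    warning = None
    modules = Counter()
    for line in lines:
        w = WARN_RE.search(line)
        if w:
            warning = (w.group("src"), int(w.group("line")), w.group("func"))
            continue
        f = FRAME_RE.search(line)
        # Count only frames that name a module and are not "?" guesses.
        if f and f.group("mod") and not f.group("guess"):
            modules[f.group("mod")] += 1
    return warning, modules


# Lines copied from the trace above.
sample = [
    "Dec 20 01:44:14 Tower kernel: WARNING: CPU: 3 PID: 4818 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0x97/0x6b4",
    "Dec 20 01:44:14 Tower kernel: ipv4_confirm+0xaf/0xb7",
    "Dec 20 01:44:14 Tower kernel: ? ipt_do_table+0x5b6/0x603 [ip_tables]",
    "Dec 20 01:44:14 Tower kernel: aq_ring_rx_clean+0x32e/0x35c [atlantic]",
    "Dec 20 01:44:14 Tower kernel: ? hw_atl_b0_hw_ring_rx_receive+0x129/0x1f5 [atlantic]",
    "Dec 20 01:44:14 Tower kernel: aq_vec_poll+0xee/0x17d [atlantic]",
]

warning, modules = summarize_trace(sample)
print(warning)
print(dict(modules))
```

On these sample lines, the warning comes from the netfilter connection-tracking code and the only module on the reliable frames is atlantic, the driver for the 10 gig NIC, meaning the packet that triggered the warning arrived through that card.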


I was getting call trace errors every two days or so. As of this post I have had 7 days and 20 hours of uptime with no further call traces. I removed my 10 gig PCIe NIC and removed all plugins except the ones I have to have. I have kept the "Host access to custom networks" setting enabled under the advanced docker settings, along with dockers having their own IP addresses. I am going to let it run for another week. Then I will enable the "AMD Power Supply Idle Control" BIOS setting and the global C-States option and let it run for another two weeks. After that I will reinstall my plugins and let that run for two weeks. If I encounter no further issues, the problem is likely specific to my NIC.


Well, it has been about 15 days. No further call traces, errors, or lockups. I logged into the BIOS, set the power supply idle control setting back to "auto", and enabled the global C-states option by also setting it to auto. If there are no further errors or issues, I will start to reinstall my plugins. I will update this post in two weeks unless I get an error or a hard lockup again.

  • 3 weeks later...

I am back. No further call traces, errors, or lockups. I have had the power supply idle control setting set to auto and the global c-states set to auto. This has not caused any further issues. I will install some plugins and see if that causes any issues. At this point I think the issue is related to my 10Gig NIC somehow. However I will install some plugins and report back in another two weeks.

  • 4 weeks later...

No further issues after installing my plugins. It does appear that this is somehow related to my PCIe 10 gig NIC (ASUS XG-C100C). The problems stopped when I removed it from my server. For reference, at this point I have had 30 days of uptime with no further hard locks, errors, or call traces.

