orem684 Posted January 23, 2020
Earlier today I was using Wireguard to remotely access my LAN and my connection became unresponsive. When I arrived home I noticed that my CPU was showing as overloaded (almost all cores at about 100%), Docker and VMs were disabled, and all shares were inaccessible. I attempted a couple of hard shutdowns, but Unraid boots back up to the same state. Here is the section of the logs where an error first appears:
Jan 22 17:46:17 Tower kernel: ------------[ cut here ]------------
Jan 22 17:46:17 Tower kernel: kernel BUG at fs/btrfs/ctree.c:3246!
Jan 22 17:46:17 Tower kernel: invalid opcode: 0000 [#1] SMP NOPTI
Jan 22 17:46:17 Tower kernel: CPU: 1 PID: 86 Comm: kworker/u64:3 Not tainted 4.19.94-Unraid #1
Jan 22 17:46:17 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570M Pro4, BIOS P1.50 07/15/2019
Jan 22 17:46:17 Tower kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
Jan 22 17:46:17 Tower kernel: RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136
Jan 22 17:46:17 Tower kernel: Code: 00 4c 89 ef 48 8d 74 24 07 48 63 d2 48 6b d2 19 48 83 c2 65 e8 78 16 04 00 48 89 de 48 8d 7c 24 07 e8 95 f4 ff ff 85 c0 7f 02 <0f> 0b 48 8b 43 09 49 63 d4 b9 11 00 00 00 4c 89 ef 48 6b d2 19 48
Jan 22 17:46:17 Tower kernel: RSP: 0018:ffffc90001d6fbc0 EFLAGS: 00010286
Jan 22 17:46:17 Tower kernel: RAX: 00000000ffffffff RBX: ffffc90001d6fca5 RCX: 000000000000006c
Jan 22 17:46:17 Tower kernel: RDX: 0000000000000000 RSI: ffffc90001d6fca5 RDI: ffffc90001d6fb9f
Jan 22 17:46:17 Tower kernel: RBP: ffff8882f9ddeaf0 R08: 0000000000001000 R09: 0000160000000000
Jan 22 17:46:17 Tower kernel: R10: ffff888000000000 R11: 0000000000000000 R12: 000000000000003f
Jan 22 17:46:17 Tower kernel: R13: ffff8882ddf7e9d8 R14: 00000000000032c0 R15: ffff8883cd5f8000
Jan 22 17:46:17 Tower kernel: FS: 0000000000000000(0000) GS:ffff88840e640000(0000) knlGS:0000000000000000
Jan 22 17:46:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 17:46:17 Tower kernel: CR2: 000056445107a5a4 CR3: 000000033b2e2000 CR4: 00000000003406e0
Jan 22 17:46:17 Tower kernel: Call Trace:
Jan 22 17:46:17 Tower kernel: __btrfs_drop_extents+0x5e2/0xb12
Jan 22 17:46:17 Tower kernel: insert_reserved_file_extent.constprop.0+0x98/0x2cc
Jan 22 17:46:17 Tower kernel: btrfs_finish_ordered_io+0x317/0x5d2
Jan 22 17:46:17 Tower kernel: normal_work_helper+0xd0/0x1c7
Jan 22 17:46:17 Tower kernel: process_one_work+0x16e/0x24f
Jan 22 17:46:17 Tower kernel: worker_thread+0x1e2/0x2b8
Jan 22 17:46:17 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Jan 22 17:46:17 Tower kernel: kthread+0x10c/0x114
Jan 22 17:46:17 Tower kernel: ? kthread_park+0x89/0x89
Jan 22 17:46:17 Tower kernel: ret_from_fork+0x22/0x40
Jan 22 17:46:17 Tower kernel: Modules linked in: veth xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap macvlan xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid wireguard ip6_udp_tunnel udp_tunnel bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc igb aesni_intel aes_x86_64 crypto_simd cryptd k10temp i2c_piix4 i2c_algo_bit glue_helper wmi_bmof i2c_core ccp ahci libahci wmi button pcc_cpufreq acpi_cpufreq
Jan 22 17:46:17 Tower kernel: ---[ end trace 69d3dbdcd1db1e30 ]---
Jan 22 17:46:17 Tower kernel: RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136
Jan 22 17:46:17 Tower kernel: Code: 00 4c 89 ef 48 8d 74 24 07 48 63 d2 48 6b d2 19 48 83 c2 65 e8 78 16 04 00 48 89 de 48 8d 7c 24 07 e8 95 f4 ff ff 85 c0 7f 02 <0f> 0b 48 8b 43 09 49 63 d4 b9 11 00 00 00 4c 89 ef 48 6b d2 19 48
Jan 22 17:46:17 Tower kernel: RSP: 0018:ffffc90001d6fbc0 EFLAGS: 00010286
Jan 22 17:46:17 Tower kernel: RAX: 00000000ffffffff RBX: ffffc90001d6fca5 RCX: 000000000000006c
Jan 22 17:46:17 Tower kernel: RDX: 0000000000000000 RSI: ffffc90001d6fca5 RDI: ffffc90001d6fb9f
Jan 22 17:46:17 Tower kernel: RBP: ffff8882f9ddeaf0 R08: 0000000000001000 R09: 0000160000000000
Jan 22 17:46:17 Tower kernel: R10: ffff888000000000 R11: 0000000000000000 R12: 000000000000003f
Jan 22 17:46:17 Tower kernel: R13: ffff8882ddf7e9d8 R14: 00000000000032c0 R15: ffff8883cd5f8000
Jan 22 17:46:17 Tower kernel: FS: 0000000000000000(0000) GS:ffff88840e640000(0000) knlGS:0000000000000000
Jan 22 17:46:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 17:46:17 Tower kernel: CR2: 000056445107a5a4 CR3: 000000033b2e2000 CR4: 00000000003406e0
Jan 22 17:46:54 Tower webGUI: Successful login user root from 192.168.1.81
Jan 22 17:47:04 Tower sSMTP[22829]: Creating SSL connection to host
Jan 22 17:47:04 Tower sSMTP[22829]: SSL connection using TLS_AES_256_GCM_SHA384
Jan 22 17:47:06 Tower sSMTP[22829]: Sent mail for ****** (221 2.0.0 closing connection y197sm201426pfc.79 - gsmtp) uid=0 username=root outbytes=665
Jan 22 17:47:10 Tower emhttpd: req (1): csrf_token=****************&title=System Log&cmd=/webGui/scripts/tail_log&arg1=syslog
This is my first time posting here, so please let me know if I should include additional information.
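For anyone who wants to pull the relevant part out of a long syslog before posting, here is a rough Python sketch that looks for "kernel BUG at" lines and collects the call-trace function names that follow. The function name find_kernel_bugs and the regular expressions are illustrative assumptions based only on the log format shown above; other kernels may format traces differently.

```python
import re

# Matches lines such as:
#   Jan 22 17:46:17 Tower kernel: kernel BUG at fs/btrfs/ctree.c:3246!
BUG_RE = re.compile(r"kernel:\s+kernel BUG at (?P<file>\S+):(?P<line>\d+)!")

# Matches call-trace entries such as "kernel: kthread+0x10c/0x114".
# Entries prefixed with "? " are ones the kernel marks as unreliable.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def find_kernel_bugs(lines):
    """Return one dict per 'kernel BUG' line: source file, line, trace functions."""
    bugs = []
    current = None
    in_trace = False
    for raw in lines:
        line = raw.rstrip("\n")
        bug = BUG_RE.search(line)
        if bug:
            current = {"file": bug["file"], "line": int(bug["line"]), "trace": []}
            bugs.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "kernel: Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                in_trace = False  # first non-frame line ends the trace
            elif not frame["unreliable"]:
                current["trace"].append(frame["func"])
    return bugs
```

Passing an open file object works, for example find_kernel_bugs(open(path)) where path points at a saved copy of the syslog. The result only summarises what the log already says; it does not diagnose the cause.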
Squid Posted January 23, 2020
1 hour ago, orem684 said: This is my first time posting here, so please let me know if I should include additional information.
Post the entire diagnostics.zip file (Tools - Diagnostics).
orem684 Posted January 23, 2020
4 minutes ago, Squid said: Post the entire diagnostics.zip file (Tools - Diagnostics)
I was able to start into safe mode to get the diagnostics.zip. I also found out that when I start the array again in safe mode I go back to the unresponsive state. But when I disable Docker before starting the array, I am able to get the shares back up and access most settings. I am currently running a parity check in safe mode.
tower-diagnostics-20200122-1926.zip
Squid Posted January 23, 2020
44 minutes ago, orem684 said: I was able to start into safe mode to get the diagnostics.zip. I also found out that when I start the array again in safe mode I go back to the unresponsive state. But when I disable Docker before starting the array, I am able to get the shares back up and access most settings. I am currently running a parity check in safe mode. tower-diagnostics-20200122-1926.zip
Ultimately, what we're looking for is what happens when the array starts. Configure the syslog server to mirror the syslog to the flash (https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601) and start the array. If it's unresponsive after a few minutes, you'll have to reboot and then post the syslog.txt file that will be in the logs folder on the flash drive.
orem684 Posted January 24, 2020
Thank you for the follow-up. Here is the syslog.txt file. Unraid continues to freeze whenever I start the array in safe mode with Docker enabled. I performed the restart at around line 70 (13:24).
syslog
orem684 Posted January 24, 2020
Is there a way to revert back to 6.7.2?
SpencerJ Posted January 24, 2020
2 hours ago, orem684 said: Is there a way to revert back to 6.7.2?
https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.7.2-x86_64.zip
That is the 6.7.2 zip from the Unraid website.
orem684 Posted January 24, 2020
6 minutes ago, SpencerJ said: https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.7.2-x86_64.zip There is the 6.7.2 zip^^ from the Unraid website.
Thank you. Is there a command to install the update, or would I have to flash the USB again? Does this affect my license?
SpencerJ Posted January 24, 2020
2 minutes ago, orem684 said: Thank you. Is there a command to install the update or would I have to flash the USB again? Does this affect my license?
Simply copy all the bz* files to your flash drive and reboot. No effect on the license.
orem684 Posted January 24, 2020
1 hour ago, SpencerJ said: Simply copy all the bz* files to your flash drive and reboot. No effect on the license
I'm assuming I would also delete the existing bz* files on the flash drive?
JorgeB Posted January 24, 2020
Simply overwrite them with the ones from the v6.7.2 zip and reboot.
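The overwrite step can be done by hand in any file manager, but as a minimal sketch, assuming the zip has already been extracted somewhere and the flash drive is mounted, it could look like this in Python. The function name overwrite_bz_files and all three directory arguments are hypothetical, and the optional backup of the old files is an extra precaution added here, not something suggested in the thread.

```python
import shutil
from pathlib import Path


def overwrite_bz_files(extracted_dir, flash_dir, backup_dir=None):
    """Copy every bz* file from extracted_dir over the same-named file in flash_dir.

    If backup_dir is given, any file about to be overwritten is copied there
    first. Returns the names of the files that were copied, in sorted order.
    """
    src = Path(extracted_dir)
    dst = Path(flash_dir)
    copied = []
    for new_file in sorted(src.glob("bz*")):
        if not new_file.is_file():
            continue
        target = dst / new_file.name
        if backup_dir is not None and target.exists():
            Path(backup_dir).mkdir(parents=True, exist_ok=True)
            shutil.copyfile(target, Path(backup_dir) / new_file.name)
        # copyfile copies contents only, which avoids metadata problems
        # on FAT-formatted flash drives.
        shutil.copyfile(new_file, target)
        copied.append(new_file.name)
    return copied
```

Files in the extracted folder that do not start with bz are left alone, so the configuration and license key on the flash drive are not touched. A reboot is still needed afterwards, as noted above.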
orem684 Posted January 24, 2020
So I reverted back to 6.7.2 and I am still experiencing the same issue. I've attached the new syslog after my last reboot. Any thoughts on what to try next?
syslog
JorgeB Posted January 24, 2020
A btrfs process is crashing, but I can't see any other btrfs problem on any of the file systems, so I would start by running memtest, since btrfs is very intolerant of memory errors. Also make sure your RAM isn't overclocked; respect the maximum speed for your configuration.
orem684 Posted January 25, 2020
10 hours ago, johnnie.black said: A btrfs process is crashing but I can't see any other btrfs problem on any of the file systems, so I would first start by running memtest since btrfs is very intolerant of memory errors, also make sure your RAM isn't overclocked, respect max speed depending on config.
I took my RAM overclock down to 2400 per the table you posted and ran a memtest, which passed with 0 errors. Unfortunately, I booted back up to the same issue. The memtest and syslog are attached. Any other thoughts?
memtest
syslog
orem684 Posted January 25, 2020
I also swapped out my RAM for some newer sticks from my main computer and still have the same issue.
JorgeB Posted January 25, 2020
4 hours ago, orem684 said: The memtest and syslog is attached. Any other thoughts?
I don't see anything out of the ordinary in this syslog. Does it cover the crash?
orem684 Posted January 25, 2020
11 hours ago, johnnie.black said: Don't see anything out of ordinary on this syslog, it covers the crash?
So I think I figured out the issue. I noticed that the freezing only started after enabling Docker, so I decided to delete my image and build a new one. I brought back two of my containers from the template, and so far I've had about 12 hours of uptime with the two containers running. I'm guessing my original Docker image must have been corrupted. Thanks, everyone, for your help.
EgyptianSnakeLegs Posted January 25, 2020
Let us know if that continues to be the solution. I'm still having stability issues with my system, and this is something I hadn't considered.
orem684 Posted January 26, 2020
4 hours ago, EgyptianSnakeLegs said: Let us know if that continues to be the solution. I'm still having stability issues with my system, and this is something I hadn't considered.
So I just did a reboot and got out of safe mode with no issues. I am running three Docker containers (pi-hole, unifi controller and Plex) with no stability issues and no errors in the logs so far. I'll continue to slowly add my containers back while monitoring for stability. I'll let you know if I see any problems.
JorgeB Posted January 26, 2020
12 hours ago, orem684 said: I decided to delete my image and build a new one.
The Docker image could have been the issue. I did see a couple of btrfs crashes earlier, but no specific filesystem was mentioned. I still recommend running RAM at non-overclocked speeds, especially with Ryzen; there have been previous cases here of instability and even data corruption.