January 23, 2020
Earlier today I was using WireGuard to remotely access my LAN and my connection became unresponsive. When I arrived home I noticed that my CPU was showing as overloaded (almost all cores at about 100%), Docker and VMs were disabled, and all shares were inaccessible. I attempted a couple of hard shutdowns but Unraid boots back up to the same state. Here is the section of the logs where an error first appears:
Jan 22 17:46:17 Tower kernel: ------------[ cut here ]------------
Jan 22 17:46:17 Tower kernel: kernel BUG at fs/btrfs/ctree.c:3246!
Jan 22 17:46:17 Tower kernel: invalid opcode: 0000 [#1] SMP NOPTI
Jan 22 17:46:17 Tower kernel: CPU: 1 PID: 86 Comm: kworker/u64:3 Not tainted 4.19.94-Unraid #1
Jan 22 17:46:17 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570M Pro4, BIOS P1.50 07/15/2019
Jan 22 17:46:17 Tower kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
Jan 22 17:46:17 Tower kernel: RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136
Jan 22 17:46:17 Tower kernel: Code: 00 4c 89 ef 48 8d 74 24 07 48 63 d2 48 6b d2 19 48 83 c2 65 e8 78 16 04 00 48 89 de 48 8d 7c 24 07 e8 95 f4 ff ff 85 c0 7f 02 <0f> 0b 48 8b 43 09 49 63 d4 b9 11 00 00 00 4c 89 ef 48 6b d2 19 48
Jan 22 17:46:17 Tower kernel: RSP: 0018:ffffc90001d6fbc0 EFLAGS: 00010286
Jan 22 17:46:17 Tower kernel: RAX: 00000000ffffffff RBX: ffffc90001d6fca5 RCX: 000000000000006c
Jan 22 17:46:17 Tower kernel: RDX: 0000000000000000 RSI: ffffc90001d6fca5 RDI: ffffc90001d6fb9f
Jan 22 17:46:17 Tower kernel: RBP: ffff8882f9ddeaf0 R08: 0000000000001000 R09: 0000160000000000
Jan 22 17:46:17 Tower kernel: R10: ffff888000000000 R11: 0000000000000000 R12: 000000000000003f
Jan 22 17:46:17 Tower kernel: R13: ffff8882ddf7e9d8 R14: 00000000000032c0 R15: ffff8883cd5f8000
Jan 22 17:46:17 Tower kernel: FS: 0000000000000000(0000) GS:ffff88840e640000(0000) knlGS:0000000000000000
Jan 22 17:46:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 17:46:17 Tower kernel: CR2: 000056445107a5a4 CR3: 000000033b2e2000 CR4: 00000000003406e0
Jan 22 17:46:17 Tower kernel: Call Trace:
Jan 22 17:46:17 Tower kernel: __btrfs_drop_extents+0x5e2/0xb12
Jan 22 17:46:17 Tower kernel: insert_reserved_file_extent.constprop.0+0x98/0x2cc
Jan 22 17:46:17 Tower kernel: btrfs_finish_ordered_io+0x317/0x5d2
Jan 22 17:46:17 Tower kernel: normal_work_helper+0xd0/0x1c7
Jan 22 17:46:17 Tower kernel: process_one_work+0x16e/0x24f
Jan 22 17:46:17 Tower kernel: worker_thread+0x1e2/0x2b8
Jan 22 17:46:17 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Jan 22 17:46:17 Tower kernel: kthread+0x10c/0x114
Jan 22 17:46:17 Tower kernel: ? kthread_park+0x89/0x89
Jan 22 17:46:17 Tower kernel: ret_from_fork+0x22/0x40
Jan 22 17:46:17 Tower kernel: Modules linked in: veth xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap macvlan xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid wireguard ip6_udp_tunnel udp_tunnel bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc igb aesni_intel aes_x86_64 crypto_simd cryptd k10temp i2c_piix4 i2c_algo_bit glue_helper wmi_bmof i2c_core ccp ahci libahci wmi button pcc_cpufreq acpi_cpufreq
Jan 22 17:46:17 Tower kernel: ---[ end trace 69d3dbdcd1db1e30 ]---
Jan 22 17:46:17 Tower kernel: RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136
Jan 22 17:46:17 Tower kernel: Code: 00 4c 89 ef 48 8d 74 24 07 48 63 d2 48 6b d2 19 48 83 c2 65 e8 78 16 04 00 48 89 de 48 8d 7c 24 07 e8 95 f4 ff ff 85 c0 7f 02 <0f> 0b 48 8b 43 09 49 63 d4 b9 11 00 00 00 4c 89 ef 48 6b d2 19 48
Jan 22 17:46:17 Tower kernel: RSP: 0018:ffffc90001d6fbc0 EFLAGS: 00010286
Jan 22 17:46:17 Tower kernel: RAX: 00000000ffffffff RBX: ffffc90001d6fca5 RCX: 000000000000006c
Jan 22 17:46:17 Tower kernel: RDX: 0000000000000000 RSI: ffffc90001d6fca5 RDI: ffffc90001d6fb9f
Jan 22 17:46:17 Tower kernel: RBP: ffff8882f9ddeaf0 R08: 0000000000001000 R09: 0000160000000000
Jan 22 17:46:17 Tower kernel: R10: ffff888000000000 R11: 0000000000000000 R12: 000000000000003f
Jan 22 17:46:17 Tower kernel: R13: ffff8882ddf7e9d8 R14: 00000000000032c0 R15: ffff8883cd5f8000
Jan 22 17:46:17 Tower kernel: FS: 0000000000000000(0000) GS:ffff88840e640000(0000) knlGS:0000000000000000
Jan 22 17:46:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 17:46:17 Tower kernel: CR2: 000056445107a5a4 CR3: 000000033b2e2000 CR4: 00000000003406e0
Jan 22 17:46:54 Tower webGUI: Successful login user root from 192.168.1.81
Jan 22 17:47:04 Tower sSMTP[22829]: Creating SSL connection to host
Jan 22 17:47:04 Tower sSMTP[22829]: SSL connection using TLS_AES_256_GCM_SHA384
Jan 22 17:47:06 Tower sSMTP[22829]: Sent mail for ****** (221 2.0.0 closing connection y197sm201426pfc.79 - gsmtp) uid=0 username=root outbytes=665
Jan 22 17:47:10 Tower emhttpd: req (1): csrf_token=****************&title=System Log&cmd=/webGui/scripts/tail_log&arg1=syslog
This is my first time posting here, so please let me know if I should include additional information.
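For anyone reading along who wants to pull the key lines out of a long syslog like the one in the first post, here is a minimal Python sketch. It is not an official Unraid tool. It looks for `kernel BUG at` lines and records the first `RIP:` function that follows each one. The function name `find_kernel_bugs` and both regular expressions are assumptions based only on the log format shown in that post, and it expects one log entry per line.

```python
import re

# Matches the line the kernel prints when it hits a BUG(), e.g.
# "kernel BUG at fs/btrfs/ctree.c:3246!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+?):(?P<line>\d+)!")

# Matches the instruction-pointer line, e.g.
# "RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_kernel_bugs(lines):
    """Return one dict per 'kernel BUG at' report found in the given lines.

    Each dict holds the source file, the source line, and the first RIP
    function seen after the BUG line (None if no RIP line follows).
    """
    reports = []
    for text in lines:
        bug = BUG_RE.search(text)
        if bug:
            reports.append(
                {"file": bug["file"], "line": int(bug["line"]), "func": None}
            )
            continue
        rip = RIP_RE.search(text)
        # Only fill in the function for the most recent report, and only once,
        # so the RIP line repeated after "end trace" is ignored.
        if rip and reports and reports[-1]["func"] is None:
            reports[-1]["func"] = rip["func"]
    return reports


# Two lines taken from the log in the first post.
sample = [
    "Jan 22 17:46:17 Tower kernel: kernel BUG at fs/btrfs/ctree.c:3246!",
    "Jan 22 17:46:17 Tower kernel: RIP: 0010:btrfs_set_item_key_safe+0xc0/0x136",
]
reports = find_kernel_bugs(sample)
```

To run it against a real file, pass the open file object instead of `sample`, for example `find_kernel_bugs(open("syslog.txt"))`, where `syslog.txt` is whatever copy of the log you have saved.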
January 23, 2020
1 hour ago, orem684 said: This is my first time posting here so please let me know if i should include additional information.
Post the entire diagnostics.zip file (Tools - Diagnostics).
January 23, 2020 Author
4 minutes ago, Squid said: Post the entire diagnostics.zip file (Tools - Diagnostics)
I was able to boot into safe mode to get the diagnostics.zip. I also found that when I start the array again in safe mode, the server goes back to the unresponsive state. But when I disable Docker before starting the array, I am able to get the shares back up and access most settings. I am currently running a parity check in safe mode.
tower-diagnostics-20200122-1926.zip
January 23, 2020
44 minutes ago, orem684 said: I was able to start into safe mode to get the diagnostics.zip. I also found out that when I start the array again in safe mode I go back to the unresponsive state. But when I disable Docker before starting the array, I am able to get the shares back up and access most settings. I am currently running a parity check in safe mode. tower-diagnostics-20200122-1926.zip
Ultimately what we're looking for is what happens when the array starts. Configure the syslog server to mirror the syslog to the flash (https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601) and start the array. If it's unresponsive after a few minutes, you'll have to reboot and then post the syslog.txt file that'll be in the logs folder on the flash drive.
January 24, 2020 Author
Thank you for the follow-up; here is the syslog.txt file. Unraid continues to freeze whenever I start the array in safe mode with Docker enabled. I performed the restart at around line 70 (13:24).
syslog
Edited January 24, 2020 by orem684
January 24, 2020
2 hours ago, orem684 said: Is there a way to revert back to 6.7.2?
https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.7.2-x86_64.zip
That is the 6.7.2 zip from the Unraid website.
January 24, 2020 Author
6 minutes ago, SpencerJ said: https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.7.2-x86_64.zip There is the 6.7.2 zip^^ from the Unraid website.
Thank you. Is there a command to install the update or would I have to flash the USB again? Does this affect my license?
January 24, 2020
2 minutes ago, orem684 said: Thank you. Is there a command to install the update or would I have to flash the USB again? Does this affect my license?
Simply copy all the bz* files to your flash drive and reboot. No effect on the license.
January 24, 2020 Author
1 hour ago, SpencerJ said: Simply copy all the bz* files to your flash drive and reboot. No effect on the license
I'm assuming I would also delete the existing bz* files on the flash drive?
January 24, 2020 Community Expert
Simply overwrite them with the ones from the v6.7.2 zip and reboot.
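The overwrite step can also be scripted. This is a rough Python sketch only, assuming the 6.7.2 zip has already been extracted somewhere and the flash drive is mounted (on a running Unraid server that is normally /boot, but treat that as an assumption and check first). The function name `overwrite_bz_files`, the optional backup folder, and the example paths are hypothetical and not from the thread.

```python
import shutil
from pathlib import Path


def overwrite_bz_files(release_dir, flash_dir, backup_dir=None):
    """Copy every bz* file from release_dir over the same name in flash_dir.

    If backup_dir is given, any file about to be overwritten is copied
    there first. Returns the sorted list of file names that were copied.
    """
    release_dir, flash_dir = Path(release_dir), Path(flash_dir)
    copied = []
    for src in sorted(release_dir.glob("bz*")):
        if not src.is_file():
            continue  # skip anything that is not a regular file
        dst = flash_dir / src.name
        if backup_dir is not None and dst.exists():
            # Keep a copy of the file currently on the flash before replacing it.
            Path(backup_dir).mkdir(parents=True, exist_ok=True)
            shutil.copy2(dst, Path(backup_dir) / dst.name)
        shutil.copy2(src, dst)  # overwrites dst if it already exists
        copied.append(src.name)
    return copied


# Hypothetical usage; adjust all three paths to your own setup:
# overwrite_bz_files("/tmp/unRAIDServer-6.7.2", "/boot", "/boot/previous-bz")
```

Nothing is deleted from the flash drive: existing bz* files are overwritten in place, which matches the advice above, and a reboot is still needed afterwards.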
January 24, 2020 Author
So I reverted back to 6.7.2 and I am still experiencing the same issue. I've attached the new syslog after my last reboot. Any thoughts on what to try next?
syslog
January 24, 2020 Community Expert
A btrfs process is crashing, but I can't see any other btrfs problem on any of the file systems, so I would start by running memtest, since btrfs is very intolerant of memory errors. Also make sure your RAM isn't overclocked, and respect the maximum supported speed for your configuration.
January 25, 2020 Author
10 hours ago, johnnie.black said: A btrfs process is crashing but I can't see any other btrfs problem on any of the file systems, so I would first start by running memtest since btrfs is very intolerant of memory errors, also make sure your RAM isn't overclocked, respect max speed depending on config.
I took the overclock on my RAM down to 2400 per the table you posted and ran a memtest, which passed with 0 errors. Unfortunately I booted back up to the same issue. The memtest and syslog are attached. Any other thoughts?
memtest
syslog
January 25, 2020 Author
I also swapped out my RAM with some newer sticks from my main computer and still have the same issue.
January 25, 2020 Community Expert
4 hours ago, orem684 said: The memtest and syslog is attached. Any other thoughts?
I don't see anything out of the ordinary in this syslog. Does it cover the crash?
January 25, 2020 Author
11 hours ago, johnnie.black said: Don't see anything out of ordinary on this syslog, it covers the crash?
So I think I figured out the issue. I noticed that the freezing only started after enabling Docker, so I decided to delete my Docker image and build a new one. I brought back two of my containers from their templates and so far I've had about 12 hours of uptime with the two containers running. I'm guessing my original Docker image must have been corrupted. Thanks everyone for your help.
Edited January 25, 2020 by orem684
January 25, 2020
Let us know if that continues to be the solution. I'm still having stability issues with my system, and this is something I hadn't considered.
January 26, 2020 Author
4 hours ago, EgyptianSnakeLegs said: Let us know if that continues to be the solution. I'm still having stability issues with my system, and this is something I hadn't considered.
So I just did a reboot and got out of safe mode with no issues. I am running three Docker containers (pi-hole, unifi controller and Plex) with no stability issues and no errors in the logs so far. I'll continue to slowly add my containers back while monitoring for stability. I'll let you know if I see any problems.
January 26, 2020 Community Expert
12 hours ago, orem684 said: I decided to delete my image and build a new one.
The Docker image could have been the issue. I did see a couple of btrfs crashes earlier, but no specific filesystem was mentioned. I still recommend running the RAM at non-overclocked speeds, especially with Ryzen; there have been previous cases here of instability and even data corruption.