permissionBRICK

  1. Update: It looks like I finally managed to fix the issue. The last thing I changed was uninstalling everything network-related from Nerd Tools and updating the rest. The server has been up without issues since my last post.
  2. (Not a support request.) I made a fork of this lancache repo and added the ability to define custom DNS entries in the Docker config. It's simple and super janky, but if you want to, for example, give your NAS a local DNS name without spinning up a separate DNS server, you can use this 2-in-1 and save resources. Side note: I didn't make a pull request, since I don't think this is a feature people would usually want, and the implementation is kind of lazy. https://github.com/permissionBRICK/lancache-bundle
  3. I thought the problem was fixed this time when I replaced the 10G card with an SFP+ card, but it happened again. With the SFP+ card it didn't take down the network, and the server was still accessible initially. However, I got the same CPU stall errors in the syslog, and after a few minutes the server stopped responding again. I have found several other topics, and nobody who has encountered these CPU stall errors seems to have any solution other than downgrading. But I have been getting these issues for several versions now, so I have no idea if I can downgrade that far... I also managed to get a snapshot of netdata when it happened: one of the cores seems to stall on SOFTIRQ, while another one stalls on SYSTEM. Does anyone have any more ideas on what I can do to try to fix this? So far I have replaced every single hardware component except the hard drives, the USB key and the PSU, and the issue persists. Do I need to reinstall Unraid from scratch on a new USB key? syslog.log
  4. Were you running any dockers when it happened or are you not running any at all?
  5. The issue happened again this weekend; at exactly 2020-06-13 17:10 the network went down. This time I found an error in the NAS's syslog on the syslog sync server. The error was logged about 2 minutes before the lockup occurred. Since it is the only message after hours of nothing, and the lockup gradually gets worse until it takes down the entire network, this might very well be the cause, or at least related to the issue: syslog.log
  6. Can you post the hardware specs of your setup? If it is entirely different from mine, maybe we can rule out hardware as the cause completely... I have already swapped all components except the drives, the PSU and the Unraid install itself on the stick. Can you also post your current diagnostics file, so we can compare installed plugins etc.? nethub-diagnostics-20200615-1146.zip
  7. It happened again, and it locked up the network as well, like usual. I guess if the 10G card is at fault it is a driver issue; otherwise it has nothing to do with the 10G card. The only remaining components that I haven't swapped yet are the drives, the PSU and the Unraid install itself.
  8. Now something happened, but I'm not sure if it is the same issue. The network is still up and the machine is still reachable, but all cores except one were stuck at 100% iowait; the mover is running and doesn't seem to be making progress. The syslog spits out this error:

     Jun 5 00:39:29 Nethub shutdown[1443]: shutting down for system reboot
     Jun 5 00:40:33 Nethub kernel: rcu: INFO: rcu_sched self-detected stall on CPU
     Jun 5 00:40:33 Nethub kernel: rcu: 5-....: (240002 ticks this GP) idle=566/1/0x4000000000000002 softirq=18902205/18902205 fqs=58516
     Jun 5 00:40:33 Nethub kernel: rcu: (t=240004 jiffies g=37787229 q=503044)
     Jun 5 00:40:33 Nethub kernel: Sending NMI from CPU 5 to CPUs 4:
     Jun 5 00:40:33 Nethub kernel: NMI backtrace for cpu 4
     Jun 5 00:40:33 Nethub kernel: CPU: 4 PID: 30342 Comm: kworker/u16:2 Tainted: G B D W 4.19.107-Unraid #1
     Jun 5 00:40:33 Nethub kernel: Hardware name: MSI MS-7A63/Z270 GAMING PRO CARBON (MS-7A63), BIOS 1.90 07/03/2018
     Jun 5 00:40:33 Nethub kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
     Jun 5 00:40:33 Nethub kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x11e/0x171
     Jun 5 00:40:33 Nethub kernel: Code: 48 03 04 cd 20 37 db 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 0a 48 85 c9 74 c9 0f 0d 09 8b 07 66 85 c0 74 04 f3 90 <eb> f5 41 89 c0 66 45 31 c0 44 39 c6 74 0a 48 85 c9 c6 07 01 75 1b
     Jun 5 00:40:33 Nethub kernel: RSP: 0018:ffffc9000ce77908 EFLAGS: 00000202
     Jun 5 00:40:33 Nethub kernel: RAX: 0000000000140101 RBX: ffff88880b9a8a00 RCX: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: RDX: ffff88884eb20740 RSI: 0000000000140000 RDI: ffff88880b9a8b60
     Jun 5 00:40:33 Nethub kernel: RBP: ffff8881076c61a0 R08: 0000000000000005 R09: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: R10: ffff88880b9a8b60 R11: ffff88884e405301 R12: ffff888535e4f130
     Jun 5 00:40:33 Nethub kernel: R13: ffff8882e09824e0 R14: ffff8888475a2000 R15: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: FS: 0000000000000000(0000) GS:ffff88884eb00000(0000) knlGS:0000000000000000
     Jun 5 00:40:33 Nethub kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Jun 5 00:40:33 Nethub kernel: CR2: 0000153317b04000 CR3: 0000000001e0a002 CR4: 00000000003606e0
     Jun 5 00:40:33 Nethub kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Jun 5 00:40:33 Nethub kernel: Call Trace:
     Jun 5 00:40:33 Nethub kernel: _raw_spin_lock+0x16/0x19
     Jun 5 00:40:33 Nethub kernel: btrfs_add_delayed_tree_ref+0x214/0x2a4
     Jun 5 00:40:33 Nethub kernel: btrfs_alloc_tree_block+0x483/0x510
     Jun 5 00:40:33 Nethub kernel: alloc_tree_block_no_bg_flush+0x45/0x4d
     Jun 5 00:40:33 Nethub kernel: __btrfs_cow_block+0x143/0x4ee
     Jun 5 00:40:33 Nethub kernel: btrfs_cow_block+0x105/0x113
     Jun 5 00:40:33 Nethub kernel: btrfs_search_slot+0x330/0x84a
     Jun 5 00:40:33 Nethub kernel: btrfs_lookup_file_extent+0x47/0x61
     Jun 5 00:40:33 Nethub kernel: __btrfs_drop_extents+0x16f/0xb12
     Jun 5 00:40:33 Nethub kernel: ? next_state+0x9/0x13
     Jun 5 00:40:33 Nethub kernel: ? __set_extent_bit+0x280/0x430
     Jun 5 00:40:33 Nethub kernel: insert_reserved_file_extent.constprop.0+0x98/0x2cc
     Jun 5 00:40:33 Nethub kernel: btrfs_finish_ordered_io+0x317/0x5d2
     Jun 5 00:40:33 Nethub kernel: normal_work_helper+0xd0/0x1c7
     Jun 5 00:40:33 Nethub kernel: process_one_work+0x16e/0x24f
     Jun 5 00:40:33 Nethub kernel: worker_thread+0x1e2/0x2b8
     Jun 5 00:40:33 Nethub kernel: ? rescuer_thread+0x2a7/0x2a7
     Jun 5 00:40:33 Nethub kernel: kthread+0x10c/0x114
     Jun 5 00:40:33 Nethub kernel: ? kthread_park+0x89/0x89
     Jun 5 00:40:33 Nethub kernel: ret_from_fork+0x35/0x40
     Jun 5 00:40:33 Nethub kernel: NMI backtrace for cpu 5
     Jun 5 00:40:33 Nethub kernel: CPU: 5 PID: 32176 Comm: kworker/u16:1 Tainted: G B D W 4.19.107-Unraid #1
     Jun 5 00:40:33 Nethub kernel: Hardware name: MSI MS-7A63/Z270 GAMING PRO CARBON (MS-7A63), BIOS 1.90 07/03/2018
     Jun 5 00:40:33 Nethub kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
     Jun 5 00:40:33 Nethub kernel: Call Trace:
     Jun 5 00:40:33 Nethub kernel: <IRQ>
     Jun 5 00:40:33 Nethub kernel: dump_stack+0x67/0x83
     Jun 5 00:40:33 Nethub kernel: nmi_cpu_backtrace+0x71/0x83
     Jun 5 00:40:33 Nethub kernel: ? lapic_can_unplug_cpu+0x97/0x97
     Jun 5 00:40:33 Nethub kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
     Jun 5 00:40:33 Nethub kernel: rcu_dump_cpu_stacks+0x8b/0xb4
     Jun 5 00:40:33 Nethub kernel: rcu_check_callbacks+0x296/0x5a0
     Jun 5 00:40:33 Nethub kernel: update_process_times+0x24/0x47
     Jun 5 00:40:33 Nethub kernel: tick_sched_timer+0x36/0x64
     Jun 5 00:40:33 Nethub kernel: __hrtimer_run_queues+0xb7/0x10b
     Jun 5 00:40:33 Nethub kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
     Jun 5 00:40:33 Nethub kernel: hrtimer_interrupt+0xf4/0x20e
     Jun 5 00:40:33 Nethub kernel: smp_apic_timer_interrupt+0x7b/0x93
     Jun 5 00:40:33 Nethub kernel: apic_timer_interrupt+0xf/0x20
     Jun 5 00:40:33 Nethub kernel: </IRQ>
     Jun 5 00:40:33 Nethub kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
     Jun 5 00:40:33 Nethub kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 80 6a f8
     Jun 5 00:40:33 Nethub kernel: RSP: 0018:ffffc9000ef4f908 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
     Jun 5 00:40:33 Nethub kernel: RAX: 0000000000140101 RBX: ffff88880b9a8a00 RCX: 0000000000004000
     Jun 5 00:40:33 Nethub kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88880b9a8b60
     Jun 5 00:40:33 Nethub kernel: RBP: ffff88880226b8f0 R08: 0000000000000005 R09: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: R10: ffff88880b9a8b60 R11: ffff88884e405301 R12: ffff88852d64ced8
     Jun 5 00:40:33 Nethub kernel: R13: ffff8881971515b0 R14: ffff8888475a2000 R15: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: _raw_spin_lock+0x16/0x19
     Jun 5 00:40:33 Nethub kernel: btrfs_add_delayed_tree_ref+0x214/0x2a4
     Jun 5 00:40:33 Nethub kernel: btrfs_alloc_tree_block+0x483/0x510
     Jun 5 00:40:33 Nethub kernel: alloc_tree_block_no_bg_flush+0x45/0x4d
     Jun 5 00:40:33 Nethub kernel: __btrfs_cow_block+0x143/0x4ee
     Jun 5 00:40:33 Nethub kernel: btrfs_cow_block+0x105/0x113
     Jun 5 00:40:33 Nethub kernel: btrfs_search_slot+0x330/0x84a
     Jun 5 00:40:33 Nethub kernel: btrfs_lookup_file_extent+0x47/0x61
     Jun 5 00:40:33 Nethub kernel: __btrfs_drop_extents+0x16f/0xb12
     Jun 5 00:40:33 Nethub kernel: ? next_state+0x9/0x13
     Jun 5 00:40:33 Nethub kernel: ? __set_extent_bit+0x280/0x430
     Jun 5 00:40:33 Nethub kernel: insert_reserved_file_extent.constprop.0+0x98/0x2cc
     Jun 5 00:40:33 Nethub kernel: btrfs_finish_ordered_io+0x317/0x5d2
     Jun 5 00:40:33 Nethub kernel: normal_work_helper+0xd0/0x1c7
     Jun 5 00:40:33 Nethub kernel: process_one_work+0x16e/0x24f
     Jun 5 00:40:33 Nethub kernel: worker_thread+0x1e2/0x2b8
     Jun 5 00:40:33 Nethub kernel: ? rescuer_thread+0x2a7/0x2a7
     Jun 5 00:40:33 Nethub kernel: kthread+0x10c/0x114
     Jun 5 00:40:33 Nethub kernel: ? kthread_park+0x89/0x89
     Jun 5 00:40:33 Nethub kernel: ret_from_fork+0x35/0x40
     Jun 5 00:40:33 Nethub kernel: Sending NMI from CPU 5 to CPUs 6:
     Jun 5 00:40:33 Nethub kernel: NMI backtrace for cpu 6
     Jun 5 00:40:33 Nethub kernel: CPU: 6 PID: 30933 Comm: kworker/u16:0 Tainted: G B D W 4.19.107-Unraid #1
     Jun 5 00:40:33 Nethub kernel: Hardware name: MSI MS-7A63/Z270 GAMING PRO CARBON (MS-7A63), BIOS 1.90 07/03/2018
     Jun 5 00:40:33 Nethub kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
     Jun 5 00:40:33 Nethub kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x63/0x171
     Jun 5 00:40:33 Nethub kernel: Code: 2f 08 b8 00 01 00 00 0f 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a <8b> 07 84 c0 74 04 f3 90 eb f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02
     Jun 5 00:40:33 Nethub kernel: RSP: 0018:ffffc9000d90f9e0 EFLAGS: 00000202
     Jun 5 00:40:33 Nethub kernel: RAX: 0000000000000101 RBX: ffff888536993850 RCX: ffffc9000d90fb28
     Jun 5 00:40:33 Nethub kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888536993888
     Jun 5 00:40:33 Nethub kernel: RBP: ffff88877e7452f8 R08: ffff88839f210ba0 R09: ffffc9000d90fb2c
     Jun 5 00:40:33 Nethub kernel: R10: ffff88880b9a8b60 R11: 0000000000000000 R12: ffff88880b9a8b78
     Jun 5 00:40:33 Nethub kernel: R13: ffff888536993888 R14: ffffc9000d90fb28 R15: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: FS: 0000000000000000(0000) GS:ffff88884eb80000(0000) knlGS:0000000000000000
     Jun 5 00:40:33 Nethub kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Jun 5 00:40:33 Nethub kernel: CR2: 0000153317b04000 CR3: 0000000001e0a002 CR4: 00000000003606e0
     Jun 5 00:40:33 Nethub kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     Jun 5 00:40:33 Nethub kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Jun 5 00:40:33 Nethub kernel: Call Trace:
     Jun 5 00:40:33 Nethub kernel: _raw_spin_lock+0x16/0x19
     Jun 5 00:40:33 Nethub kernel: update_existing_head_ref.isra.0+0x32/0x111
     Jun 5 00:40:33 Nethub kernel: add_delayed_ref_head.isra.0+0x102/0x189
     Jun 5 00:40:33 Nethub kernel: btrfs_add_delayed_tree_ref+0x231/0x2a4
     Jun 5 00:40:33 Nethub kernel: btrfs_free_tree_block+0x86/0x1dd
     Jun 5 00:40:33 Nethub kernel: __btrfs_cow_block+0x4a0/0x4ee
     Jun 5 00:40:33 Nethub kernel: btrfs_cow_block+0x105/0x113
     Jun 5 00:40:33 Nethub kernel: btrfs_search_slot+0x330/0x84a
     Jun 5 00:40:33 Nethub kernel: btrfs_lookup_csum+0x4d/0x130
     Jun 5 00:40:33 Nethub kernel: ? _cond_resched+0x1b/0x1e
     Jun 5 00:40:33 Nethub kernel: ? kmem_cache_alloc+0xdf/0xeb
     Jun 5 00:40:33 Nethub kernel: btrfs_csum_file_blocks+0x8b/0x563
     Jun 5 00:40:33 Nethub kernel: add_pending_csums+0x40/0x5b
     Jun 5 00:40:33 Nethub kernel: btrfs_finish_ordered_io+0x3d2/0x5d2
     Jun 5 00:40:33 Nethub kernel: normal_work_helper+0xd0/0x1c7
     Jun 5 00:40:33 Nethub kernel: process_one_work+0x16e/0x24f
     Jun 5 00:40:33 Nethub kernel: worker_thread+0x1e2/0x2b8
     Jun 5 00:40:33 Nethub kernel: ? rescuer_thread+0x2a7/0x2a7
     Jun 5 00:40:33 Nethub kernel: kthread+0x10c/0x114
     Jun 5 00:40:33 Nethub kernel: ? kthread_park+0x89/0x89
     Jun 5 00:40:33 Nethub kernel: ret_from_fork+0x35/0x40

     I tried to reboot the NAS, but then it went to a fixed 80% iowait and 20% system on all cores, and even though it logged System reboot NOW, it never rebooted. I tried generating diagnostics, but that never finished. syslog.log
  9. Alright, RAM is officially ruled out: the same issue just happened while I had the old 8GB modules in the system. I have now swapped the 10G card with the 10G card in my main workstation; let's see if that fixes the issue.
  10. I have been running memory tests on the 32GB RAM modules on the old board (where the same issue used to occur) for 2 weeks now: one week of memtest86 and one week of Windows Memory Diagnostic on max settings. Both resulted in no errors. I even ran the memtest tool @Benson suggested, but again no errors. Here is a picture:
  11. No, sadly it doesn't. The 10G link was still connected, and it locked up the network again as well. Yeah, I have now put the old 4GB modules back into the system, so I have plenty of time to test the RAM for any issues. Thanks
  12. Okay, it happened again, this time while the 1G link was down. This resulted in a network lock-up as well, but once I reset the switch, the network was fine again, while the server was still frozen. So I guess that when the server freezes up, the 1G card re-establishes the link if it's lost while the 10G card doesn't, but both lock up the network when it happens. I also found these threads, which seem to describe exactly my issue as well: The first one seems to indicate that a memory issue might have been the cause; however, I would find it strange if a generic memory defect caused issues this specific and this similar for several people... Nevertheless, I will try to swap out the RAM again and have the current modules run memtest for a week to be sure...
  13. Yes, there is only one link cable between the two switches. When the NAS is plugged into the 1G switch, the link cable between the 1G and 10G switches flashes rapidly during this issue. The other ports don't flash as much, so I guess the buffers are overflowing rather than a broadcast storm overloading the network. I don't know enough about data link layer logic to guess why the switches are blinking "getting data", but when I plug the cable directly into the PC, Wireshark doesn't show any packets. None of the VMs were started at any point when the issues occurred. As for Docker containers, I guess the only unusual thing would be the lancache container that I run on br0 with a separate IP, but running ifconfig in that container also shows a different MAC.
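
For anyone comparing stall traces like the one in post 8 across several syslog files, here is a minimal sketch of how the per-CPU sections could be summarized. It assumes the log contains "NMI backtrace for cpu N" markers followed by "Workqueue:" and "RIP:" lines, as in that post; the function name `summarize_stalls`, the regular expressions and the shortened `SAMPLE` excerpt are hypothetical helpers for illustration, not part of Unraid or any kernel tooling.

```python
import re

# Hypothetical patterns, written against the log format shown in post 8.
CPU_RE = re.compile(r"NMI backtrace for cpu (\d+)")
WORKQUEUE_RE = re.compile(r"Workqueue: (\S+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([A-Za-z0-9_.]+)\+0x")


def summarize_stalls(log_text):
    """Map each CPU with an NMI backtrace to its workqueue and RIP function."""
    summary = {}
    matches = list(CPU_RE.finditer(log_text))
    for i, match in enumerate(matches):
        # A section runs from this marker to the next one (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        section = log_text[match.end():end]
        workqueue = WORKQUEUE_RE.search(section)
        rip = RIP_RE.search(section)
        summary[int(match.group(1))] = {
            "workqueue": workqueue.group(1) if workqueue else None,
            "rip": rip.group(1) if rip else None,
        }
    return summary


# Shortened excerpt of the trace from post 8 (only the lines the sketch reads).
SAMPLE = """\
Jun 5 00:40:33 Nethub kernel: NMI backtrace for cpu 4
Jun 5 00:40:33 Nethub kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
Jun 5 00:40:33 Nethub kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x11e/0x171
Jun 5 00:40:33 Nethub kernel: NMI backtrace for cpu 6
Jun 5 00:40:33 Nethub kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper
Jun 5 00:40:33 Nethub kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x63/0x171
"""

if __name__ == "__main__":
    for cpu, info in sorted(summarize_stalls(SAMPLE).items()):
        print(cpu, info["workqueue"], info["rip"])
```

Because the patterns search the whole text rather than individual lines, the same function should also work on a trace that was pasted as one long line. It only reports what the log itself says and does not diagnose the cause of the stall.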