DuzAwe

Posts posted by DuzAwe

  1. I have v2 and the node set up. I added the NVIDIA parameters to my node Docker container (Plex works with my 1060Super). I selected the Winsome H265 NVENC plugin and turned on a GPU worker, and it is "working", but Unraid Netdata isn't showing any processes on NVENC, so I am not sure it is actually running on the GPU. I have the CPU worker disabled currently.
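One way to confirm whether a transcode is really landing on NVENC is to watch the encoder column that `nvidia-smi dmon -s u` prints on the host. Below is a minimal Python sketch, not something from the original post. The function names `encoder_utilization` and `read_encoder_utilization` are hypothetical, and it assumes the usual dmon layout in which a `#`-prefixed header row names the columns (`gpu`, `sm`, `mem`, `enc`, `dec`).

```python
import subprocess


def encoder_utilization(dmon_text: str) -> dict:
    """Map GPU index -> encoder utilization (%) from `nvidia-smi dmon -s u` text."""
    columns = None
    result = {}
    for raw in dmon_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # The first header row names the columns; the second row only
            # holds units (Idx, %, ...) and is ignored.
            if columns is None and "enc" in fields:
                columns = fields
            continue
        if columns is None:
            continue  # data before any header: layout assumption not met
        values = line.split()
        if len(values) != len(columns):
            continue
        row = dict(zip(columns, values))
        try:
            result[int(row["gpu"])] = int(row["enc"])
        except (KeyError, ValueError):
            continue  # e.g. "-" when the GPU does not report the value
    return result


def read_encoder_utilization() -> dict:
    """Take one dmon sample on this host (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    return encoder_utilization(out)
```

Calling `read_encoder_utilization()` a few times while the GPU worker is busy should show a non-zero value for the card if NVENC is in use. If it stays at 0, the job is probably being encoded on the CPU.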

  2. Hey,

    Woke up this morning to a read-only disk, restarted the array, and now I have an unmountable disk present in a cache pool.

    It's btrfs and spinning rust.

    The disk ID is sdf, and I don't know the best course of action.

  3. Looks like I have had another kernel panic/macvlan crash, but no lockup this time. I was able to export diags as a result; hopefully they show something that can stop this altogether. As I said earlier, I no longer have any Docker containers with static custom IPs in my Docker setup.

     

    Apr 18 14:54:38 thelibrary kernel: ------------[ cut here ]------------
    Apr 18 14:54:38 thelibrary kernel: WARNING: CPU: 6 PID: 13151 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
    Apr 18 14:54:38 thelibrary kernel: Modules linked in: macvlan nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) nct6775 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd wmi_bmof kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel mpt3sas crypto_simd i2c_piix4 cryptd i2c_core nvme raid_class glue_helper ccp nvme_core scsi_transport_sas rapl ahci wmi acpi_ipmi k10temp libahci button ipmi_si acpi_cpufreq [last unloaded: i2c_algo_bit]
    Apr 18 14:54:38 thelibrary kernel: CPU: 6 PID: 13151 Comm: kworker/6:0 Tainted: P           O      5.10.28-Unraid #1
    Apr 18 14:54:38 thelibrary kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P3.50 11/02/2020
    Apr 18 14:54:38 thelibrary kernel: Workqueue: events macvlan_process_broadcast [macvlan]
    Apr 18 14:54:38 thelibrary kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
    Apr 18 14:54:38 thelibrary kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
    Apr 18 14:54:38 thelibrary kernel: RSP: 0018:ffffc9000031cd38 EFLAGS: 00010202
    Apr 18 14:54:38 thelibrary kernel: RAX: 0000000000000188 RBX: 0000000000006b19 RCX: 00000000455a00ea
    Apr 18 14:54:38 thelibrary kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa01e8e64
    Apr 18 14:54:38 thelibrary kernel: RBP: ffff8881c977ea80 R08: 000000001f21a935 R09: ffff8881965b4c20
    Apr 18 14:54:38 thelibrary kernel: R10: 0000000000000158 R11: ffff888195fdbf00 R12: 0000000000008cdd
    Apr 18 14:54:38 thelibrary kernel: R13: ffffffff8210b440 R14: 0000000000006b19 R15: 0000000000000000
    Apr 18 14:54:38 thelibrary kernel: FS:  0000000000000000(0000) GS:ffff8887fe980000(0000) knlGS:0000000000000000
    Apr 18 14:54:38 thelibrary kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Apr 18 14:54:38 thelibrary kernel: CR2: 000015146e169000 CR3: 0000000246280000 CR4: 0000000000350ee0
    Apr 18 14:54:38 thelibrary kernel: Call Trace:
    Apr 18 14:54:38 thelibrary kernel: <IRQ>
    Apr 18 14:54:38 thelibrary kernel: nf_conntrack_confirm+0x2f/0x36 [nf_conntrack]
    Apr 18 14:54:38 thelibrary kernel: nf_hook_slow+0x39/0x8e
    Apr 18 14:54:38 thelibrary kernel: nf_hook.constprop.0+0xb1/0xd8
    Apr 18 14:54:38 thelibrary kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
    Apr 18 14:54:38 thelibrary kernel: ip_local_deliver+0x49/0x75
    Apr 18 14:54:38 thelibrary kernel: ip_sabotage_in+0x43/0x4d [br_netfilter]
    Apr 18 14:54:38 thelibrary kernel: nf_hook_slow+0x39/0x8e
    Apr 18 14:54:38 thelibrary kernel: nf_hook.constprop.0+0xb1/0xd8
    Apr 18 14:54:38 thelibrary kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
    Apr 18 14:54:38 thelibrary kernel: ip_rcv+0x41/0x61
    Apr 18 14:54:38 thelibrary kernel: __netif_receive_skb_one_core+0x74/0x95
    Apr 18 14:54:38 thelibrary kernel: process_backlog+0xa3/0x13b
    Apr 18 14:54:38 thelibrary kernel: net_rx_action+0xf4/0x29d
    Apr 18 14:54:38 thelibrary kernel: __do_softirq+0xc4/0x1c2
    Apr 18 14:54:38 thelibrary kernel: asm_call_irq_on_stack+0x12/0x20
    Apr 18 14:54:38 thelibrary kernel: </IRQ>
    Apr 18 14:54:38 thelibrary kernel: do_softirq_own_stack+0x2c/0x39
    Apr 18 14:54:38 thelibrary kernel: do_softirq+0x3a/0x44
    Apr 18 14:54:38 thelibrary kernel: netif_rx_ni+0x1c/0x22
    Apr 18 14:54:38 thelibrary kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
    Apr 18 14:54:38 thelibrary kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
    Apr 18 14:54:38 thelibrary kernel: process_one_work+0x13c/0x1d5
    Apr 18 14:54:38 thelibrary kernel: worker_thread+0x18b/0x22f
    Apr 18 14:54:38 thelibrary kernel: ? process_scheduled_works+0x27/0x27
    Apr 18 14:54:38 thelibrary kernel: kthread+0xe5/0xea
    Apr 18 14:54:38 thelibrary kernel: ? __kthread_bind_mask+0x57/0x57
    Apr 18 14:54:38 thelibrary kernel: ret_from_fork+0x22/0x30
    Apr 18 14:54:38 thelibrary kernel: ---[ end trace d16416a764eaff38 ]---

     

    thelibrary-diagnostics-20210418-2211.zip
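To see at a glance which modules a trace like the one above passes through, the call-trace lines can be tallied by their bracketed module tag. Here is a rough Python sketch, assuming the syslog line format shown in this post. `frames_by_module` and the `(core kernel)` label are names made up for illustration. Frames prefixed with `?` are skipped because the kernel marks those entries as unreliable.

```python
import re
from collections import Counter

# Matches e.g. "kernel: macvlan_broadcast+0x10e/0x13c [macvlan]"
# Group 1: optional "? " marker, group 2: function, group 3: module (if any).
FRAME = re.compile(
    r"kernel: (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$"
)


def frames_by_module(lines, include_unreliable=False):
    """Count call-trace frames per kernel module in syslog lines."""
    counts = Counter()
    for line in lines:
        match = FRAME.search(line)
        if not match:
            continue  # register dumps, WARNING/RIP lines, <IRQ> markers, ...
        unreliable, _function, module = match.groups()
        if unreliable and not include_unreliable:
            continue
        counts[module or "(core kernel)"] += 1
    return counts
```

Fed the trace from this post, it counts the frames tagged `[macvlan]`, `[nf_conntrack]` and `[br_netfilter]` separately from the untagged core-kernel frames.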

  4. On and off since my jump to 6.9-RC2 I have had a freeze issue. I have replaced the motherboard and RAM in that time; the RAM is now ECC. Looking at the logs, my understanding is that it looks like a driver crash, either macvlan or Nvidia?

     

    Syslog attached; help is much appreciated. The server must be hard reset to get any access, so diags aren't possible.

     

    Apr 18 04:13:38 thelibrary kernel: NETDEV WATCHDOG: eth1 (igb): transmit queue 2 timed out
    Apr 18 04:13:38 thelibrary kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0xcf/0x12b
    Apr 18 04:13:38 thelibrary kernel: Modules linked in: macvlan md_mod nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) nct6775 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd kvm wmi_bmof crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd mpt3sas cryptd nvme nvme_core ccp ahci i2c_piix4 wmi raid_class glue_helper scsi_transport_sas rapl k10temp i2c_core acpi_ipmi libahci button ipmi_si acpi_cpufreq [last unloaded: md_mod]

     

    syslog

  5. 1 minute ago, optiman said:

    That is awesome to hear!  I think it may have to do with which controller is in use.  Mine is a LSI 9305-24i x8.  I see you are also running LSI card.

     

    Weird how some are having the issue and some are not.

    It's just one of those things, I guess. With so many variables in a setup, it could be anything causing the issue to show up for some and not others. I had the VLAN issue (hoping it's fixed now) locking up my box every few days.

     

    If it helps at all, the firmware on my LSI is the most recent, 20-something I think. I need to update my sig; I also swapped to ECC memory in the last few weeks due to issues with BTRFS.

  6. Well, I feel like a fool. OK, looks clean.

     

    Opening filesystem to check...
    Checking filesystem on /dev/nvme1n1p1
    UUID: cdb12f2a-8005-48a1-b8f7-bd0e1fc9fd43
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space tree
    [4/7] checking fs roots
    [5/7] checking only csums items (without verifying data)
    [6/7] checking root refs
    [7/7] checking quota groups skipped (not enabled on this FS)
    found 266705154048 bytes used, no error found
    total csum bytes: 225909672
    total tree bytes: 698073088
    total fs tree bytes: 379420672
    total extent tree bytes: 46546944
    btree space waste bytes: 119030509
    file data blocks allocated: 2601829400576
     referenced 262717333504

     

  7. 1 minute ago, JorgeB said:

    Also, a scrub only checks data and metadata consistency; it doesn't look for filesystem corruption. You can do that with the pool offline by typing:

     

    
    btrfs check /dev/sdX1

     

    Note that if errors are found, running btrfs check in repair mode (btrfs check --repair) is considered dangerous and should only be done if told to.

     

    Comes back with:

    root@thelibrary:~# btrfs check /dev/sdX1
    Opening filesystem to check...
    ERROR: mount check: cannot open /dev/sdX1: No such file or directory
    ERROR: could not check mount status: No such file or directory
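The error comes from running the command with the literal `sdX1` placeholder rather than the pool's real partition. Here is a small Python sketch of a guard against that. `is_block_device` and `btrfs_check_command` are hypothetical helper names, and the sketch only builds the `btrfs check` command (read-only unless `--repair` is passed) rather than running it.

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True only if path exists and is a block device node."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False  # missing path, permission problem, etc.


def btrfs_check_command(partition: str) -> list:
    """Build the argument list for a read-only `btrfs check` of a partition."""
    if not is_block_device(partition):
        raise ValueError(
            f"{partition} is not a block device; replace the sdX1 "
            "placeholder with the pool's real partition"
        )
    # No --repair here: plain `btrfs check` does not modify the filesystem.
    return ["btrfs", "check", partition]
```

With the pool offline, the returned list can be passed to `subprocess.run`. Passing `/dev/sdX1` raises a `ValueError` before `btrfs` is ever invoked.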