36ve Posted February 2, 2022

I have a DL380p G8 with 12 drives installed. Over the last month, the server has occasionally crashed after a few hours of operation. The iLO log shows:

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 2, Function 0, Error status 0x00000020)
Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible

I have tried the drives, together with the USB stick that contains Unraid, in another DL380p G8 with completely different hardware, and the same issue still occurs. I also replaced the USB stick, as the old one was suspected to be the cause, but the issue persists. We ran Unraid in Safe Mode for all testing, so that only our Docker containers were running and every other plugin/app was disabled.

Any help is much appreciated, as I cannot rebuild the current system: I do not have enough drives to move some of the data off to a new system.

galar-diagnostics-20220202-0909.zip

36ve Posted February 2, 2022 Author

I have managed to get the syslog from when the crash happens. It looks like it crashes at around 12:54:17. Any help understanding what is happening would be much appreciated.

syslog (3)

ChatNoir Posted February 2, 2022

Feb 2 12:54:17 Galar kernel: ------------[ cut here ]------------
Feb 2 12:54:17 Galar kernel: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Feb 2 12:54:17 Galar kernel: WARNING: CPU: 22 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0xcf/0x12b
Feb 2 12:54:17 Galar kernel: Modules linked in: xt_mark veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding tg3 sb_edac ipmi_ssif i2c_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd hpsa glue_helper rapl scsi_transport_sas intel_cstate intel_uncore acpi_power_meter ata_piix acpi_ipmi thermal button ipmi_si [last unloaded: tg3]
Feb 2 12:54:17 Galar kernel: CPU: 22 PID: 0 Comm: swapper/22 Tainted: G I 5.10.28-Unraid #1
Feb 2 12:54:17 Galar kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 05/24/2019
Feb 2 12:54:17 Galar kernel: RIP: 0010:dev_watchdog+0xcf/0x12b
Feb 2 12:54:17 Galar kernel: Code: 79 b7 00 00 75 38 48 89 ef c6 05 63 79 b7 00 01 e8 79 dd fc ff 44 89 e1 48 89 ee 48 c7 c7 ef 7f de 81 48 89 c2 e8 50 16 10 00 <0f> 0b eb 10 41 ff c4 48 05 40 01 00 00 41 39 f4 75 9d eb 16 48 8b
Feb 2 12:54:17 Galar kernel: RSP: 0018:ffffc90006898ed8 EFLAGS: 00010286
Feb 2 12:54:17 Galar kernel: RAX: 0000000000000000 RBX: ffff88812613c438 RCX: 0000000000000027
Feb 2 12:54:17 Galar kernel: RDX: 00000000ffffbfff RSI: 0000000000000001 RDI: ffff888a17918920
Feb 2 12:54:17 Galar kernel: RBP: ffff88812613c000 R08: 0000000000000000 R09: 00000000ffffbfff
Feb 2 12:54:17 Galar kernel: R10: ffffc90006898d08 R11: ffffc90006898d00 R12: 0000000000000000
Feb 2 12:54:17 Galar kernel: R13: ffffc90006898f10 R14: ffffc90006898f10 R15: ffffffff820060c8
Feb 2 12:54:17 Galar kernel: FS: 0000000000000000(0000) GS:ffff888a17900000(0000) knlGS:0000000000000000
Feb 2 12:54:17 Galar kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 2 12:54:17 Galar kernel: CR2: 0000146096259a24 CR3: 000000000200a001 CR4: 00000000001706e0
Feb 2 12:54:17 Galar kernel: Call Trace:
Feb 2 12:54:17 Galar kernel: <IRQ>
Feb 2 12:54:17 Galar kernel: call_timer_fn.isra.0+0x12/0x6f
Feb 2 12:54:17 Galar kernel: ? netif_tx_lock+0x7a/0x7a
Feb 2 12:54:17 Galar kernel: __run_timers.part.0+0x144/0x185
Feb 2 12:54:17 Galar kernel: ? update_process_times+0x68/0x6e
Feb 2 12:54:17 Galar kernel: ? hrtimer_forward+0x73/0x7b
Feb 2 12:54:17 Galar kernel: ? tick_sched_timer+0x5a/0x64
Feb 2 12:54:17 Galar kernel: ? timerqueue_add+0x62/0x68
Feb 2 12:54:17 Galar kernel: ? recalibrate_cpu_khz+0x1/0x1
Feb 2 12:54:17 Galar kernel: run_timer_softirq+0x21/0x43
Feb 2 12:54:17 Galar kernel: __do_softirq+0xc4/0x1c2
Feb 2 12:54:17 Galar kernel: asm_call_irq_on_stack+0x12/0x20
Feb 2 12:54:17 Galar kernel: </IRQ>
Feb 2 12:54:17 Galar kernel: do_softirq_own_stack+0x2c/0x39
Feb 2 12:54:17 Galar kernel: __irq_exit_rcu+0x45/0x80
Feb 2 12:54:17 Galar kernel: sysvec_apic_timer_interrupt+0x87/0x95
Feb 2 12:54:17 Galar kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Feb 2 12:54:17 Galar kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Feb 2 12:54:17 Galar kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Feb 2 12:54:17 Galar kernel: RSP: 0018:ffffc90006377ea0 EFLAGS: 00000246
Feb 2 12:54:17 Galar kernel: RAX: ffff888a17922380 RBX: 0000000000000004 RCX: 000000000000001f
Feb 2 12:54:17 Galar kernel: RDX: 0000000000000000 RSI: 000000002dd30fdd RDI: 0000000000000000
Feb 2 12:54:17 Galar kernel: RBP: ffffe8f5feb3fa00 R08: 00000c6a45ea0e72 R09: 000000000000038d
Feb 2 12:54:17 Galar kernel: R10: 000000000000038d R11: 071c71c71c71c71c R12: 00000c6a45ea0e72
Feb 2 12:54:17 Galar kernel: R13: ffffffff820c5dc0 R14: 0000000000000004 R15: 0000000000000000
Feb 2 12:54:17 Galar kernel: cpuidle_enter_state+0x101/0x1c4
Feb 2 12:54:17 Galar kernel: cpuidle_enter+0x25/0x31
Feb 2 12:54:17 Galar kernel: do_idle+0x1a6/0x214
Feb 2 12:54:17 Galar kernel: cpu_startup_entry+0x18/0x1a
Feb 2 12:54:17 Galar kernel: secondary_startup_64_no_verify+0xb0/0xbb
Feb 2 12:54:17 Galar kernel: ---[ end trace ab6d36e9d5980c46 ]---

Quote: macvlan

I guess that you are using custom IP addresses for one or several Docker containers. I would suggest updating to 6.10.0 RC2 and switching your Docker network from MACVLAN to IPVLAN (Settings/Docker).

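The diagnosis above rests on two markers in the syslog: the NETDEV WATCHDOG line and the presence of macvlan in the "Modules linked in" list. A minimal sketch for checking a saved syslog for both, assuming the line format shown in this thread; the function name `scan_syslog` is hypothetical and not part of any Unraid tool.

```python
import re

# Matches the kernel watchdog message format seen in this thread, e.g.
# "NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
MODULES_PREFIX = "Modules linked in:"


def scan_syslog(lines):
    """Return (timeouts, macvlan_loaded) for an iterable of syslog lines.

    timeouts is a list of (interface, driver, queue) tuples, one per
    watchdog message; macvlan_loaded is True if any module list in a
    kernel warning includes the macvlan module.
    """
    timeouts = []
    macvlan_loaded = False
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            timeouts.append(
                (match["iface"], match["driver"], int(match["queue"]))
            )
        elif MODULES_PREFIX in line:
            modules = line.split(MODULES_PREFIX, 1)[1]
            # Ignore the trailing "[last unloaded: ...]" annotation.
            modules = modules.split("[", 1)[0].split()
            if "macvlan" in modules:
                macvlan_loaded = True
    return timeouts, macvlan_loaded
```

To run it against a downloaded log, open the file and pass the file object directly, for example `scan_syslog(open("syslog"))`. A non-empty timeout list on a tg3 interface together with `macvlan_loaded` being true matches the pattern quoted in this post; it does not by itself prove that macvlan is the cause.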
36ve Posted February 3, 2022 Author

I updated Unraid to that version as suggested, and the same thing occurred last night. After a few hours it crashed, and once I had brought it back up to get access to the syslog, it showed the same error that you quoted.

ChatNoir Posted February 3, 2022

Did you change the Docker network type to IPVLAN? It seems to solve the issue for other users.

36ve Posted February 3, 2022 Author

Yes, I did change it to IPVLAN, and the error still shows in the syslog at the time of the crash. It is possible that the NIC is having issues and is faulty.

Squid Posted February 3, 2022

Why are you running with the VM settings set to ACS override = downstream if you're not running any VMs with passthrough? The most stable way to set up a server is with ACS override disabled unless you absolutely need it.

36ve Posted February 3, 2022 Author

32 minutes ago, Squid said:
Why are you running with the VM settings set to ACS override = downstream if you're not running any VMs with passthrough? The most stable way to set up a server is with ACS override disabled unless you absolutely need it.

I had removed the VMs that used passthrough, as they were no longer used or needed, and the setting has just been left like this for a while now. I shall try changing it and see if that makes a difference.

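After changing the setting and rebooting, it can help to confirm that the override is really gone. The sketch below assumes that the ACS override setting reaches the kernel as a `pcie_acs_override=` boot parameter, which is not confirmed anywhere in this thread; the function names are hypothetical. `/proc/cmdline` is the standard Linux file holding the parameters the running kernel was booted with.

```python
def acs_override_value(cmdline):
    """Return the pcie_acs_override value from a kernel command line.

    Returns None when the parameter is absent. The parameter name is an
    assumption about how the ACS override setting is applied.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pcie_acs_override" and sep:
            return value
    return None


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the boot parameters of the running kernel."""
    with open(path) as f:
        return f.read().strip()
```

On the server itself, `acs_override_value(read_kernel_cmdline())` returning `None` would indicate that no override is active for the current boot, under the assumption stated above.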