random crashes unraid 6.3.5

December 21, 2017

Once again my tower locked up, this time while it was streaming Plex content to my Apple TV...

Every time this happens - which is alarmingly often since upgrading to 6.3.5 - I lose control of my dedicated console keyboard and screen, and the network drops as well.

After a hard reboot, once the array is up and running again, I find this in the syslog:

Dec 21 16:39:42 mothership kernel: ------------[ cut here ]------------
Dec 21 16:39:42 mothership kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x181/0x1dc
Dec 21 16:39:42 mothership kernel: NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out
Dec 21 16:39:42 mothership kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net tun vhost macvtap macvlan xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod mxm_wmi i2c_i801 i2c_smbus x86_pkg_temp_thermal coretemp i2c_core kvm_intel kvm ahci e1000 libahci video wmi backlight [last unloaded: md_mod]
Dec 21 16:39:42 mothership kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.30-unRAID #1
Dec 21 16:39:42 mothership kernel: Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F22c 12/01/2017
Dec 21 16:39:42 mothership kernel: ffff88084ed83db0 ffffffff813a4a1b ffff88084ed83e00 ffffffff819aa12f
Dec 21 16:39:42 mothership kernel: ffff88084ed83df0 ffffffff8104d0d9 0000013c4ed83e68 ffff880826ff0000
Dec 21 16:39:42 mothership kernel: ffff880826d2a800 ffff880826ff03a0 0000000000000003 0000000000000001
Dec 21 16:39:42 mothership kernel: Call Trace:
Dec 21 16:39:42 mothership kernel: <IRQ> 
Dec 21 16:39:42 mothership kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Dec 21 16:39:42 mothership kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Dec 21 16:39:42 mothership kernel: [<ffffffff8104d13a>] warn_slowpath_fmt+0x46/0x4e
Dec 21 16:39:42 mothership kernel: [<ffffffff815a848d>] dev_watchdog+0x181/0x1dc
Dec 21 16:39:42 mothership kernel: [<ffffffff815a830c>] ? qdisc_rcu_free+0x39/0x39
Dec 21 16:39:42 mothership kernel: [<ffffffff815a830c>] ? qdisc_rcu_free+0x39/0x39
Dec 21 16:39:42 mothership kernel: [<ffffffff81090ccc>] call_timer_fn.isra.5+0x17/0x6b
Dec 21 16:39:42 mothership kernel: [<ffffffff81090da5>] expire_timers+0x85/0x98
Dec 21 16:39:42 mothership kernel: [<ffffffff81090ea5>] run_timer_softirq+0x69/0x8f
Dec 21 16:39:42 mothership kernel: [<ffffffff8103642b>] ? lapic_next_deadline+0x21/0x27
Dec 21 16:39:42 mothership kernel: [<ffffffff8109b347>] ? clockevents_program_event+0xd0/0xe8
Dec 21 16:39:42 mothership kernel: [<ffffffff81050f59>] __do_softirq+0xbb/0x1af
Dec 21 16:39:42 mothership kernel: [<ffffffff810511fd>] irq_exit+0x53/0x94
Dec 21 16:39:42 mothership kernel: [<ffffffff81036e19>] smp_trace_apic_timer_interrupt+0x7b/0x88
Dec 21 16:39:42 mothership kernel: [<ffffffff81036e2f>] smp_apic_timer_interrupt+0x9/0xb
Dec 21 16:39:42 mothership kernel: [<ffffffff81680172>] apic_timer_interrupt+0x82/0x90
Dec 21 16:39:42 mothership kernel: <EOI> 
Dec 21 16:39:42 mothership kernel: [<ffffffff815533e4>] ? cpuidle_enter_state+0xfe/0x156
Dec 21 16:39:42 mothership kernel: [<ffffffff8155345e>] cpuidle_enter+0x12/0x14
Dec 21 16:39:42 mothership kernel: [<ffffffff8107c545>] call_cpuidle+0x33/0x35
Dec 21 16:39:42 mothership kernel: [<ffffffff8107c727>] cpu_startup_entry+0x13a/0x1b2
Dec 21 16:39:42 mothership kernel: [<ffffffff81035482>] start_secondary+0xf5/0xf8
Dec 21 16:39:42 mothership kernel: ---[ end trace 83c675411c5afb19 ]---

Also, just below it, this message keeps occurring:

Dec 21 16:39:42 mothership kernel: e1000 0000:04:01.0 eth0: Reset adapter
Dec 21 16:39:42 mothership kernel: br0: port 1(eth0) entered disabled state
Dec 21 16:39:42 mothership kernel: br0: topology change detected, propagating
Dec 21 16:39:46 mothership kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 21 16:39:46 mothership kernel: br0: port 1(eth0) entered blocking state
Dec 21 16:39:46 mothership kernel: br0: port 1(eth0) entered listening state
Dec 21 16:39:46 mothership kernel: dmar_fault: 215 callbacks suppressed
Dec 21 16:39:46 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:46 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffc40000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: br0: port 1(eth0) entered learning state
Dec 21 16:39:48 mothership kernel: br0: port 1(eth0) entered forwarding state
Dec 21 16:39:48 mothership kernel: br0: topology change detected, sending tcn bpdu
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffcde000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffc20000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffd74000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffc4c000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr fff90000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ff61b000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffc2b000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ff5f1000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:48 mothership kernel: DMAR: DRHD: handling fault status reg 3
Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffde6000 [fault reason 06] PTE Read access is not set
Dec 21 16:39:51 mothership kernel: dmar_fault: 4749 callbacks suppressed

Whether it is relevant to my troubles, I do not know.
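
For anyone who wants to quantify these messages rather than eyeball them, here is a minimal Python sketch that tallies the DMAR faults per requesting device and the NETDEV WATCHDOG timeouts per interface. The regular expressions are written only against the message formats visible in the excerpts above, so they are an assumption about how the rest of the syslog looks; the function name `summarize` and the `SAMPLE` list are illustrative, not part of any unRAID tool.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this post (assumed formats).
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)"
)
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)
SUPPRESSED = re.compile(r"dmar_fault: (\d+) callbacks suppressed")


def summarize(lines):
    """Count DMAR faults per (device, access type) and TX timeouts per NIC."""
    faults = Counter()
    timeouts = Counter()
    suppressed = 0
    for line in lines:
        m = DMAR_FAULT.search(line)
        if m:
            faults[(m.group(2), m.group(1))] += 1
            continue
        m = WATCHDOG.search(line)
        if m:
            timeouts[(m.group(1), m.group(2))] += 1
            continue
        m = SUPPRESSED.search(line)
        if m:
            # Rate-limited faults that the kernel did not print individually.
            suppressed += int(m.group(1))
    return {"faults": faults, "timeouts": timeouts, "suppressed": suppressed}


# A few lines copied from the excerpts above, used as sample input.
SAMPLE = [
    "Dec 21 16:39:42 mothership kernel: NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out",
    "Dec 21 16:39:46 mothership kernel: dmar_fault: 215 callbacks suppressed",
    "Dec 21 16:39:46 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffc40000 [fault reason 06] PTE Read access is not set",
    "Dec 21 16:39:48 mothership kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ffcde000 [fault reason 06] PTE Read access is not set",
]

result = summarize(SAMPLE)
print(result["faults"])
print(result["timeouts"])
print(result["suppressed"])
```

To run it over a whole log, pass the open file instead of `SAMPLE`, for example `summarize(open("syslog.txt"))`, where the file name is a placeholder. Note that in the excerpts the faults are attributed to device 04:00.0, while the e1000 adapter reports itself at 04:01.0; `lspci` would show what sits at each address.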

I've just about had it with 6.3.5. If anyone would care to give me some pointers on how to get to the bottom of this, I am all ears.
If downgrading to 6.1.9 (which I believe was automatically backed up onto my UnRaid-USB stick on upgrade) is an option, I'd also appreciate a guide on how to go about doing it.


December 21, 2017

Community Expert

There's another user with a Gigabyte board and similar errors, so this is most likely board related. You could try v6.4, which uses a newer kernel.


December 21, 2017

@blodfjert I have had similar issues and upgraded to v6.4 this morning (click here for details), which seems to have stopped the [DMA Read] issues, but the system is still hanging. I am going to change to an LSI h/d controller and see if this solves the issue, as the Supermicro Marvell controller that I presently use is an issue.

I am re-enabling VT-d on the motherboard to see if this solves the new issue before purchasing a new controller.


December 21, 2017

Author

Hi johnnie.black and trinikojak

This is most interesting! The cause of our problems may very well be the same.
Just for reference, this is my build:

Model: Own build

M/B: Gigabyte Technology Co., Ltd. - Z170XP-SLI-CF

CPU: Intel® Core™ i5-6600 CPU @ 3.30GHz

HVM: Enabled

IOMMU: Enabled

Cache: 256 kB, 1024 kB, 6144 kB

Memory: 32 GB (max. installable capacity 64 GB)

Network: eth0: 1000 Mb/s, full duplex, mtu 1500

Kernel: Linux 4.9.30-unRAID x86_64

OpenSSL: 1.0.2k

Reading that you have upgraded to v6.4 yet still encounter instability, I'd rather go the other way: downgrading.
As I stated in my first post, my system became very unstable the moment I upgraded from 6.1.9 to 6.3.5.
Prior to upgrading, it was rock stable (with the exact same hardware, except for the GPU, which was switched from an ATI Radeon HD 5850 to an EVGA GeForce 1060 3GB the same day). My system would have an uptime of several months at a time, running dockers and VMs with passthrough and whatnot.

Also, to my knowledge there are no Marvell chipsets or controllers on my motherboard. I may be wrong.

I'll see if I can find any information on downgrading to 6.1.9 without jeopardizing my setup.

Let's keep each other up to speed on any developments.
