Since upgrading to 6.9 (and now 6.9.1), server crashes every 2 or 3 days



System details

Unraid 6.9.1

Gigabyte B365M motherboard

Intel i7-8700 CPU

48 GB DDR4-2666 RAM (mismatched sticks/brands)

LSI 9201-8i flashed to P20 IT mode.

 

System was stable on 6.8.3 for months.  Since upgrading to 6.9, and now 6.9.1, the system crashes and becomes completely unresponsive every few days (no GUI access, no SSH access, all dockers down, no network), requiring a hard power off to reboot it and get it running again.

 

I set up an external syslog server to capture what happens before the system becomes unresponsive; the output is below. The diagnostics collected after the hard power-down and reboot are attached.

 

    Mar 14 16:38:32 Unraid kernel: <IRQ>
    Mar 14 16:38:32 Unraid kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.21-Unraid #1
    Mar 14 16:38:32 Unraid kernel: CR2: 000000c000853000 CR3: 000000000200c001 CR4: 00000000003726e0
    Mar 14 16:38:32 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar 14 16:38:32 Unraid kernel: Call Trace:
    Mar 14 16:38:32 Unraid kernel: Code: 6c a0 00 00 41 56 45 31 f6 41 55 41 89 d5 41 54 55 48 89 fd 48 89 f7 53 4c 8b 66 10 31 db 49 81 e4 00 f0 ff ff 45 39 ee 7d 28 <48> 8b 47 10 41 ff c6 8b 57 18 25 ff 0f 00 00 48 8d 84 10 ff 0f 00
    Mar 14 16:38:32 Unraid kernel: DMAR: DRHD: handling fault status reg 3
    Mar 14 16:38:32 Unraid kernel: DMAR: DRHD: handling fault status reg 3
    Mar 14 16:38:32 Unraid kernel: DMAR: DRHD: handling fault status reg 3
    Mar 14 16:38:32 Unraid kernel: DMAR: DRHD: handling fault status reg 3
    Mar 14 16:38:32 Unraid kernel: DMAR: [DMA Write] Request device [0b:00.0] PASID ffffffff fault addr f3f63000 [fault reason 05] PTE Write access is not set
    Mar 14 16:38:32 Unraid kernel: DMAR: [DMA Write] Request device [0b:00.0] PASID ffffffff fault addr f3f65000 [fault reason 05] PTE Write access is not set
    Mar 14 16:38:32 Unraid kernel: DMAR: [DMA Write] Request device [0b:00.0] PASID ffffffff fault addr f3f66000 [fault reason 05] PTE Write access is not set
    Mar 14 16:38:32 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Mar 14 16:38:32 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Mar 14 16:38:32 Unraid kernel: FS: 0000000000000000(0000) GS:ffff8886172c0000(0000) knlGS:0000000000000000
    Mar 14 16:38:32 Unraid kernel: Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F5 08/13/2019
    Mar 14 16:38:32 Unraid kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 2abf9cd8898ea000
    Mar 14 16:38:32 Unraid kernel: R13: 0000000000000010 R14: 0000000000000002 R15: 000000000000008c
    Mar 14 16:38:32 Unraid kernel: RAX: b61397b86406a014 RBX: 000000000017710a RCX: 0000000000000002
    Mar 14 16:38:32 Unraid kernel: RBP: ffff888101be30b8 R08: 0000000000000000 R09: ffff8883b2291000
    Mar 14 16:38:32 Unraid kernel: RDX: b61397b86406a015 RSI: ffff8883b2291000 RDI: b61397b86406a014
    Mar 14 16:38:32 Unraid kernel: RIP: 0010:intel_unmap_sg+0x26/0x68
    Mar 14 16:38:32 Unraid kernel: RSP: 0018:ffffc900001a4ec8 EFLAGS: 00010083
    Mar 14 16:38:32 Unraid kernel: __handle_irq_event_percpu+0x36/0xcb
    Mar 14 16:38:32 Unraid kernel: blk_update_request: I/O error, dev nvme0n1, sector 1645609472 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
    Mar 14 16:38:32 Unraid kernel: general protection fault, probably for non-canonical address 0xb61397b86406a014: 0000 [#1] SMP PTI
    Mar 14 16:38:32 Unraid kernel: handle_irq_event_percpu+0x2c/0x6f
    Mar 14 16:38:32 Unraid kernel: nvme_irq+0xb/0x17 [nvme]
    Mar 14 16:38:32 Unraid kernel: nvme_pci_complete_rq+0x56/0x61 [nvme]
    Mar 14 16:38:32 Unraid kernel: nvme_process_cq+0xdb/0x15b [nvme]
    Mar 14 16:38:32 Unraid kernel: nvme_unmap_data+0x51/0xae [nvme]    

Any guidance on a solution would be greatly appreciated! 
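
For anyone reading a capture like the one above, here is a minimal sketch of how the DMAR fault lines could be tallied per PCI device. It assumes the remote syslog was saved to a plain-text file; the function name `dmar_fault_summary` and the file path in the usage line are hypothetical.

```shell
# Count DMAR fault lines per PCI device in a saved syslog file.
# Usage: dmar_fault_summary /path/to/saved-syslog.log   (path is hypothetical)
dmar_fault_summary() {
    # Pull out "Request device [bus:dev.fn]", keep only the address,
    # then count how often each address appears.
    grep -o 'Request device \[[0-9a-f:.]*\]' "$1" \
        | sed 's/.*\[\(.*\)\]/\1/' \
        | sort | uniq -c
}
```

Every fault in the capture above names 0b:00.0, and running `lspci -s 0b:00.0` on the server should show which device that is. The nvme frames in the call trace suggest the NVMe drive, but that is an inference rather than something the log states outright.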

unraid-diagnostics-20210314-1716.zip

Edited by lostinspace

I have a similar issue. Unfortunately I can't get into the WebGUI or SSH. The server basically appears to be DEAD.

 

I couldn't even get any diagnostics. Is there any way of getting the diagnostics if I don't have access to WebGUI or SSH?

 

My video card is stubbed with 'System Tools', so even booting Unraid in GUI mode shows nothing on a monitor plugged directly into the server. What a mess!

 

Unraid version 6.9.1

ASRock X570 / Ryzen 3600 / 32GB ECC RAM

Edited by sfaruque

Crashed again.  External syslog below; diagnostics (after hard reset) attached.

Mar 23 03:44:27 Unraid kernel: #PF: error_code(0x0002) - not-present page
Mar 23 03:44:27 Unraid kernel: #PF: supervisor write access in kernel mode
Mar 23 03:44:27 Unraid kernel: ------------[ cut here ]------------
Mar 23 03:44:27 Unraid kernel: ---[ end trace 4d3dcddc45e38db6 ]---
Mar 23 03:44:27 Unraid kernel: <IRQ>
Mar 23 03:44:27 Unraid kernel: ? __kthread_bind_mask+0x57/0x57
Mar 23 03:44:27 Unraid kernel: ? process_scheduled_works+0x27/0x27
Mar 23 03:44:27 Unraid kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Mar 23 03:44:27 Unraid kernel: CPU: 1 PID: 1510 Comm: kworker/1:1H Not tainted 5.10.21-Unraid #1
Mar 23 03:44:27 Unraid kernel: CPU: 1 PID: 1510 Comm: kworker/1:1H Tainted: G D 5.10.21-Unraid #1
Mar 23 03:44:27 Unraid kernel: CR2: 0000000000000000
Mar 23 03:44:27 Unraid kernel: CR2: 0000000000000000 CR3: 000000000200c005 CR4: 00000000003726e0
Mar 23 03:44:27 Unraid kernel: CR2: 0000000000000000 CR3: 000000000200c005 CR4: 00000000003726e0
Mar 23 03:44:27 Unraid kernel: CR2: 0000000000000000 CR3: 000000000200c005 CR4: 00000000003726e0
Mar 23 03:44:27 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 03:44:27 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 03:44:27 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 03:44:27 Unraid kernel: Call Trace:
Mar 23 03:44:27 Unraid kernel: Call Trace:
Mar 23 03:44:27 Unraid kernel: Code: 05 c8 97 e2 00 01 e8 e7 6f 3e 00 0f 0b c3 80 3d b8 97 e2 00 00 75 53 48 c7 c7 52 63 da 81 c6 05 a8 97 e2 00 01 e8 c8 6f 3e 00 <0f> 0b c3 80 3d 98 97 e2 00 00 75 34 48 c7 c7 7a 63 da 81 c6 05 88
Mar 23 03:44:27 Unraid kernel: Code: c3 b8 00 fe ff ff f0 0f c1 07 c3 31 c0 48 81 ff 58 56 6f 81 72 0c 31 c0 48 81 ff 00 58 6f 81 0f 92 c0 c3 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 04 89 c6 eb bb c3 8b 07 45 31 c0 85 c0 75 11 ba 01
Mar 23 03:44:27 Unraid kernel: Code: c3 b8 00 fe ff ff f0 0f c1 07 c3 31 c0 48 81 ff 58 56 6f 81 72 0c 31 c0 48 81 ff 00 58 6f 81 0f 92 c0 c3 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 04 89 c6 eb bb c3 8b 07 45 31 c0 85 c0 75 11 ba 01
Mar 23 03:44:27 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 23 03:44:27 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 23 03:44:27 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 23 03:44:27 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 23 03:44:27 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr d6766000 [fault reason 06] PTE Read access is not set
Mar 23 03:44:27 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr f973c000 [fault reason 06] PTE Read access is not set
Mar 23 03:44:27 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr ff10c000 [fault reason 06] PTE Read access is not set
Mar 23 03:44:27 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 03:44:27 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 03:44:27 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 03:44:27 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 23 03:44:27 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 23 03:44:27 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 23 03:44:27 Unraid kernel: FS: 0000000000000000(0000) GS:ffff888bff240000(0000) knlGS:0000000000000000
Mar 23 03:44:27 Unraid kernel: FS: 0000000000000000(0000) GS:ffff888bff240000(0000) knlGS:0000000000000000
Mar 23 03:44:27 Unraid kernel: FS: 0000000000000000(0000) GS:ffff888bff240000(0000) knlGS:0000000000000000
Mar 23 03:44:27 Unraid kernel: Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F5 08/13/2019
Mar 23 03:44:27 Unraid kernel: Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F5 08/13/2019
Mar 23 03:44:27 Unraid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat veth xt_MASQUERADE iptable_nat nf_nat xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables bonding i915 wmi_bmof iosf_mbi x86_pkg_temp_thermal i2c_algo_bit intel_powerclamp coretemp drm_kms_helper kvm_intel drm kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel intel_gtt crypto_simd cryptd agpgart mpt3sas i2c_i801 syscopyarea sysfillrect glue_helper sysimgblt r8169 rapl raid_class i2c_smbus fb_sys_fops scsi_transport_sas nvme i2c_core ahci intel_cstate nvme_core realtek wmi intel_uncore libahci video backlight thermal acpi_pad button fan
Mar 23 03:44:27 Unraid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat veth xt_MASQUERADE iptable_nat nf_nat xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables bonding i915 wmi_bmof iosf_mbi x86_pkg_temp_thermal i2c_algo_bit intel_powerclamp coretemp drm_kms_helper kvm_intel drm kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel intel_gtt crypto_simd cryptd agpgart mpt3sas i2c_i801 syscopyarea sysfillrect glue_helper sysimgblt r8169 rapl raid_class i2c_smbus fb_sys_fops scsi_transport_sas nvme i2c_core ahci intel_cstate nvme_core realtek wmi intel_uncore libahci video backlight thermal acpi_pad button fan
Mar 23 03:44:27 Unraid kernel: Oops: 0002 [#1] SMP PTI
Mar 23 03:44:27 Unraid kernel: PGD 0 P4D 0
Mar 23 03:44:27 Unraid kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
Mar 23 03:44:27 Unraid kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
Mar 23 03:44:27 Unraid kernel: R10: ffffc9000014cbd0 R11: ffffc9000014cbc8 R12: 0000000000000000
Mar 23 03:44:27 Unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 23 03:44:27 Unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 23 03:44:27 Unraid kernel: R13: 0000000000000000 R14: 0000000000000129 R15: 000000000000008a
Mar 23 03:44:27 Unraid kernel: RAX: 0000000000000000 RBX: ffff8881042607c0 RCX: 0000000000000027
Mar 23 03:44:27 Unraid kernel: RAX: 0000000000000000 RBX: ffff888104cdbe80 RCX: ffff888104cdbec8
Mar 23 03:44:27 Unraid kernel: RAX: 0000000000000000 RBX: ffff888104cdbe80 RCX: ffff888104cdbec8
Mar 23 03:44:27 Unraid kernel: RBP: 0000000000000000 R08: ffff888104cdbe80 R09: 00646b636f6c626b
Mar 23 03:44:27 Unraid kernel: RBP: 0000000000000000 R08: ffff888104cdbe80 R09: 00646b636f6c626b
Mar 23 03:44:27 Unraid kernel: RBP: ffff888104cdbe80 R08: 0000000000000000 R09: 00000000ffffefff
Mar 23 03:44:27 Unraid kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Mar 23 03:44:27 Unraid kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Mar 23 03:44:27 Unraid kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff888bff258920
Mar 23 03:44:27 Unraid kernel: RIP: 0010:do_raw_spin_lock+0x7/0x12
Mar 23 03:44:27 Unraid kernel: RIP: 0010:do_raw_spin_lock+0x7/0x12
Mar 23 03:44:27 Unraid kernel: RIP: 0010:refcount_warn_saturate+0xa7/0xe8
Mar 23 03:44:27 Unraid kernel: RSP: 0018:ffffc9000014cda0 EFLAGS: 00010086
Mar 23 03:44:27 Unraid kernel: RSP: 0018:ffffc900003dbe38 EFLAGS: 00010246
Mar 23 03:44:27 Unraid kernel: RSP: 0018:ffffc900003dbe38 EFLAGS: 00010246
Mar 23 03:44:27 Unraid kernel: WARNING: CPU: 1 PID: 1510 at lib/refcount.c:28 refcount_warn_saturate+0xa7/0xe8
Mar 23 03:44:27 Unraid kernel: Workqueue: kblockd blk_mq_requeue_work
Mar 23 03:44:27 Unraid kernel: Workqueue: kblockd blk_mq_requeue_work
Mar 23 03:44:27 Unraid kernel: __handle_irq_event_percpu+0x36/0xcb
Mar 23 03:44:27 Unraid kernel: __refcount_sub_and_test.constprop.0+0x24/0x2a
Mar 23 03:44:27 Unraid kernel: blk_mq_free_request+0xc6/0xdf
Mar 23 03:44:27 Unraid kernel: blk_mq_request_bypass_insert+0x1b/0x72
Mar 23 03:44:27 Unraid kernel: blk_mq_requeue_work+0x8f/0xff
Mar 23 03:44:27 Unraid kernel: handle_edge_irq+0xb0/0xd0
Mar 23 03:44:27 Unraid kernel: handle_irq_event+0x34/0x51
Mar 23 03:44:27 Unraid kernel: handle_irq_event_percpu+0x2c/0x6f
Mar 23 03:44:27 Unraid kernel: kthread+0xe5/0xea
Mar 23 03:44:27 Unraid kernel: nvme_irq+0xb/0x17 [nvme]
Mar 23 03:44:27 Unraid kernel: nvme_process_cq+0xdb/0x15b [nvme]
Mar 23 03:44:27 Unraid kernel: process_one_work+0x13c/0x1d5
Mar 23 03:44:27 Unraid kernel: refcount_t: underflow; use-after-free.
Mar 23 03:44:27 Unraid kernel: ret_from_fork+0x22/0x30
Mar 23 03:44:27 Unraid kernel: worker_thread+0x18b/0x22f

 

unraid-diagnostics-20210323-0818.zip


Thank you @JorgeB for your time and comments. I updated the motherboard BIOS to the latest available (what I believe is an experimental version; from F5 to F6e).  There were no notes about SSD/NVMe changes but we'll see.

 

What concerns me is that this only started after the 6.9 upgrade; the system worked fine for months before that.

 

I'm considering pulling the drive and putting it into my Windows PC to update the firmware (SK Hynix P31 Gold NVMe).  If I get another crash I will do so.

 

For posterity, I ran a pass and a half (5 hours) of memtest86 with no errors.  Per the updated GPU driver instructions, I also commented out the go file edits I had made in 6.8.3.  I had already "enabled" the new Intel iGPU drivers via the touch command immediately after upgrading to 6.9, but the guide now says to remove the go file edits as well.

Edited by lostinspace

Intel iGPU seems to be the issue in 6.9.0/1; I rolled back for this exact reason, as media is the main reason I have the server.

I'm not sure there is a definitive answer as to whether it's an incorrect config, a bad Plex container version, or something funky between the kernel and Intel iGPUs. I'm holding off on upgrading again until this gets sorted, but people transcoding on Intel iGPUs seem to be a small section of the community.

24 minutes ago, Tristankin said:

Intel iGPU seems to be the issue in 6.9.0/1,

It's not a universal issue: I (and many others) am currently running unRAID 6.9.1 and using the new way of loading i915 drivers documented in the release notes, rather than loading them from the 'go' file as I did in previous versions.

 

With 6.9.1, i915/Intel iGPU has worked for me with no issues in Plex and HandBrake with both the 'go' file and touch /boot/config/modprobe.d/i915.conf methods.

 

I have a 9th-generation Intel CPU (Xeon E-2288G) with the UHD P630 iGPU.
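
The `touch` method mentioned above amounts to creating an empty file at that path. A minimal sketch follows; the function name and the optional directory argument are hypothetical, added only so the step can be tried against a scratch directory before touching the flash drive.

```shell
# Create the i915.conf marker file described in the post above.
# Usage: enable_i915_conf            (uses the path from the post)
#        enable_i915_conf /tmp/test  (hypothetical scratch directory)
enable_i915_conf() {
    conf_dir="${1:-/boot/config/modprobe.d}"
    mkdir -p "$conf_dir"            # assumption: create the directory if it is missing
    touch "$conf_dir/i915.conf"     # creates the file if it does not already exist
}
```

Any old 'go' file lines that loaded the driver would be removed separately, as the earlier post describes.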


Chiming in that I'm running an i7-7700, also with UHD630 iGPU, using it with Emby for iGPU transcoding, and my server is rock stable (Unraid bugs notwithstanding). I've only rebooted it to enable VFIO binding and to recover from a bad package install (newer Slackware packages don't work because they updated glibc but Limetech didn't).

 


And another one.  So updating the BIOS didn't resolve it, and taking the GPU edits out of the go file didn't resolve it either.  Unless someone has other suggestions, I'll have to go back to 6.8.3.

 

Mar 24 08:26:42 Unraid emhttpd: read SMART /dev/sdh
Mar 24 08:27:09 Unraid kernel: #PF: error_code(0x0002) - not-present page
Mar 24 08:27:09 Unraid kernel: #PF: supervisor write access in kernel mode
Mar 24 08:27:09 Unraid kernel: ---[ end trace c84b3c57793f4667 ]---
Mar 24 08:27:09 Unraid kernel: ? __kthread_bind_mask+0x57/0x57
Mar 24 08:27:09 Unraid kernel: ? process_scheduled_works+0x27/0x27
Mar 24 08:27:09 Unraid kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Mar 24 08:27:09 Unraid kernel: CPU: 8 PID: 903 Comm: kworker/8:1H Not tainted 5.10.21-Unraid #1
Mar 24 08:27:09 Unraid kernel: CR2: 0000000000000000
Mar 24 08:27:09 Unraid kernel: CR2: 0000000000000000 CR3: 000000000200c002 CR4: 00000000003726e0
Mar 24 08:27:09 Unraid kernel: CR2: 0000000000000000 CR3: 000000000200c002 CR4: 00000000003726e0
Mar 24 08:27:09 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 08:27:09 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 08:27:09 Unraid kernel: Call Trace:
Mar 24 08:27:09 Unraid kernel: Code: c3 b8 00 fe ff ff f0 0f c1 07 c3 31 c0 48 81 ff 58 56 6f 81 72 0c 31 c0 48 81 ff 00 58 6f 81 0f 92 c0 c3 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 04 89 c6 eb bb c3 8b 07 45 31 c0 85 c0 75 11 ba 01
Mar 24 08:27:09 Unraid kernel: Code: c3 b8 00 fe ff ff f0 0f c1 07 c3 31 c0 48 81 ff 58 56 6f 81 72 0c 31 c0 48 81 ff 00 58 6f 81 0f 92 c0 c3 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 04 89 c6 eb bb c3 8b 07 45 31 c0 85 c0 75 11 ba 01
Mar 24 08:27:09 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 24 08:27:09 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 24 08:27:09 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 24 08:27:09 Unraid kernel: DMAR: DRHD: handling fault status reg 3
Mar 24 08:27:09 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr d6c30000 [fault reason 06] PTE Read access is not set
Mar 24 08:27:09 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr db7b2000 [fault reason 06] PTE Read access is not set
Mar 24 08:27:09 Unraid kernel: DMAR: [DMA Read] Request device [0b:00.0] PASID ffffffff fault addr db7b2000 [fault reason 06] PTE Read access is not set
Mar 24 08:27:09 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 24 08:27:09 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 24 08:27:09 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 24 08:27:09 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 24 08:27:09 Unraid kernel: FS: 0000000000000000(0000) GS:ffff888bff400000(0000) knlGS:0000000000000000
Mar 24 08:27:09 Unraid kernel: FS: 0000000000000000(0000) GS:ffff888bff400000(0000) knlGS:0000000000000000
Mar 24 08:27:09 Unraid kernel: Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F6e 08/18/2020
Mar 24 08:27:09 Unraid kernel: Modules linked in: xt_CHECKSUM macvlan ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat veth xt_MASQUERADE iptable_nat nf_nat xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables bonding wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm_intel kvm iosf_mbi crct10dif_pclmul i2c_algo_bit crc32_pclmul crc32c_intel ghash_clmulni_intel drm_kms_helper aesni_intel crypto_simd cryptd glue_helper drm mpt3sas intel_gtt agpgart i2c_i801 syscopyarea rapl i2c_smbus sysfillrect i2c_core sysimgblt r8169 nvme intel_cstate raid_class input_leds led_class scsi_transport_sas fb_sys_fops nvme_core intel_uncore ahci realtek wmi libahci video backlight thermal acpi_pad button fan
Mar 24 08:27:09 Unraid kernel: Oops: 0002 [#1] SMP PTI
Mar 24 08:27:09 Unraid kernel: PGD 0 P4D 0
Mar 24 08:27:09 Unraid kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
Mar 24 08:27:09 Unraid kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
Mar 24 08:27:09 Unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 24 08:27:09 Unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 24 08:27:09 Unraid kernel: RAX: 0000000000000000 RBX: ffff888104aba500 RCX: ffff888104aba548
Mar 24 08:27:09 Unraid kernel: RAX: 0000000000000000 RBX: ffff888104aba500 RCX: ffff888104aba548
Mar 24 08:27:09 Unraid kernel: RBP: 0000000000000000 R08: ffff888104aba500 R09: 00646b636f6c626b
Mar 24 08:27:09 Unraid kernel: RBP: 0000000000000000 R08: ffff888104aba500 R09: 00646b636f6c626b
Mar 24 08:27:09 Unraid kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Mar 24 08:27:09 Unraid kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Mar 24 08:27:09 Unraid kernel: RIP: 0010:do_raw_spin_lock+0x7/0x12
Mar 24 08:27:09 Unraid kernel: RIP: 0010:do_raw_spin_lock+0x7/0x12
Mar 24 08:27:09 Unraid kernel: RSP: 0018:ffffc90001863e38 EFLAGS: 00010246
Mar 24 08:27:09 Unraid kernel: RSP: 0018:ffffc90001863e38 EFLAGS: 00010246
Mar 24 08:27:09 Unraid kernel: Workqueue: kblockd blk_mq_requeue_work
Mar 24 08:27:09 Unraid kernel: blk_mq_request_bypass_insert+0x1b/0x72
Mar 24 08:27:09 Unraid kernel: blk_mq_requeue_work+0x8f/0xff
Mar 24 08:27:09 Unraid kernel: kthread+0xe5/0xea
Mar 24 08:27:09 Unraid kernel: process_one_work+0x13c/0x1d5
Mar 24 08:27:09 Unraid kernel: ret_from_fork+0x22/0x30
Mar 24 08:27:09 Unraid kernel: worker_thread+0x18b/0x22f

 

Edited by lostinspace
  • 2 weeks later...

Just to add a data point here: I couldn't figure out the problem and was tired of the server freezing/crashing and becoming completely unresponsive, so I reverted to 6.8.3.  I've now got 14 straight days of uptime, which was unheard of while I was on 6.9/6.9.1.  So there's something in 6.9/6.9.1 that just doesn't like my hardware.

  • 2 weeks later...

Not sure if this is relevant, but I may have recently fixed a huge instability issue I had.

 

I'm on an X8DTH-6F board with Xeon processors (which ones doesn't matter, trust me: I tried four different pairs, spread across the supported set) and had intermittent crashing.

 

I fixed it by using sysfs to disable C-states on my CPUs. I can't promise this will help anyone, but I had VERY similar crash messages, pointing to a wide variety of random hardware. Log in via terminal, run this command, and cross your fingers.

 

 

for cpus in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do echo 1 > "$cpus"; done

 

Hope this helps.
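
For anyone who wants to be able to undo this, here is a minimal sketch of the same idea as a function that can also re-enable the states. The name `set_cstates` and the optional second argument (a different root directory, only for dry runs) are hypothetical and not part of the command above.

```shell
# Write 1 (disable) or 0 (re-enable) to every cpuidle state's "disable" file.
# Usage: set_cstates 1      # disable all C-states
#        set_cstates 0      # re-enable them
set_cstates() {
    value="$1"
    root="${2:-/sys/devices/system/cpu}"   # second argument only for dry runs
    for f in "$root"/cpu*/cpuidle/state*/disable; do
        [ -e "$f" ] || continue            # glob matched nothing: no cpuidle entries
        echo "$value" > "$f"
    done
}
```

These sysfs writes are runtime settings, so they do not survive a reboot and would have to be reapplied at each boot if they turn out to help.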

