AC3 Posted September 16, 2024

Disclaimer: complete noob with Unraid (but semi-comfy with Linux), currently on an eval license. New NAS build using one of those integrated N100 boards from AliExpress, 16GB DDR5-4800 non-ECC, 2 x HGST 12TB enterprise drives using XFS (1 data, 1 parity), and a 128GB Samsung M.2 cache drive for app data. Usage is mainly as a Plex server and for backups from my other devices.

I am trying to migrate my data from a QNAP TS-451 running RAID5, with around 4TB to move. My QNAP is having backplane issues and random drive disconnects, and since it has an older CPU, I figured I would update to something that can better handle some transcoding. Played with Scale for a few days, but I think Unraid is more appropriate for my needs.

Not sure if my issue belongs here or in file sharing, but here goes: Unraid installed without a hitch, and I managed to map NFS shares to my QNAP (Unassigned Devices plugin). I disabled the Unraid parity drive for the sake of transfer speed and started copying media to the data1 drive on Unraid using MC. Everything seemed to start off well, with the transfer sustaining around 90MB/sec. But after around 350GB had been transferred to the Unraid drive, everything came to a halt: no activity in MC, no transfer traffic on the network, revolving 100% usage spikes on one CPU core at a time, and the Unraid GUI becomes unresponsive. I can see live info in the dashboard (CPU/network stats), but I am unable to stop MC or shut down via the GUI, so I have to do a dirty shutdown.

I wasn't sure if this was a snafu with NFS and MC, so I tried a 2nd time, mapping the QNAP over SMB instead of NFS, and also tried the Krusader plugin. Same issue: it works for a couple hundred GB, then just stops. The logs show a tainted CPU error when the problem occurs, but I'm not sure what the PID points to, as Unraid becomes unresponsive and I can't access the terminal, forcing me to do a dirty shutdown.
I'm currently running memtest a 2nd time, but so far after 3 passes, no errors. Temps seem fine too, running alfresco outside of the case. Here is the portion of the log when things go haywire, if anybody has troubleshooting suggestions, I'd be very grateful. Thx.

Sep 16 06:58:40 Tower kernel: general protection fault, maybe for address 0xffff8881547f2600: 0000 [#1] PREEMPT SMP NOPTI
Sep 16 06:58:40 Tower kernel: CPU: 0 PID: 25662 Comm: unraidd1 Tainted: P O 6.1.106-Unraid #1
Sep 16 06:58:40 Tower kernel: Hardware name: Default string Default string/Default string, BIOS 5.27 11/14/2023
Sep 16 06:58:40 Tower kernel: RIP: 0010:copy_data+0x17f/0x219 [md_mod]
Sep 16 06:58:40 Tower kernel: Code: 48 c1 e5 27 48 c1 e7 0c 48 01 fd 48 01 cd 8b 4c 24 30 83 7c 24 14 00 48 89 ca 74 08 4c 89 c7 48 89 ee eb 06 48 89 ef 4c 89 c6 <f3> a4 48 89 44 24 28 89 54 24 08 65 48 8b 0c 25 80 cb 01 00 ff 89
Sep 16 06:58:40 Tower kernel: RSP: 0018:ffffc9000787fd60 EFLAGS: 00010202
Sep 16 06:58:40 Tower kernel: RAX: ffff8881547f2600 RBX: 0000000000000000 RCX: 0000000000001000
Sep 16 06:58:40 Tower kernel: RDX: 0000000000001000 RSI: ffff888137406000 RDI: bfff8881411ae000
Sep 16 06:58:40 Tower kernel: RBP: ffff888137406000 R08: bfff8881411ae000 R09: ffff88814a8387e8
Sep 16 06:58:40 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
Sep 16 06:58:40 Tower kernel: R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000006000
Sep 16 06:58:40 Tower kernel: FS: 0000000000000000(0000) GS:ffff88846fa00000(0000) knlGS:0000000000000000
Sep 16 06:58:40 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 06:58:40 Tower kernel: CR2: 0000149a23635000 CR3: 000000000420a003 CR4: 0000000000770ef0
Sep 16 06:58:40 Tower kernel: PKRU: 55555554
Sep 16 06:58:40 Tower kernel: Call Trace:
Sep 16 06:58:40 Tower kernel: <TASK>
Sep 16 06:58:40 Tower kernel: ? __die_body+0x1a/0x5c
Sep 16 06:58:40 Tower kernel: ? die_addr+0x38/0x51
Sep 16 06:58:40 Tower kernel: ? exc_general_protection+0x30f/0x345
Sep 16 06:58:40 Tower kernel: ? asm_exc_general_protection+0x22/0x30
Sep 16 06:58:40 Tower kernel: ? copy_data+0x17f/0x219 [md_mod]
Sep 16 06:58:40 Tower kernel: copy_write_data+0x48/0x8f [md_mod]
Sep 16 06:58:40 Tower kernel: unraidd+0xca5/0x1140 [md_mod]
Sep 16 06:58:40 Tower kernel: md_thread+0xf4/0x122 [md_mod]
Sep 16 06:58:40 Tower kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Sep 16 06:58:40 Tower kernel: ? signal_pending+0x1d/0x1d [md_mod]
Sep 16 06:58:40 Tower kernel: kthread+0xe4/0xef
Sep 16 06:58:40 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
Sep 16 06:58:40 Tower kernel: ret_from_fork+0x1f/0x30
Sep 16 06:58:40 Tower kernel: </TASK>
Sep 16 06:58:40 Tower kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs xt_nat veth vhost_net tun vhost tap kvm_intel kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls igc i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_display_helper sha256_ssse3 sha1_ssse3 aesni_intel drm_kms_helper crypto_simd cryptd mei_hdcp mei_pxp wmi_bmof rapl intel_cstate intel_uncore drm
Sep 16 06:58:40 Tower kernel: i2c_i801 intel_gtt i2c_smbus agpgart mei_me ahci i2c_core input_leds joydev mei libahci led_class thermal fan syscopyarea sysfillrect sysimgblt fb_sys_fops tpm_crb video tpm_tis tpm_tis_core wmi backlight tpm intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: i2c_dev]
Sep 16 06:58:40 Tower kernel: ---[ end trace 0000000000000000 ]---
Sep 16 06:58:40 Tower kernel: RIP: 0010:copy_data+0x17f/0x219 [md_mod]
Sep 16 06:58:40 Tower kernel: Code: 48 c1 e5 27 48 c1 e7 0c 48 01 fd 48 01 cd 8b 4c 24 30 83 7c 24 14 00 48 89 ca 74 08 4c 89 c7 48 89 ee eb 06 48 89 ef 4c 89 c6 <f3> a4 48 89 44 24 28 89 54 24 08 65 48 8b 0c 25 80 cb 01 00 ff 89
Sep 16 06:58:40 Tower kernel: RSP: 0018:ffffc9000787fd60 EFLAGS: 00010202
Sep 16 06:58:40 Tower kernel: RAX: ffff8881547f2600 RBX: 0000000000000000 RCX: 0000000000001000
Sep 16 06:58:40 Tower kernel: RDX: 0000000000001000 RSI: ffff888137406000 RDI: bfff8881411ae000
Sep 16 06:58:40 Tower kernel: RBP: ffff888137406000 R08: bfff8881411ae000 R09: ffff88814a8387e8
Sep 16 06:58:40 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
Sep 16 06:58:40 Tower kernel: R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000006000
Sep 16 06:58:40 Tower kernel: FS: 0000000000000000(0000) GS:ffff88846fa00000(0000) knlGS:0000000000000000
Sep 16 06:58:40 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 06:58:40 Tower kernel: CR2: 0000149a23635000 CR3: 00000001493a4001 CR4: 0000000000770ef0
Sep 16 06:58:40 Tower kernel: PKRU: 55555554
Sep 16 06:58:40 Tower kernel: note: unraidd1[25662] exited with preempt_count 1
Sep 16 06:58:40 Tower kernel: ------------[ cut here ]------------
Sep 16 06:58:40 Tower kernel: WARNING: CPU: 0 PID: 25662 at kernel/exit.c:816 do_exit+0x87/0x923
Sep 16 06:58:40 Tower kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs xt_nat veth vhost_net tun vhost tap kvm_intel kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls igc i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_display_helper sha256_ssse3 sha1_ssse3 aesni_intel drm_kms_helper crypto_simd cryptd mei_hdcp mei_pxp wmi_bmof rapl intel_cstate intel_uncore drm
Sep 16 06:58:40 Tower kernel: i2c_i801 intel_gtt i2c_smbus agpgart mei_me ahci i2c_core input_leds joydev mei libahci led_class thermal fan syscopyarea sysfillrect sysimgblt fb_sys_fops tpm_crb video tpm_tis tpm_tis_core wmi backlight tpm intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: i2c_dev]
Sep 16 06:58:40 Tower kernel: CPU: 0 PID: 25662 Comm: unraidd1 Tainted: P D O 6.1.106-Unraid #1
Sep 16 06:58:40 Tower kernel: Hardware name: Default string Default string/Default string, BIOS 5.27 11/14/2023
Sep 16 06:58:40 Tower kernel: RIP: 0010:do_exit+0x87/0x923
Sep 16 06:58:40 Tower kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 41 30 81 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 43 2f 81 00 48 8b 83 d0 06 00 00 83
Sep 16 06:58:40 Tower kernel: RSP: 0018:ffffc9000787fee0 EFLAGS: 00010286
Sep 16 06:58:40 Tower kernel: RAX: 0000000000000000 RBX: ffff88814362f000 RCX: 0000000000000000
Sep 16 06:58:40 Tower kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
Sep 16 06:58:40 Tower kernel: RBP: 000000000000000b R08: 0000000000000000 R09: 0000000000aaaaaa
Sep 16 06:58:40 Tower kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810158c000
Sep 16 06:58:40 Tower kernel: R13: ffff88814b14e300 R14: 0000000000000000 R15: 0000000000000000
Sep 16 06:58:40 Tower kernel: FS: 0000000000000000(0000) GS:ffff88846fa00000(0000) knlGS:0000000000000000
Sep 16 06:58:40 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 06:58:40 Tower kernel: CR2: 0000149a23635000 CR3: 00000001493a4001 CR4: 0000000000770ef0
Sep 16 06:58:40 Tower kernel: PKRU: 55555554
Sep 16 06:58:40 Tower kernel: Call Trace:
Sep 16 06:58:40 Tower kernel: <TASK>
Sep 16 06:58:40 Tower kernel: ? __warn+0xab/0x122
Sep 16 06:58:40 Tower kernel: ? report_bug+0x109/0x17e
Sep 16 06:58:40 Tower kernel: ? do_exit+0x87/0x923
Sep 16 06:58:40 Tower kernel: ? handle_bug+0x41/0x6f
Sep 16 06:58:40 Tower kernel: ? exc_invalid_op+0x13/0x60
Sep 16 06:58:40 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
Sep 16 06:58:40 Tower kernel: ? do_exit+0x87/0x923
Sep 16 06:58:40 Tower kernel: make_task_dead+0x11c/0x11c
Sep 16 06:58:40 Tower kernel: rewind_stack_and_make_dead+0x17/0x17
Sep 16 06:58:40 Tower kernel: RIP: 0000:0x0
Sep 16 06:58:40 Tower kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Sep 16 06:58:40 Tower kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Sep 16 06:58:40 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Sep 16 06:58:40 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 16 06:58:40 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Sep 16 06:58:40 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep 16 06:58:40 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 16 06:58:40 Tower kernel: </TASK>
Sep 16 06:58:40 Tower kernel: ---[ end trace 0000000000000000 ]---
Sep 16 06:59:02 Tower rpc.mountd[16679]: refused mount request from 192.168.50.50 for /mnt/user/download (/): not exported
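One detail in the first register dump is worth a closer look. The faulting instruction bytes `<f3> a4` are `rep movsb`, which copies from RSI to RDI, and RDI holds `bfff8881411ae000` while RSI holds `ffff888137406000`. The sketch below checks which of those register values are canonical x86-64 addresses, assuming 48-bit virtual addressing. It also compares RDI against a hypothetical intended pointer `0xffff8881411ae000`; that value is a guess based on the other pointers in the dump, not something the log states.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the first dump in the log above.
regs = {
    "RAX": 0xFFFF8881547F2600,
    "RSI": 0xFFFF888137406000,
    "RDI": 0xBFFF8881411AE000,
}

for name, value in regs.items():
    status = "canonical" if is_canonical(value) else "NON-CANONICAL"
    print(f"{name} = {value:#018x}  {status}")

# ASSUMPTION: the pointer RDI "should" have held starts with ffff888 like RSI.
guessed_rdi = 0xFFFF8881411AE000
diff = regs["RDI"] ^ guessed_rdi
flipped_bits = [i for i in range(64) if (diff >> i) & 1]
print("bits differing from the guessed pointer:", flipped_bits)
```

If the guess is right, the bad pointer differs from a plausible one by a single bit, which would be consistent with a memory fault rather than a software bug, though the log alone cannot prove that.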
JorgeB Posted September 17, 2024

The Unraid driver is crashing. This is usually a hardware issue, but it can also be a kernel compatibility problem. Try updating to v7.0.0-beta, since it uses a very different kernel.
AC3 Posted September 17, 2024

3 hours ago, JorgeB said: The Unraid driver is crashing. This is usually a hardware issue, but it can also be a kernel compatibility problem. Try updating to v7.0.0-beta, since it uses a very different kernel.

Thank you for pointing me in the right direction. Turns out memtest flagged an error on the 4th pass ([Data Error] Test: 10, CPU: 0, Address: 1D6D6FBAC, Expected: 00000000, Actual: 01000000), so I'll have to replace the RAM and try again. Will report back.
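For anyone reading a similar memtest report, the Expected and Actual values can be XORed to see exactly which bits came back wrong. This is a minimal sketch using only the two values quoted in the post above; the variable names are illustrative.

```python
# Values copied from the memtest [Data Error] line quoted above.
expected = 0x00000000
actual = 0x01000000

# XOR leaves a 1 in every bit position where the two words disagree.
flipped = expected ^ actual
flipped_bits = [i for i in range(32) if (flipped >> i) & 1]

print(f"difference mask: {flipped:#010x}")
print("flipped bit positions:", flipped_bits)  # [24]
```

Here the two words differ in a single bit (bit 24), the typical shape of a faulty memory cell or marginal DIMM.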