bdarnell Posted April 4, 2023

GOAL: Move the Unraid USB, disk drives, and cache drive to a completely new setup. Then upgrade all the disk drives to 20TB shucked WD drives, and sell the 10TB drives online to recoup some costs.

Moving to the new setup went fine. No issues: parity was good on the old setup and on the new setup, and everything started up as before. I had already precleared all 14x 20TB drives before bringing them into this setup; again, no issues, all passed.

Upgrading all the disk drives is where I'm getting issues.

I have 14x 10TB drives in the primary DAS:
4x SFF-8087 to 4x SATA Forward Breakout 1.6ft
to a LSI IBM 03X3834 16 Port PCI-e SAS Expander
to 2x Mini SAS SFF-8087 to SFF-8087 Cable, 100-Ohms, 1.6ft
to Dual Ports Mini SAS SFF-8088 to SAS 36Pin SFF-8087 PCBA Female Adapter with PCI Bracket
to 2x Mini-SAS SFF-8088 to SFF-8088 Molex 2 Meter Cable
to a LSI SAS9200-16e 16-Port External HBA Full-Height PCIe P20 IT Mode
Connected to the first PCIe slot on my motherboard

I have 14x 20TB drives in the secondary DAS:
4x SFF-8087 to 4x SATA Forward Breakout 1.6ft
to a LSI IBM 03X3834 16 Port PCI-e SAS Expander
to 2x Mini SAS SFF-8087 to SFF-8087 Cable, 100-Ohms, 1.6ft
to Dual Ports Mini SAS SFF-8088 to SAS 36Pin SFF-8087 PCBA Female Adapter with PCI Bracket
to 2x Mini-SAS SFF-8088 to SFF-8088 Molex 2 Meter Cable
to a LSI SAS9200-16e 16-Port External HBA Full-Height PCIe P20 IT Mode
Connected to the second PCIe slot on my motherboard

Procedure to swap a drive:
1. Prevent any programs that cause unnecessary disk usage from running (Mover, parity check, Unmanic, binhex-backup)
2. Stop the array
3. Swap out one of the 10TB drives for a 20TB drive
4. Start the array
5. Rebuild starts automatically
6. Once that drive completes, repeat until all the drives are upgraded

Issues encountered:
All the drives in the current array are spun up, and the transfer speeds are initially low (5-20MB/s per disk). Later they speed up to 205MB/s per disk. Then the disk speeds drop back down to the 5-20MB/s range, and sometimes the CPU gets pegged at 100% for a few minutes, then returns to normal usage >10%, and the disk speeds increase again. After some amount of time I come back to check on the progress and find that the array has stopped, multiple drives are showing errors on the Main screen, multiple drives are showing elevated 199 UDMA CRC error counts, the uptime is less than the time since I started the drive swap (so the server rebooted), and of course there are many texts from my friends and family that they cannot access Plex.

What I've tried already:
- Reseated all cables
- Replaced all cables with new ones
- Verified FW versions are correct for the HBAs

I've attached diagnostics, as I cannot figure out what's going on.
piggy-diagnostics-20230316-1958.zip
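Because the counters on the Main screen reset after a reboot, it may help to record SMART attribute 199 (UDMA CRC error count) for each disk before and after a rebuild attempt and compare the values. The following is a minimal sketch, assuming the attribute table has the column layout that `smartctl -A` prints for ATA drives. The sample text and its numbers are made up for illustration and are not taken from this thread, and the function name is hypothetical.

```python
from typing import Optional

# Hypothetical sample in the layout `smartctl -A /dev/sdX` prints for ATA
# drives. The values are invented for illustration only.
SAMPLE_SMART_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""


def udma_crc_error_count(smart_text: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # An attribute row has ten whitespace-separated columns; the first
        # is the attribute ID and the last is the raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


if __name__ == "__main__":
    count = udma_crc_error_count(SAMPLE_SMART_OUTPUT)
    print(f"UDMA CRC error count: {count}")
```

The raw value is stored on the drive, so unlike the per-session error columns it should survive a reboot. A count that keeps rising between runs points at the link to that disk rather than at the disk's platters.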
JorgeB Posted April 5, 2023
12 hours ago, bdarnell said: "multiple drives are showing errors on the Main screen"
Do you mean read errors? I'm not seeing any in the diags posted, but they were taken shortly after a reboot.
bdarnell Posted April 5, 2023
I don't want to start the data rebuild until I have something I can change and test, because it does not complete: the array stops somewhere between 1hr and 20hrs into the rebuild. The Dashboard section will show errors during a rebuild, and the Main section will show errors on multiple drives. After a reboot those numbers are reset, and I don't see errors in the logs for those disks. (I attached images of these sections; they show 0 because it's after a restart and I paused the rebuild immediately upon reboot.)
What do you suggest I try before I attempt another rebuild?
JorgeB Posted April 5, 2023
Without seeing the actual errors it's difficult to say, but most likely power/connection issues.
bdarnell Posted April 7, 2023
The array just stopped again, not sure why. I attached diagnostics to see if you can help me.
piggy-diagnostics-20230406-2201.zip
JorgeB Posted April 7, 2023
Looks more like a power/connection problem.
bdarnell Posted April 18, 2023
I've reduced from 15x 10TB and 15x 20TB drives to only 15 total drives, and I'm physically swapping 1 drive at a time. This has been working, and when it's complete I will be putting everything into one case. This will reduce the number of potential failure points by removing extra cards, wires, and power supplies.
bdarnell Posted April 28, 2023
Docker containers are randomly soft crashing during the data rebuild. The Unraid GUI works fine, but the CPUs get pegged when the Docker applications soft crash. It really feels like something else is causing this to happen, but what else can I test?
I've reduced the number of wires by moving the case close to the drives. Now, from the motherboard, I have 2x LSI 9217-8I connected directly to the hard drives with 4x SFF-8087 to 4x SATA Forward Breakout 1.6ft.
I have 9 drives swapped and am working on number 10. But this one keeps stopping around 30-35%, locking up the Docker containers and preventing me from stopping or restarting Docker applications.
piggy-diagnostics-20230427-2326.zip
bdarnell Posted April 30, 2023
I got this error message today. What does it mean?
Apr 30 02:12:49 piggy kernel: python3[24762]: segfault at 14bf167330 ip 000014bf1642763c sp 00007ffd3b138580 error 4 in libpython3.11.so.1.0[14bf162ee000+223000]
Apr 30 02:12:49 piggy kernel: Code: a8 00 00 00 01 e9 2b d5 ee ff 0f 1f 40 00 41 54 55 53 48 89 fb 48 83 ec 10 48 8b 57 f0 48 85 d2 0f 84 08 02 00 00 48 8b 7f f8 <4c> 8b 42 08 4c 8d 15 d9 ff ff ff 4c 8b 4b 08 48 83 e7 fc 41 83 e0
itimpi Posted April 30, 2023
Other than the fact that it is python3 crashing, I suspect you are unlikely to get much help from Unraid users, as python is never part of a standard install (as far as I know).
bdarnell Posted May 5, 2023
I'm still getting errors during parity sync, which is locking up my Docker containers and bringing down Plex.
I've tried:
- updating my BIOS to the latest version
- updating to the latest rc version of Unraid
What does this error mean?
May 5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512 May 5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512 May 5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216 May 5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216 May 5 00:58:15 piggy kernel: md: recovery thread: multiple disk errors, sector=1025975768 May 5 00:58:15 piggy kernel: ------------[ cut here ]------------ May 5 00:58:15 piggy kernel: kernel BUG at drivers/md/unraid.c:1617! May 5 00:58:15 piggy kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI May 5 00:58:15 piggy kernel: CPU: 16 PID: 6813 Comm: unraidd0 Not tainted 6.1.27-Unraid #1 May 5 00:58:15 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd.
Z690 AERO D/Z690 AERO D, BIOS F24a 04/25/2023 May 5 00:58:15 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod] May 5 00:58:15 piggy kernel: Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 c3 3e a0 48 8b 73 20 e8 ce b4 46 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10 May 5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fdf0 EFLAGS: 00010246 May 5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: ffff88813e2dee08 RCX: 0000000000000000 May 5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: ffffffff829e4f00 RDI: ffff8881012ce038 May 5 00:58:15 piggy kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000 May 5 00:58:15 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810bccb930 May 5 00:58:15 piggy kernel: R13: ffff88813e2df2c0 R14: ffff88813e2df338 R15: ffff88813e1195d8 May 5 00:58:15 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000 May 5 00:58:15 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336005 CR4: 0000000000770ee0 May 5 00:58:15 piggy kernel: PKRU: 55555554 May 5 00:58:15 piggy kernel: Call Trace: May 5 00:58:15 piggy kernel: <TASK> May 5 00:58:15 piggy kernel: md_thread+0xf4/0x122 [md_mod] May 5 00:58:15 piggy kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20 May 5 00:58:15 piggy kernel: ? signal_pending+0x1d/0x1d [md_mod] May 5 00:58:15 piggy kernel: kthread+0xe4/0xef May 5 00:58:15 piggy kernel: ? 
kthread_complete_and_exit+0x1b/0x1b May 5 00:58:15 piggy kernel: ret_from_fork+0x1f/0x30 May 5 00:58:15 piggy kernel: </TASK> May 5 00:58:15 piggy kernel: Modules linked in: tun xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm aesni_intel btusb crypto_simd btrtl cryptd btbcm mei_hdcp mei_pxp btintel i2c_i801 intel_gtt rapl intel_cstate gigabyte_wmi wmi_bmof bluetooth intel_uncore mpt3sas thunderbolt agpgart i2c_smbus nvme mei_me ahci input_leds i2c_core ecdh_generic joydev led_class mei nvme_core libahci raid_class ecc syscopyarea sysfillrect scsi_transport_sas sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm May 5 00:58:15 piggy kernel: backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc] May 5 00:58:15 piggy kernel: ---[ end trace 0000000000000000 ]--- May 5 00:58:15 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod] May 5 00:58:15 piggy kernel: Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 c3 3e a0 48 8b 73 20 e8 ce b4 46 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10 May 5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fdf0 EFLAGS: 00010246 May 5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: ffff88813e2dee08 RCX: 0000000000000000 May 5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: ffffffff829e4f00 RDI: ffff8881012ce038 May 5 00:58:15 piggy kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000 May 5 00:58:15 piggy kernel: 
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810bccb930 May 5 00:58:15 piggy kernel: R13: ffff88813e2df2c0 R14: ffff88813e2df338 R15: ffff88813e1195d8 May 5 00:58:15 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000 May 5 00:58:15 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336006 CR4: 0000000000770ee0 May 5 00:58:15 piggy kernel: PKRU: 55555554 May 5 00:58:15 piggy kernel: ------------[ cut here ]------------ May 5 00:58:15 piggy kernel: WARNING: CPU: 16 PID: 6813 at kernel/exit.c:814 do_exit+0x87/0x923 May 5 00:58:15 piggy kernel: Modules linked in: tun xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm aesni_intel btusb crypto_simd btrtl cryptd btbcm mei_hdcp mei_pxp btintel i2c_i801 intel_gtt rapl intel_cstate gigabyte_wmi wmi_bmof bluetooth intel_uncore mpt3sas thunderbolt agpgart i2c_smbus nvme mei_me ahci input_leds i2c_core ecdh_generic joydev led_class mei nvme_core libahci raid_class ecc syscopyarea sysfillrect scsi_transport_sas sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm May 5 00:58:15 piggy kernel: backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc] May 5 00:58:15 piggy kernel: CPU: 16 PID: 6813 Comm: unraidd0 Tainted: G D 6.1.27-Unraid #1 May 5 00:58:15 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z690 AERO D/Z690 AERO D, BIOS F24a 04/25/2023 May 5 00:58:15 piggy kernel: RIP: 0010:do_exit+0x87/0x923 May 5 00:58:15 piggy kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 51 40 80 00 48 83 bb 90 07 00 00 00 74 02 <0f> 0b 48 8b bb b8 06 00 00 e8 53 3f 80 00 48 8b 83 b0 06 00 00 83 May 5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fee0 EFLAGS: 00010286 May 5 00:58:15 piggy kernel: RAX: 0000000080000000 RBX: ffff88810bd72f40 RCX: 0000000000000000 May 5 00:58:15 piggy kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff May 5 00:58:15 piggy kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffffff8294b3f0 May 5 00:58:15 piggy kernel: R10: 00003fffffffffff R11: ffff8890bfbc3f6e R12: ffff8881043ee000 May 5 00:58:15 piggy kernel: R13: ffff88813f1e3180 R14: 0000000000000002 R15: ffffffff82069847 May 5 00:58:15 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000 May 5 00:58:15 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336006 CR4: 0000000000770ee0 May 5 00:58:15 piggy kernel: PKRU: 55555554 May 5 00:58:15 piggy kernel: Call Trace: May 5 00:58:15 piggy kernel: <TASK> May 5 00:58:15 piggy kernel: make_task_dead+0x11c/0x11c May 5 00:58:15 piggy kernel: rewind_stack_and_make_dead+0x17/0x17 May 5 00:58:15 piggy kernel: RIP: 0000:0x0 May 5 00:58:15 piggy kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
May 5 00:58:15 piggy kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 May 5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 May 5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 May 5 00:58:15 piggy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 May 5 00:58:15 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 May 5 00:58:15 piggy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 May 5 00:58:15 piggy kernel: </TASK> May 5 00:58:15 piggy kernel: ---[ end trace 0000000000000000 ]---
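When the syslog is large, it can be easier to summarize the "multiple disk errors" messages than to read them one by one. The sketch below is one possible way to do that, assuming the message text matches the lines in the log above. The function name is hypothetical, and the sample lines are copied from the post.

```python
import re
from collections import Counter

# Sample lines copied from the kernel log in the post above.
SAMPLE_LOG = """\
May 5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512
May 5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512
May 5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216
May 5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216
May 5 00:58:15 piggy kernel: md: recovery thread: multiple disk errors, sector=1025975768
May 5 00:58:15 piggy kernel: kernel BUG at drivers/md/unraid.c:1617!
"""

# Matches the sector number reported by the md recovery thread.
SECTOR_PATTERN = re.compile(r"multiple disk errors, sector=(\d+)")


def sector_error_counts(log_text: str) -> Counter:
    """Count how many times each sector is reported in the log text."""
    return Counter(int(s) for s in SECTOR_PATTERN.findall(log_text))


if __name__ == "__main__":
    for sector, hits in sorted(sector_error_counts(SAMPLE_LOG).items()):
        print(f"sector {sector}: reported {hits} time(s)")
```

If the reported sectors differ from one rebuild attempt to the next, that is at least consistent with an intermittent fault rather than a fixed bad spot on a disk, though it does not prove either.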
JorgeB Posted May 5, 2023
The Unraid driver is crashing. This is usually a hardware problem, but sometimes it can be a kernel compatibility issue. Update to v6.12-rc5 and re-test; if the issue persists, it's likely hardware.
bdarnell Posted May 5, 2023
That error message came from Unraid version v6.12-rc5.
If I don't run parity sync or drive rebuilds, then the system is stable.
If it's a hardware issue:
- Are there recommendations for disabling things in the BIOS (Gigabyte Aero D with i9-13900K)? All the parts in this build are brand new, and the cables have all been replaced with brand new ones.
- What items should I look at?
- Should I completely unplug everything from the motherboard/power supply/hard drives and reseat everything?
- Should I stress test the CPU and memory modules?
- Would the NVMe drives cause this issue?
- Does anyone else have issues with this combination of CPU/motherboard?
JorgeB Posted May 5, 2023
2 hours ago, bdarnell said: "That error message came from the unRAID version v6.12-rc5 If I don't run parity sync or drive rebuilds, then system is stable."
And it was/is the same with v6.11.5?
bdarnell Posted May 5, 2023
Yes, it is the same scenario when using v6.11.5.
Here is an error from that version:
May 4 20:50:24 piggy kernel: md: recovery thread: multiple disk errors, sector=7498539328 May 4 20:50:24 piggy kernel: ------------[ cut here ]------------ May 4 20:50:24 piggy kernel: kernel BUG at drivers/md/unraid.c:1617! May 4 20:50:24 piggy kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI May 4 20:50:24 piggy kernel: CPU: 0 PID: 10725 Comm: unraidd0 Not tainted 5.19.17-Unraid #2 May 4 20:50:24 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. Z690 AERO D/Z690 AERO D, BIOS F23a 01/04/2023 May 4 20:50:24 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod] May 4 20:50:24 piggy kernel: Code: 00 83 3d 99 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 19 c3 10 a0 48 8b 73 20 e8 06 1e 71 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10 May 4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3df0 EFLAGS: 00010246 May 4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff8881986cee08 RCX: 0000000000000000 May 4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: ffffffff828e59e0 RDI: ffff888106aa2c38 May 4 20:50:24 piggy kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 May 4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164997110 May 4 20:50:24 piggy kernel: R13: ffff8881986cf000 R14: ffff8881986cf078 R15: ffff8881673452d8 May 4 20:50:24 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000 May 4 20:50:24 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2003 CR4: 0000000000770ef0 May 4 20:50:24 piggy kernel: PKRU: 55555554 May 4 20:50:24 piggy kernel: Call Trace: May 4 20:50:24 piggy kernel: <TASK> May 4 20:50:24 piggy kernel: md_thread+0x100/0x12e [md_mod] May 4 20:50:24 piggy kernel: ?
_raw_spin_rq_lock_irqsave+0x20/0x20 May 4 20:50:24 piggy kernel: ? md_seq_show+0x720/0x720 [md_mod] May 4 20:50:24 piggy kernel: kthread+0xe4/0xef May 4 20:50:24 piggy kernel: ? kthread_complete_and_exit+0x1b/0x1b May 4 20:50:24 piggy kernel: ret_from_fork+0x1f/0x30 May 4 20:50:24 piggy kernel: </TASK> May 4 20:50:24 piggy kernel: Modules linked in: tun veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic gigabyte_wmi wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_cstate intel_uncore i2c_i801 i2c_smbus thunderbolt i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper ahci libahci drm_kms_helper joydev input_leds led_class btusb drm btrtl btbcm btintel bluetooth mpt3sas intel_gtt nvme agpgart ecdh_generic i2c_core nvme_core ecc raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal wmi fan tpm_crb tpm_tis tpm_tis_core video tpm backlight acpi_pad acpi_tad button unix May 4 20:50:24 piggy kernel: [last unloaded: igc] May 4 20:50:24 piggy kernel: ---[ end trace 0000000000000000 ]--- May 4 20:50:24 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod] May 4 20:50:24 piggy kernel: Code: 00 83 3d 99 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 19 c3 10 a0 48 8b 73 20 e8 06 1e 71 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10 May 4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3df0 EFLAGS: 00010246 May 4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff8881986cee08 RCX: 0000000000000000 May 4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: ffffffff828e59e0 RDI: ffff888106aa2c38 May 4 
20:50:24 piggy kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 May 4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164997110 May 4 20:50:24 piggy kernel: R13: ffff8881986cf000 R14: ffff8881986cf078 R15: ffff8881673452d8 May 4 20:50:24 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000 May 4 20:50:24 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2004 CR4: 0000000000770ef0 May 4 20:50:24 piggy kernel: PKRU: 55555554 May 4 20:50:24 piggy kernel: ------------[ cut here ]------------ May 4 20:50:24 piggy kernel: WARNING: CPU: 0 PID: 10725 at kernel/exit.c:741 do_exit+0x39/0x8e5 May 4 20:50:24 piggy kernel: Modules linked in: tun veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic gigabyte_wmi wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_cstate intel_uncore i2c_i801 i2c_smbus thunderbolt i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper ahci libahci drm_kms_helper joydev input_leds led_class btusb drm btrtl btbcm btintel bluetooth mpt3sas intel_gtt nvme agpgart ecdh_generic i2c_core nvme_core ecc raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal wmi fan tpm_crb tpm_tis tpm_tis_core video tpm backlight acpi_pad acpi_tad button unix May 4 20:50:24 piggy kernel: [last unloaded: igc] May 4 20:50:24 piggy kernel: CPU: 0 PID: 10725 Comm: unraidd0 Tainted: G D 5.19.17-Unraid #2 May 4 20:50:24 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z690 AERO D/Z690 AERO D, BIOS F23a 01/04/2023 May 4 20:50:24 piggy kernel: RIP: 0010:do_exit+0x39/0x8e5 May 4 20:50:24 piggy kernel: Code: 89 fd 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 31 c0 65 48 8b 1c 25 c0 bb 01 00 48 83 bb a0 07 00 00 00 74 02 <0f> 0b 48 8b bb c8 06 00 00 e8 b7 c0 7c 00 48 8b 83 c0 06 00 00 83 May 4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3ee0 EFLAGS: 00010286 May 4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff888107eec000 RCX: 0000000000000000 May 4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000000000b May 4 20:50:24 piggy kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffffff828653f0 May 4 20:50:24 piggy kernel: R10: 00003fffffffffff R11: ffff8890bfbc5fde R12: ffffc90003dc3d48 May 4 20:50:24 piggy kernel: R13: ffff888107eec000 R14: 0000000000000002 R15: ffffffff820b236d May 4 20:50:24 piggy kernel: FS: 0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000 May 4 20:50:24 piggy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2004 CR4: 0000000000770ef0 May 4 20:50:24 piggy kernel: PKRU: 55555554 May 4 20:50:24 piggy kernel: Call Trace: May 4 20:50:24 piggy kernel: <TASK> May 4 20:50:24 piggy kernel: make_task_dead+0xba/0xba May 4 20:50:24 piggy kernel: rewind_stack_and_make_dead+0x17/0x17 May 4 20:50:24 piggy kernel: RIP: 0000:0x0 May 4 20:50:24 piggy kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
May 4 20:50:24 piggy kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 May 4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 May 4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 May 4 20:50:24 piggy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 May 4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 May 4 20:50:24 piggy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 May 4 20:50:24 piggy kernel: </TASK> May 4 20:50:24 piggy kernel: ---[ end trace 0000000000000000 ]---
JorgeB Posted May 6, 2023
That suggests a hardware problem; RAM and/or board/CPU would be the main suspects.
bdarnell Posted May 10, 2023
I swapped out the motherboard, CPU, and RAM with those from a known good system.
I started the parity rebuild again; I've lost count of how many times I started it on the previous setup.
Parity made it to 13%, then disk 3 generated 1 million errors. Disk 3 is new, has had 2x preclear cycles, and has been installed for months with no issues.
The syslog went from 200k to 4GB.
Is it my 2x LSI 9217-8i controllers?
bdarnell Posted May 10, 2023
Almost forgot the diagnostics; attached here:
piggy-diagnostics-20230509-2219.zip
JorgeB Posted May 10, 2023
Looks more like a power/connection problem. Replace/swap cables and try again.
bdarnell Posted May 11, 2023
I swapped the SATA power cable with a known good cable and had exactly the same result as before.
This is a list of things not changed yet:
- 2x LSI 9217-8i
- Power supply (Corsair hx750)
What else could it be? Or what parts would you recommend I use instead? I just want my system to be stable again.
All these parts and wires, the same exact parts, work fine on my other system.
bdarnell Posted May 11, 2023
Diagnostics attached:
piggy-diagnostics-20230510-2322.zip
JorgeB Posted May 11, 2023
Did you also change the SATA cable? If the same disk failed with both new cables it suggests a disk problem.
bdarnell Posted May 19, 2023 (edited)
I think I have found my issue, after swapping everything (CPU, motherboard, RAM, every single data, signal, and power cable, SAS expander, HBA card, PSU), wasting 5 months of my life, and letting return windows lapse, so I'm stuck with all this extra hardware.

The WD 20TB drives use more power than the WD 10TB drives.

The PSU that runs the case that only has hard drives is a Corsair RM750x. On the box it states that the 5V rail max is 20A and the 12V rail max is 62.5A. The WD 20TB drives state on the label that each drive needs [email protected] and [email protected], so when you multiply that by 15 drives you get [email protected] and [email protected]. Both of these are well below the max of the PSU.

But as I pulled out drives 1 by 1 and put them on another PSU, the errors slowly went away. My current data rebuild, #10 of 15, is finally progressing. On the RM750x PSU I have 7x 20TB drives and 5x 10TB drives, with the remaining 3x 20TB on a separate PSU. If I add any more drives to the RM750x PSU, that is when I start getting errors, and swapping data or power cables makes no difference. It was only when I lowered the power draw on the PSU that the errors went away.

I'll be looking for a new power supply that can hopefully run all 15x WD 20TB drives without giving me errors.

Do they make a device that can monitor the actual amperage draw on each PSU voltage rail?

Edited May 19, 2023 by bdarnell: Update PSU model number
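The label arithmetic described in the post can be written out as a small check. This is only a sketch: the rail limits (20A on 5V, 62.5A on 12V) come from the post, but the per-drive currents on the label did not survive in the text above, so the per-drive figures below are hypothetical placeholders. The separate spin-up figure is also an assumption. Hard drives generally draw more 12V current while spinning up than while running, but the real numbers should come from the drive's datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rail:
    name: str
    max_amps: float


def total_amps(drive_count: int, amps_per_drive: float) -> float:
    """Combined current draw of identical drives on one rail."""
    return drive_count * amps_per_drive


def headroom(rail: Rail, drive_count: int, amps_per_drive: float) -> float:
    """Amps left on the rail after the drives; negative means overloaded."""
    return rail.max_amps - total_amps(drive_count, amps_per_drive)


# Rail limits as stated in the post for the Corsair RM750x.
RAIL_5V = Rail("5V", 20.0)
RAIL_12V = Rail("12V", 62.5)

# HYPOTHETICAL per-drive currents, for illustration only. The label values
# from the post are not recoverable, so substitute the real ones.
HYPOTHETICAL_5V_AMPS = 0.5
HYPOTHETICAL_12V_AMPS = 0.75
HYPOTHETICAL_12V_SPINUP_AMPS = 2.0  # assumed start-up peak, not from the post

if __name__ == "__main__":
    drives = 15
    checks = [
        (RAIL_5V, HYPOTHETICAL_5V_AMPS, "running"),
        (RAIL_12V, HYPOTHETICAL_12V_AMPS, "running"),
        (RAIL_12V, HYPOTHETICAL_12V_SPINUP_AMPS, "all spinning up at once"),
    ]
    for rail, amps, label in checks:
        left = headroom(rail, drives, amps)
        print(f"{rail.name} ({label}): {total_amps(drives, amps):.2f}A used, "
              f"{left:.2f}A headroom")
```

As the post found, staying under the rail limits on paper did not prevent the errors, so this kind of calculation is best treated as a first sanity check rather than proof that a PSU is adequate.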