HELP, made major upgrades now array restarts during Disk Rebuilding


GOAL: Move the Unraid USB, disk drives, and cache drive to a completely new setup. Then upgrade all the disk drives to 20tb shucked WD drives, and sell the 10tb drives online to recoup some costs.


Moving to the new setup went fine. No issues: parity was good on the old setup and on the new setup, and everything started up as before. I had already precleared all 14x 20tb drives before bringing them into this setup; again, no issues, and all of them passed. Upgrading the disk drives is where I'm running into problems.


I have 14x 10tb drives in the primary DAS ...
  4x SFF-8087 to 4xSATA Forward Breakout 1.6ft
  to a LSI IBM 03X3834 16 Port PCI-e SAS Expander
  to 2xMini SAS SFF-8087 to SFF-8087 Cable, 100-Ohms, 1.6ft
  to Dual Ports Mini SAS SFF-8088 to SAS 36Pin SFF-8087 PCBA Female Adapter with PCI Bracket
  to 2xMini-SAS SFF-8088 to SFF-8088 Molex 2 Meter Cable
  to a LSI SAS9200-16e 16-Port External HBA Full-Height PCIe P20 IT Mode
    Connected to the first PCIe slot on my motherboard


I have 14x 20tb drives in the secondary DAS ...
  4x SFF-8087 to 4xSATA Forward Breakout 1.6ft
  to a LSI IBM 03X3834 16 Port PCI-e SAS Expander
  to 2xMini SAS SFF-8087 to SFF-8087 Cable, 100-Ohms, 1.6ft
  to Dual Ports Mini SAS SFF-8088 to SAS 36Pin SFF-8087 PCBA Female Adapter with PCI Bracket
  to 2xMini-SAS SFF-8088 to SFF-8088 Molex 2 Meter Cable
  to a LSI SAS9200-16e 16-Port External HBA Full-Height PCIe P20 IT Mode
    Connected to the second PCIe slot on my motherboard


Procedure to swap a drive...
  Prevent any programs that cause unnecessary disk usage from running...
      Mover, Parity check, Unmanic, binhex-backup
  Stop the array
  Swap out one of the 10tb drives for a 20tb drive
  Start the array
  Rebuild starts automatically
  Once that drive completes, repeat until all the drives are up
  
Issues encountered
All the drives in the current array are spun up, and the transfer speeds are initially low (5-20MB/s per disk). Later they climb to 205MB/s per disk. Then the disk speeds drop back to the 5-20MB/s range at times, and sometimes the CPU gets pegged at 100% for a few minutes before returning to normal usage (<10%), after which the disk speeds increase again. When I come back some time later to check on the progress, the array has stopped, multiple drives are showing errors on the Main screen, multiple drives are showing elevated 199 UDMA CRC error counts, the uptime is less than the time since I started the drive swap (so the server has restarted), and of course there are many texts from my friends and family saying they cannot access Plex.
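One way to pin down which drives are accumulating CRC errors is to record SMART attribute 199 for every drive before a rebuild and compare it afterwards, since the raw SMART count lives on the drive and is not cleared by a reboot. The sketch below assumes you have captured the text output of `smartctl -A /dev/sdX` for each drive; the helper names and the device names in the usage note are made up for illustration.

```python
from typing import Dict, Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text.

    Attribute rows have ten columns; the first is the ID and the tenth is
    the raw value. Returns None when the attribute is not listed.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return {device: increase} for every device whose count went up."""
    return {
        dev: after[dev] - before.get(dev, 0)
        for dev in after
        if after[dev] > before.get(dev, 0)
    }
```

For example, `crc_deltas({"sdb": 3, "sdc": 0}, {"sdb": 3, "sdc": 7})` reports only `sdc`. Drives whose count keeps rising across rebuild attempts share something on their path (cable, backplane, expander, or power), which narrows the search.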


What I've tried already
 Reseated all cables
 Replaced all cables with new ones
 Verified FW version are correct for the HBAs


I've attached diagnostics, as I cannot figure out what's going on.

piggy-diagnostics-20230316-1958.zip

Link to comment

I don't want to start the data rebuild again until I have something I can change and test, because it never completes and the array stops somewhere between 1hr and 20hrs into the rebuild.

The Dashboard section shows errors during a rebuild, and the Main section shows errors on multiple drives, but after a reboot those numbers are reset, and I don't see errors in the logs for those disks. (I attached images of these sections; they show 0 because they were taken after a restart and I paused the rebuild immediately upon reboot.)

What do you suggest I can try before I try another rebuild?

main drives.jpg

dashboard errors.JPG

Link to comment
  • 2 weeks later...

I've reduced from 15x10tb and 15x20tb drives to only 15 total drives, and I am physically swapping 1 drive at a time.

 

This has been working, and when it is complete I will be putting everything into one case. That will reduce the number of potential failure points by removing extra cards, wires, and power supplies.

Link to comment
  • 2 weeks later...

Random Docker containers are soft crashing during the data rebuild. The Unraid GUI works fine, but the CPUs get pegged when the Docker applications soft crash. It really feels like something else is causing this to happen, but what else can I test?

I've reduced the number of wires by moving the case close to the drives. Now, from the motherboard, I have 2x LSI 9217-8i connected directly to the hard drives with 4x SFF-8087 to 4xSATA Forward Breakout 1.6ft cables.

I have 9 drives swapped and am working on number 10. But this one keeps stopping around 30-35%, locking up the Docker containers and preventing me from stopping or restarting Docker applications.

piggy-diagnostics-20230427-2326.zip

Link to comment

I got this error message today... what does it mean?
 

Apr 30 02:12:49 piggy kernel: python3[24762]: segfault at 14bf167330 ip 000014bf1642763c sp 00007ffd3b138580 error 4 in libpython3.11.so.1.0[14bf162ee000+223000]
Apr 30 02:12:49 piggy kernel: Code: a8 00 00 00 01 e9 2b d5 ee ff 0f 1f 40 00 41 54 55 53 48 89 fb 48 83 ec 10 48 8b 57 f0 48 85 d2 0f 84 08 02 00 00 48 8b 7f f8 <4c> 8b 42 08 4c 8d 15 d9 ff ff ff 4c 8b 4b 08 48 83 e7 fc 41 83 e0
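As a rough aid for reading a line like the one above: on x86 the number after `error` is the page-fault error code, whose low bits say whether the page was present, whether the access was a read or a write, and whether it came from user mode. The sketch below is a minimal decoder under that assumption; `parse_segfault` and `decode_error` are names made up for this example.

```python
import re
from typing import Dict, Optional

# Matches the kernel's "segfault at ..." log line for a user process.
SEGFAULT = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[]+)\[)?"
)


def decode_error(code: int) -> str:
    """Decode the low bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    if code & 16:
        access = "instruction fetch"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a segfault log line plus a decoded meaning."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    info = {k: v for k, v in m.groupdict().items() if v is not None}
    info["meaning"] = decode_error(int(m["err"], 16))  # code is logged in hex
    return info
```

Applied to the line above, this gives `python3`, `libpython3.11.so.1.0`, and "user-mode read, page not present": a Python process (probably inside a container) read an address that was not mapped. On its own that can be an ordinary software bug, so it does not by itself point to hardware.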

 

Link to comment

I'm still getting errors during parity sync, which locks up my Docker containers and brings down Plex.

I've tried...
updating my BIOS to the latest version
updating to the latest rc version of Unraid.


What does this error mean?
 

May  5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512
May  5 00:56:42 piggy kernel: md: recovery thread: multiple disk errors, sector=992025512
May  5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216
May  5 00:57:58 piggy kernel: md: recovery thread: multiple disk errors, sector=1019867216
May  5 00:58:15 piggy kernel: md: recovery thread: multiple disk errors, sector=1025975768
May  5 00:58:15 piggy kernel: ------------[ cut here ]------------
May  5 00:58:15 piggy kernel: kernel BUG at drivers/md/unraid.c:1617!
May  5 00:58:15 piggy kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
May  5 00:58:15 piggy kernel: CPU: 16 PID: 6813 Comm: unraidd0 Not tainted 6.1.27-Unraid #1
May  5 00:58:15 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. Z690 AERO D/Z690 AERO D, BIOS F24a 04/25/2023
May  5 00:58:15 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
May  5 00:58:15 piggy kernel: Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 c3 3e a0 48 8b 73 20 e8 ce b4 46 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
May  5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fdf0 EFLAGS: 00010246
May  5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: ffff88813e2dee08 RCX: 0000000000000000
May  5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: ffffffff829e4f00 RDI: ffff8881012ce038
May  5 00:58:15 piggy kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
May  5 00:58:15 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810bccb930
May  5 00:58:15 piggy kernel: R13: ffff88813e2df2c0 R14: ffff88813e2df338 R15: ffff88813e1195d8
May  5 00:58:15 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000
May  5 00:58:15 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336005 CR4: 0000000000770ee0
May  5 00:58:15 piggy kernel: PKRU: 55555554
May  5 00:58:15 piggy kernel: Call Trace:
May  5 00:58:15 piggy kernel: <TASK>
May  5 00:58:15 piggy kernel: md_thread+0xf4/0x122 [md_mod]
May  5 00:58:15 piggy kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
May  5 00:58:15 piggy kernel: ? signal_pending+0x1d/0x1d [md_mod]
May  5 00:58:15 piggy kernel: kthread+0xe4/0xef
May  5 00:58:15 piggy kernel: ? kthread_complete_and_exit+0x1b/0x1b
May  5 00:58:15 piggy kernel: ret_from_fork+0x1f/0x30
May  5 00:58:15 piggy kernel: </TASK>
May  5 00:58:15 piggy kernel: Modules linked in: tun xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm aesni_intel btusb crypto_simd btrtl cryptd btbcm mei_hdcp mei_pxp btintel i2c_i801 intel_gtt rapl intel_cstate gigabyte_wmi wmi_bmof bluetooth intel_uncore mpt3sas thunderbolt agpgart i2c_smbus nvme mei_me ahci input_leds i2c_core ecdh_generic joydev led_class mei nvme_core libahci raid_class ecc syscopyarea sysfillrect scsi_transport_sas sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm
May  5 00:58:15 piggy kernel: backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
May  5 00:58:15 piggy kernel: ---[ end trace 0000000000000000 ]---
May  5 00:58:15 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
May  5 00:58:15 piggy kernel: Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 c3 3e a0 48 8b 73 20 e8 ce b4 46 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
May  5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fdf0 EFLAGS: 00010246
May  5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: ffff88813e2dee08 RCX: 0000000000000000
May  5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: ffffffff829e4f00 RDI: ffff8881012ce038
May  5 00:58:15 piggy kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
May  5 00:58:15 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810bccb930
May  5 00:58:15 piggy kernel: R13: ffff88813e2df2c0 R14: ffff88813e2df338 R15: ffff88813e1195d8
May  5 00:58:15 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000
May  5 00:58:15 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336006 CR4: 0000000000770ee0
May  5 00:58:15 piggy kernel: PKRU: 55555554
May  5 00:58:15 piggy kernel: ------------[ cut here ]------------
May  5 00:58:15 piggy kernel: WARNING: CPU: 16 PID: 6813 at kernel/exit.c:814 do_exit+0x87/0x923
May  5 00:58:15 piggy kernel: Modules linked in: tun xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm aesni_intel btusb crypto_simd btrtl cryptd btbcm mei_hdcp mei_pxp btintel i2c_i801 intel_gtt rapl intel_cstate gigabyte_wmi wmi_bmof bluetooth intel_uncore mpt3sas thunderbolt agpgart i2c_smbus nvme mei_me ahci input_leds i2c_core ecdh_generic joydev led_class mei nvme_core libahci raid_class ecc syscopyarea sysfillrect scsi_transport_sas sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm
May  5 00:58:15 piggy kernel: backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
May  5 00:58:15 piggy kernel: CPU: 16 PID: 6813 Comm: unraidd0 Tainted: G      D            6.1.27-Unraid #1
May  5 00:58:15 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. Z690 AERO D/Z690 AERO D, BIOS F24a 04/25/2023
May  5 00:58:15 piggy kernel: RIP: 0010:do_exit+0x87/0x923
May  5 00:58:15 piggy kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 51 40 80 00 48 83 bb 90 07 00 00 00 74 02 <0f> 0b 48 8b bb b8 06 00 00 e8 53 3f 80 00 48 8b 83 b0 06 00 00 83
May  5 00:58:15 piggy kernel: RSP: 0018:ffffc90000f3fee0 EFLAGS: 00010286
May  5 00:58:15 piggy kernel: RAX: 0000000080000000 RBX: ffff88810bd72f40 RCX: 0000000000000000
May  5 00:58:15 piggy kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
May  5 00:58:15 piggy kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffffff8294b3f0
May  5 00:58:15 piggy kernel: R10: 00003fffffffffff R11: ffff8890bfbc3f6e R12: ffff8881043ee000
May  5 00:58:15 piggy kernel: R13: ffff88813f1e3180 R14: 0000000000000002 R15: ffffffff82069847
May  5 00:58:15 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f800000(0000) knlGS:0000000000000000
May  5 00:58:15 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  5 00:58:15 piggy kernel: CR2: 000014663af54000 CR3: 0000000733336006 CR4: 0000000000770ee0
May  5 00:58:15 piggy kernel: PKRU: 55555554
May  5 00:58:15 piggy kernel: Call Trace:
May  5 00:58:15 piggy kernel: <TASK>
May  5 00:58:15 piggy kernel: make_task_dead+0x11c/0x11c
May  5 00:58:15 piggy kernel: rewind_stack_and_make_dead+0x17/0x17
May  5 00:58:15 piggy kernel: RIP: 0000:0x0
May  5 00:58:15 piggy kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
May  5 00:58:15 piggy kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
May  5 00:58:15 piggy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
May  5 00:58:15 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May  5 00:58:15 piggy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
May  5 00:58:15 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
May  5 00:58:15 piggy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May  5 00:58:15 piggy kernel: </TASK>
May  5 00:58:15 piggy kernel: ---[ end trace 0000000000000000 ]---
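To see where in the rebuild these failures cluster, the `multiple disk errors` lines can be pulled out of the syslog and reduced to unique sectors. The wording of the message suggests that more than one disk returned an error for the same sector, which is more than parity can reconstruct, but treat that reading as an assumption. The sketch below also assumes 512-byte sectors when converting to an offset; the function names are made up for this example.

```python
import re
from typing import Iterable, List, Tuple

RECOVERY_ERROR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel: "
    r"md: recovery thread: multiple disk errors, sector=(?P<sector>\d+)"
)


def recovery_errors(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, sector) for the first report of each sector."""
    first_seen = {}  # dicts keep insertion order, so log order is preserved
    for line in lines:
        m = RECOVERY_ERROR.match(line)
        if m:
            first_seen.setdefault(int(m["sector"]), m["ts"])
    return [(ts, sector) for sector, ts in first_seen.items()]


def offset_tb(sector: int, sector_bytes: int = 512) -> float:
    """Convert a sector number to an offset in TB (ASSUMES 512-byte sectors)."""
    return sector * sector_bytes / 1e12
```

Feeding it the syslog from the diagnostics zip (for example `recovery_errors(open("syslog.txt"))`, where the file name is hypothetical) shows whether the errors always hit the same region or move around. Errors at different sectors on each attempt point away from a bad spot on one disk.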

 

Link to comment

That error message came from the unRAID version v6.12-rc5

 

If I don't run parity sync or drive rebuilds, then the system is stable.

 

If it's a hardware issue, are there recommendations for disabling things in the BIOS (Gigabyte Aero D with i9-13900K)? All the parts in this build are brand new, and the cables have all been replaced with brand new ones. What items should I look at? Should I completely unplug everything from the motherboard, power supply, and hard drives and reseat everything? Should I stress test the CPU and memory modules? Could the NVMe drives cause this issue? Does anyone else have issues with this combination of CPU and motherboard?

Link to comment

Yes, it is the same scenario when using v6.11.5.

 

Here is an error from that version:

May  4 20:50:24 piggy kernel: md: recovery thread: multiple disk errors, sector=7498539328
May  4 20:50:24 piggy kernel: ------------[ cut here ]------------
May  4 20:50:24 piggy kernel: kernel BUG at drivers/md/unraid.c:1617!
May  4 20:50:24 piggy kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
May  4 20:50:24 piggy kernel: CPU: 0 PID: 10725 Comm: unraidd0 Not tainted 5.19.17-Unraid #2
May  4 20:50:24 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. Z690 AERO D/Z690 AERO D, BIOS F23a 01/04/2023
May  4 20:50:24 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
May  4 20:50:24 piggy kernel: Code: 00 83 3d 99 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 19 c3 10 a0 48 8b 73 20 e8 06 1e 71 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
May  4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3df0 EFLAGS: 00010246
May  4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff8881986cee08 RCX: 0000000000000000
May  4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: ffffffff828e59e0 RDI: ffff888106aa2c38
May  4 20:50:24 piggy kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
May  4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164997110
May  4 20:50:24 piggy kernel: R13: ffff8881986cf000 R14: ffff8881986cf078 R15: ffff8881673452d8
May  4 20:50:24 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000
May  4 20:50:24 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2003 CR4: 0000000000770ef0
May  4 20:50:24 piggy kernel: PKRU: 55555554
May  4 20:50:24 piggy kernel: Call Trace:
May  4 20:50:24 piggy kernel: <TASK>
May  4 20:50:24 piggy kernel: md_thread+0x100/0x12e [md_mod]
May  4 20:50:24 piggy kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
May  4 20:50:24 piggy kernel: ? md_seq_show+0x720/0x720 [md_mod]
May  4 20:50:24 piggy kernel: kthread+0xe4/0xef
May  4 20:50:24 piggy kernel: ? kthread_complete_and_exit+0x1b/0x1b
May  4 20:50:24 piggy kernel: ret_from_fork+0x1f/0x30
May  4 20:50:24 piggy kernel: </TASK>
May  4 20:50:24 piggy kernel: Modules linked in: tun veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic gigabyte_wmi wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_cstate intel_uncore i2c_i801 i2c_smbus thunderbolt i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper ahci libahci drm_kms_helper joydev input_leds led_class btusb drm btrtl btbcm btintel bluetooth mpt3sas intel_gtt nvme agpgart ecdh_generic i2c_core nvme_core ecc raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal wmi fan tpm_crb tpm_tis tpm_tis_core video tpm backlight acpi_pad acpi_tad button unix
May  4 20:50:24 piggy kernel: [last unloaded: igc]
May  4 20:50:24 piggy kernel: ---[ end trace 0000000000000000 ]---
May  4 20:50:24 piggy kernel: RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
May  4 20:50:24 piggy kernel: Code: 00 83 3d 99 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 19 c3 10 a0 48 8b 73 20 e8 06 1e 71 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
May  4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3df0 EFLAGS: 00010246
May  4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff8881986cee08 RCX: 0000000000000000
May  4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: ffffffff828e59e0 RDI: ffff888106aa2c38
May  4 20:50:24 piggy kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
May  4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164997110
May  4 20:50:24 piggy kernel: R13: ffff8881986cf000 R14: ffff8881986cf078 R15: ffff8881673452d8
May  4 20:50:24 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000
May  4 20:50:24 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2004 CR4: 0000000000770ef0
May  4 20:50:24 piggy kernel: PKRU: 55555554
May  4 20:50:24 piggy kernel: ------------[ cut here ]------------
May  4 20:50:24 piggy kernel: WARNING: CPU: 0 PID: 10725 at kernel/exit.c:741 do_exit+0x39/0x8e5
May  4 20:50:24 piggy kernel: Modules linked in: tun veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igc atlantic gigabyte_wmi wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_cstate intel_uncore i2c_i801 i2c_smbus thunderbolt i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper ahci libahci drm_kms_helper joydev input_leds led_class btusb drm btrtl btbcm btintel bluetooth mpt3sas intel_gtt nvme agpgart ecdh_generic i2c_core nvme_core ecc raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal wmi fan tpm_crb tpm_tis tpm_tis_core video tpm backlight acpi_pad acpi_tad button unix
May  4 20:50:24 piggy kernel: [last unloaded: igc]
May  4 20:50:24 piggy kernel: CPU: 0 PID: 10725 Comm: unraidd0 Tainted: G      D           5.19.17-Unraid #2
May  4 20:50:24 piggy kernel: Hardware name: Gigabyte Technology Co., Ltd. Z690 AERO D/Z690 AERO D, BIOS F23a 01/04/2023
May  4 20:50:24 piggy kernel: RIP: 0010:do_exit+0x39/0x8e5
May  4 20:50:24 piggy kernel: Code: 89 fd 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 31 c0 65 48 8b 1c 25 c0 bb 01 00 48 83 bb a0 07 00 00 00 74 02 <0f> 0b 48 8b bb c8 06 00 00 e8 b7 c0 7c 00 48 8b 83 c0 06 00 00 83
May  4 20:50:24 piggy kernel: RSP: 0018:ffffc90003dc3ee0 EFLAGS: 00010286
May  4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: ffff888107eec000 RCX: 0000000000000000
May  4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000000000b
May  4 20:50:24 piggy kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffffff828653f0
May  4 20:50:24 piggy kernel: R10: 00003fffffffffff R11: ffff8890bfbc5fde R12: ffffc90003dc3d48
May  4 20:50:24 piggy kernel: R13: ffff888107eec000 R14: 0000000000000002 R15: ffffffff820b236d
May  4 20:50:24 piggy kernel: FS:  0000000000000000(0000) GS:ffff88907f400000(0000) knlGS:0000000000000000
May  4 20:50:24 piggy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  4 20:50:24 piggy kernel: CR2: 0000155417e48000 CR3: 00000001eaad2004 CR4: 0000000000770ef0
May  4 20:50:24 piggy kernel: PKRU: 55555554
May  4 20:50:24 piggy kernel: Call Trace:
May  4 20:50:24 piggy kernel: <TASK>
May  4 20:50:24 piggy kernel: make_task_dead+0xba/0xba
May  4 20:50:24 piggy kernel: rewind_stack_and_make_dead+0x17/0x17
May  4 20:50:24 piggy kernel: RIP: 0000:0x0
May  4 20:50:24 piggy kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
May  4 20:50:24 piggy kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
May  4 20:50:24 piggy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
May  4 20:50:24 piggy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May  4 20:50:24 piggy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
May  4 20:50:24 piggy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
May  4 20:50:24 piggy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May  4 20:50:24 piggy kernel: </TASK>
May  4 20:50:24 piggy kernel: ---[ end trace 0000000000000000 ]---

 

Link to comment

I swapped out the motherboard, CPU, and RAM for those from a known-good system.

 

I started the parity rebuild again; I've lost count of how many times I started it on the previous setup.

 

Parity made it to 13%, then disk 3 produced 1 million errors. Disk 3 is new, has had 2x preclear cycles, and has been installed for months with no issues.

 

Syslog went from 200k to 4gb

 

Is it my 2x LSI 9217-8i controllers?

 

PXL_20230510_015652477.jpg

Link to comment

I swapped the SATA power cable for a known-good cable and got the exact same result as before.

 

This is a list of things not changed yet

2x lsi 9217-8i

Power supply Corsair hx750

 

What else could it be? Or what parts would you recommend I use instead? I just want my system to be stable again. 

 

All these parts and wires work fine on my other system, the same exact parts.

Link to comment
  • 2 weeks later...

I think I have found my issue, after swapping everything (CPU, motherboard, RAM, every single data, signal, and power cable, SAS expander, HBA card, PSU), wasting 5 months of my life, and letting return windows lapse so that I'm stuck with all this extra hardware.

The WD 20tb drives use more power than the WD 10tb drives. The PSU that runs the case that only has hard drives is a Corsair RM750x. On the box it states that the 5v rail max is 20a and the 12v rail max is 62.5a. The WD 20tb drives state on the label that each drive needs [email protected] and [email protected], so when you multiply that by 15 drives you get [email protected] and [email protected]. Both of these are well below the max of the PSU. But as I pulled out drives 1 by 1 and put them on another PSU, the errors slowly went away.

My current data rebuild is finally progressing, on drive #10 of 15. On the RM750x PSU I have 7x 20tb drives and 5x 10tb drives, with the remaining 3x 20tb drives on a separate PSU. If I add any more drives to the RM750x PSU, that is when I start getting errors, and swapping data or power cables makes no difference. It was only when I lowered the power draw on the PSU that the errors went away.

I'll be looking for a new power supply that can hopefully run all 15x WD 20tb drives without giving me errors. Do they make a device that can monitor the actual amperage draw on each PSU voltage rail?
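The multiplication described above can be sketched as a small per-rail budget check. The rail maxima are the ones quoted from the PSU box; the per-drive currents are placeholders, not the values from the drive label, and the 12V figure is deliberately a guess at a brief spin-up draw rather than a steady-state rating. Check the drive datasheet before relying on any of these numbers.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    max_amps: float  # rail maximum as printed on the PSU box


def rail_load(per_drive_amps: float, drives: int) -> float:
    """Total current drawn on one rail by identical drives."""
    return per_drive_amps * drives


def headroom(rail: Rail, per_drive_amps: float, drives: int) -> float:
    """Amps left on the rail; a negative result means it is over budget."""
    return rail.max_amps - rail_load(per_drive_amps, drives)


# Rail maxima quoted in the post for the RM750x.
RAIL_5V = Rail("5V", 20.0)
RAIL_12V = Rail("12V", 62.5)

# HYPOTHETICAL per-drive currents, for illustration only.
LABEL_5V_AMPS = 0.5    # placeholder for the label's 5V rating
SPINUP_12V_AMPS = 2.0  # placeholder for a short 12V spin-up peak

if __name__ == "__main__":
    for rail, amps in ((RAIL_5V, LABEL_5V_AMPS), (RAIL_12V, SPINUP_12V_AMPS)):
        left = headroom(rail, amps, drives=15)
        print(f"{rail.name}: {rail_load(amps, 15):.1f} A used, {left:.1f} A headroom")
```

Even with these pessimistic placeholders the totals stay under the rail maxima, which matches the observation that the budget looks fine on paper. A check like this only covers the PSU's total rating; it says nothing about how much current each individual cable or splitter is carrying, or about short voltage dips under load.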

Edited by bdarnell
Update psu model number
Link to comment
