November 23, 2016
I was doing a transfer from a VM on the cache to the array when the VM stopped responding. I looked at the array and saw disk 5 with an X. I figured I'd do what worked in the past: stop the array, unassign disk 5, start the array, then stop the array, reassign disk 5 and let it rebuild. That's when disk 4 went red X too, so I stopped.

This happened not too long ago, but that time it was disk 6 that was not found, and a rebuild caused disk 5 to error. Last time it was a flaky cable or something, so I switched disks 4 and 5 to a SATA controller in a PCI slot, and I still see the same problem. Disk 5 looks like it wants to rebuild and disk 4 can't be found. I went into maintenance mode to check the disks, but disks 4 and 5 both report an I/O error.

I was going to pull disks 4 and 5, put them in a cradle to see if they spin up and are accessible to Windows, and then put them back. I'm getting tired of juggling so many disks; I'm thinking of replacing them all with ten 8TB drives just to streamline things a little. Attached are my diagnostics and other logs. Any guidance on how best to proceed is most appreciated.
unRAID Server Pro
Server: Tower • 10.1.25.13
Description: Media server
Version: 6.2.4
Uptime: 16 hours, 56 minutes

System Devices

PCI Devices
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller [8086:0c05] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-V [8086:153b] (rev 05)
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086:8c10] (rev d5)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 [8086:8c18] (rev d5)
00:1c.5 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #6 [8086:8c1a] (rev d5)
00:1c.6 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #7 [8086:8c1c] (rev d5)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation Z87 Express LPC Controller [8086:8c44] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 05)
01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ba)
02:08.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ba)
02:10.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ba)
02:11.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ba)
05:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)
06:00.1 Audio device [0403]: NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
72:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
73:00.0 Network controller [0280]: Broadcom Limited BCM4352 802.11ac Wireless Network Adapter [14e4:43b1] (rev 03)
74:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
75:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
75:02.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
75:03.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
77:00.0 SATA controller [0106]: Marvell Technology Group Ltd. Device [1b4b:9215] (rev 11)

IOMMU Groups
Warning: Your system has booted with the PCIe ACS Override setting enabled. The below list doesn't not reflect the way IOMMU would naturally group devices. To see natural IOMMU groups for your hardware, go to the VM Settings page and set the PCIe ACS Override setting to No.
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/2/devices/0000:00:01.1
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/6/devices/0000:00:16.0
/sys/kernel/iommu_groups/7/devices/0000:00:19.0
/sys/kernel/iommu_groups/8/devices/0000:00:1a.0
/sys/kernel/iommu_groups/9/devices/0000:00:1b.0
/sys/kernel/iommu_groups/10/devices/0000:00:1c.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.4
/sys/kernel/iommu_groups/12/devices/0000:00:1c.5
/sys/kernel/iommu_groups/13/devices/0000:00:1c.6
/sys/kernel/iommu_groups/14/devices/0000:00:1d.0
/sys/kernel/iommu_groups/15/devices/0000:00:1f.0
/sys/kernel/iommu_groups/15/devices/0000:00:1f.2
/sys/kernel/iommu_groups/15/devices/0000:00:1f.3
/sys/kernel/iommu_groups/16/devices/0000:01:00.0
/sys/kernel/iommu_groups/17/devices/0000:02:08.0
/sys/kernel/iommu_groups/18/devices/0000:02:10.0
/sys/kernel/iommu_groups/19/devices/0000:02:11.0
/sys/kernel/iommu_groups/20/devices/0000:05:00.0
/sys/kernel/iommu_groups/21/devices/0000:06:00.0
/sys/kernel/iommu_groups/21/devices/0000:06:00.1
/sys/kernel/iommu_groups/22/devices/0000:72:00.0
/sys/kernel/iommu_groups/23/devices/0000:73:00.0
/sys/kernel/iommu_groups/24/devices/0000:74:00.0
/sys/kernel/iommu_groups/25/devices/0000:75:01.0
/sys/kernel/iommu_groups/26/devices/0000:75:02.0
/sys/kernel/iommu_groups/27/devices/0000:75:03.0
/sys/kernel/iommu_groups/28/devices/0000:77:00.0

CPU Thread Pairings
cpu 0 <===> cpu 4
cpu 1 <===> cpu 5
cpu 2 <===> cpu 6
cpu 3 <===> cpu 7

USB Devices
Bus 002 Device 002: ID 8087:8000 Intel Corp.
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 13d3:3404 IMC Networks
Bus 001 Device 002: ID 8087:8008 Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 003: ID 174c:3074 ASMedia Technology Inc. ASM1074 SuperSpeed hub
Bus 004 Device 002: ID 174c:3074 ASMedia Technology Inc. ASM1074 SuperSpeed hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 1b1c:0a10 Corsair
Bus 003 Device 003: ID 174c:2074 ASMedia Technology Inc. ASM1074 High-Speed hub
Bus 003 Device 002: ID 174c:2074 ASMedia Technology Inc. ASM1074 High-Speed hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

SCSI Devices
[0:0:0:0] disk Corsair UFD 1100 /dev/sda 15.7GB
[1:0:0:0] disk ATA ST4000DM000-1F21 CC52 /dev/sdi 4.00TB
[1:0:1:0] disk ATA WDC WD30EZRX-00M 0A80 /dev/sdj 3.00TB
[1:0:2:0] disk ATA WDC WD30EZRX-00M 0A80 /dev/sdk 3.00TB
[1:0:3:0] disk ATA ST5000DM000-1FK1 CC47 /dev/sdl 5.00TB
[1:0:4:0] disk ATA Hitachi HDS5C404 A3B0 /dev/sdm 4.00TB
[1:0:5:0] disk ATA ST5000DM000-1FK1 CC44 /dev/sdn 5.00TB
[1:0:6:0] disk ATA ST5000DM000-1FK1 CC49 /dev/sdo 5.00TB
[1:0:7:0] disk ATA ST4000DM000-1F21 CC54 /dev/sdp 4.00TB
[1:0:8:0] disk ATA ST4000DM000-1F21 CC52 /dev/sdq 4.00TB
[1:0:9:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdr 4.00TB
[1:0:10:0] disk ATA Hitachi HDS5C404 A3B0 /dev/sds 4.00TB
[1:0:11:0] disk ATA ST3000DM001-9YN1 CC9D /dev/sdt 3.00TB
[1:0:12:0] disk ATA WDC WD30EZRS-00J 0A80 /dev/sdu 3.00TB
[1:0:13:0] disk ATA WDC WD60EZRX-00M 0A80 /dev/sdv 6.00TB
[1:0:14:0] disk ATA WDC WD30EZRS-00J 0A80 /dev/sdw 3.00TB
[1:0:15:0] disk ATA WDC WD30EZRS-00J 0A80 /dev/sdx 3.00TB
[1:0:16:0] enclosu LSI 3x24R_02.0.0.1 0200 - -
[2:0:0:0] disk ATA CT500BX100SSD1 MU02 /dev/sdb 500GB
[3:0:0:0] disk ATA CT500BX100SSD1 MU02 /dev/sdc 500GB
[4:0:0:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdd 4.00TB
[7:0:0:0] disk ATA ST4000DM000-1F21 CC54 /dev/sde 4.00TB
[8:0:0:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdf 4.00TB
[10:0:0:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdg 4.00TB
[11:0:0:0] disk ATA TOSHIBA DT01ACA3 ABB0 /dev/sdh 3.00TB

Array Stopped

tower-diagnostics-20161123-1543.zip
November 25, 2016 - Author
I put the drives in an external enclosure; they spun up and sounded normal. I rewired all power and data cables and am still getting the same result. How do I start the array without a rebuild happening?
November 26, 2016 - Author
Anybody got an idea? I'm afraid that if I start the array, a rebuild will lose data.
November 26, 2016 - Community Expert
The server was rebooted, so the logs don't show what happened. SMART for disk 4 looks OK, but disk 5 needs to be replaced, not rebuilt. Do you have a spare?

Model Family: Western Digital Green
Device Model: WDC WD30EZRX-00MMMB0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 8
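For anyone who wants to run the same check over several disks, here is a minimal Python sketch that scans `smartctl -A`-style attribute rows and reports the sector-health attributes with a nonzero raw value. The function name `flag_smart_attributes` and the choice of watched IDs (5, 197, 198) are assumptions made for this illustration, not something taken from the diagnostics; the sample rows are the two quoted for disk 5.

```python
# Sketch only: the watched attribute IDs and the helper name are illustrative choices.
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable sectors


def flag_smart_attributes(text):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is > 0."""
    flagged = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row has 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw_value
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        raw = fields[9]
        if raw.isdigit() and int(raw) > 0:
            flagged[fields[1]] = int(raw)
    return flagged


SAMPLE = """\
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 8
"""

print(flag_smart_attributes(SAMPLE))
```

A nonzero count here is a reason to look closer, not proof that the drive is bad; pending sectors can clear after the sectors are rewritten or after an extended test.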
November 26, 2016 - Author
I have a spare I can put in. I'll swap it, start the array and see how it goes.
November 26, 2016 - Community Expert
You have 1 invalid and 1 disabled disk, so you can't do a normal rebuild (maybe consider adding a 2nd parity in the future). If I understood correctly, and disk4 redballed during the disk5 rebuild, you can try this:
- take a screenshot of the current array
- Tools -> New Config
- assign all disks to their original slots, including old disk5, and double-check all assignments
- very important: check the box "parity is already valid" before starting the array
- start array
- stop array
- power down and replace disk5 with the spare
- power up, assign the spare and start the array to begin the rebuild
Keep old disk5 intact in case it's needed.
November 26, 2016 - Author
I did the new config and told it to keep all disks. The array came up good and I am able to play files from both disk 4 and disk 5. I'm not sure yet what, if anything, is missing. The VM is hanging: I can VNC into it, but I just see my Windows splash screen. It won't load into Windows yet.
November 26, 2016 - Community Expert
Disk5 should still be replaced, or at least run an extended SMART test on it to check whether the pending sectors are real or false positives.
November 27, 2016 - Author
The rebuild started, but now disks 9, 10, 11 and 12 are showing 41 million read errors.

tower-diagnostics-20161126-2032.zip
November 27, 2016 - Community Expert
You'll want to check what those 4 disks have in common: cable, backplane, etc.
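One way to start narrowing that down in software is to group devices by the first number of their SCSI address (`[host:channel:target:lun]`), which shows which drives hang off the same controller. The sketch below is a rough Python illustration under stated assumptions: the helper name `group_by_host` is made up, and the sample rows are copied from the SCSI Devices listing in the first post. The SCSI address is not the unRAID slot number, so mapping "disk 9" to a `/dev/sdX` name still has to be read off the Main page, and shared cables, backplanes or power splitters can only be confirmed by physically tracing them.

```python
import re
from collections import defaultdict

# Matches rows such as "[1:0:9:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdr 4.00TB".
# Only rows of type "disk" with a /dev/sdX node are considered.
ROW_RE = re.compile(r"\[(\d+):\d+:\d+:\d+\]\s+disk\s+.*?(/dev/sd[a-z]+)")


def group_by_host(listing):
    """Return {scsi_host_number: [device nodes]} for every disk row in the listing."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = ROW_RE.search(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


SAMPLE = """\
[1:0:8:0] disk ATA ST4000DM000-1F21 CC52 /dev/sdq 4.00TB
[1:0:9:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdr 4.00TB
[1:0:10:0] disk ATA Hitachi HDS5C404 A3B0 /dev/sds 4.00TB
[1:0:16:0] enclosu LSI 3x24R_02.0.0.1 0200 - -
[4:0:0:0] disk ATA WDC WD40EZRX-00S 0A80 /dev/sdd 4.00TB
"""

for host, devices in sorted(group_by_host(SAMPLE).items()):
    print(host, devices)
```

Drives that land in the same group share a controller; if the four erroring disks are split across groups, the common factor is more likely power or a backplane than the controller itself.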
November 27, 2016 - Author
I have a lot of Y splitters to power all the drives. I think I may be overloading them. Time to get out of this tower and into a hot-swap rack. Thanks a lot for your advice.

I run 3 unRAID servers with different hardware, and each machine has its quirks. One, I swear, needs a special dance to come up without a USB overpower error. Apparently if I screw the cards down I get an error, and if I remove the screws it boots up fine. A short in the case somewhere?

It's tough keeping so many drives spinning. Can't wait for the 100TB SSD. Also, I never leave well enough alone. I feel like a slave to my NAS overlords. I heal them and grow them and never question what they will do with all that power....
December 1, 2016 - Author
I took out the drive that failed and ran a preclear and an extended SMART test, and both came back totally clean. Does that mean the drive is safe to use? Should I clear it again to be sure?