Help! Hardware Failure? Array Currently Unprotected!


DaveHavok

Hello everyone! I'm troubleshooting my unRAID server, which is currently unprotected due to a missing drive.

 

PROBLEM:

- One of my drives (Drive 5) suddenly was having problems reading during a monthly parity check. 

- Drive 5 was marked with a red X and listed as "Faulty"

- Trying a potential cheap fix, I swapped SATA cables

- Drive 5 passed a few SMART checks

- I then did a parity sync / rebuild of the same Drive 5

- Rebuild was successful (2 days later)

- I then attempted a parity check and immediately Drive 5 was generating read errors

- I immediately stopped the parity check before any writes to the parity drive could be done.

- I swapped in a brand new drive in the same slot to begin preclearing the drive

- I had to power on/off the server a few times to get it to see the drive in the preclear menu and in the device listings

- The preclear is insanely slow at > 1MB /s

 

Looking at the system log, I see the same errors repeating:

 

Dec  6 23:14:01 OrigamiNET emhttp: shcmd (426): rmmod md-mod |& logger
Dec  6 23:14:01 OrigamiNET kernel: md: unRAID driver removed
Dec  6 23:14:01 OrigamiNET emhttp: shcmd (427): modprobe md-mod super=/boot/config/super.dat |& logger
Dec  6 23:14:01 OrigamiNET kernel: md: unRAID driver 2.6.8 installed
Dec  6 23:14:01 OrigamiNET emhttp: err: get_key_info: get_message: /boot/config/._Pro.key (-3)
Dec  6 23:14:01 OrigamiNET emhttp: Pro key detected, GUID: 03F0-5307-0000-0000000003F6 FILE: /boot/config/Pro.key
Dec  6 23:14:01 OrigamiNET emhttp: Device inventory:
Dec  6 23:14:01 OrigamiNET emhttp: shcmd (428): udevadm settle
Dec  6 23:14:01 OrigamiNET emhttp: hp_v165w_00000000000003F6-0:0 (sda) 3946464
Dec  6 23:14:01 OrigamiNET emhttp: BP4_mSATA_SSD_FECA07411CEC00143435 (sdq) 117220792
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F572BT (sdb) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST8000AS0002-1NA17Z_Z840A6S4 (sdc) 7814026532
Dec  6 23:14:01 OrigamiNET emhttp: ST8000AS0002-1NA17Z_Z840BT8X (sdd) 7814026532
Dec  6 23:14:01 OrigamiNET emhttp: ST8000AS0002-1NA17Z_Z840JYK2 (sde) 7814026532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1ER166_Z5005CNJ (sdf) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F43J37 (sdg) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F42CT4 (sdh) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F29MZT (sdi) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F4TTZC (sdj) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1ER166_Z500MXRB (sdk) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1ER166_Z5005EVB (sdl) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1ER166_Z5005FV5 (sdm) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST8000AS0002-1NA17Z_Z840SL8R (sdn) 7814026532
Dec  6 23:14:01 OrigamiNET emhttp: ST3000DM001-1CH166_W1F28VK2 (sdo) 2930266532
Dec  6 23:14:01 OrigamiNET emhttp: ST8000AS0002-1NA17Z_Z840A1MT (sdp) 7814026532
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (1): import 0 sdp 7814026532 0 ST8000AS0002-1NA17Z_Z840A1MT
Dec  6 23:14:01 OrigamiNET kernel: md: import disk0: (sdp) ST8000AS0002-1NA17Z_Z840A1MT size: 7814026532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (2): import 1 sdi 2930266532 0 ST3000DM001-1CH166_W1F29MZT
Dec  6 23:14:01 OrigamiNET kernel: md: import disk1: (sdi) ST3000DM001-1CH166_W1F29MZT size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (3): import 2 sdh 2930266532 0 ST3000DM001-1CH166_W1F42CT4
Dec  6 23:14:01 OrigamiNET kernel: md: import disk2: (sdh) ST3000DM001-1CH166_W1F42CT4 size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (4): import 3 sdg 2930266532 0 ST3000DM001-1CH166_W1F43J37
Dec  6 23:14:01 OrigamiNET kernel: md: import disk3: (sdg) ST3000DM001-1CH166_W1F43J37 size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (5): import 4 sdf 2930266532 0 ST3000DM001-1ER166_Z5005CNJ
Dec  6 23:14:01 OrigamiNET kernel: md: import disk4: (sdf) ST3000DM001-1ER166_Z5005CNJ size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (6): import 5
Dec  6 23:14:01 OrigamiNET kernel: md: import_slot: 5 empty
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (7): import 6 sdm 2930266532 0 ST3000DM001-1ER166_Z5005FV5
Dec  6 23:14:01 OrigamiNET kernel: md: import disk6: (sdm) ST3000DM001-1ER166_Z5005FV5 size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (8): import 7 sdl 2930266532 0 ST3000DM001-1ER166_Z5005EVB
Dec  6 23:14:01 OrigamiNET kernel: md: import disk7: (sdl) ST3000DM001-1ER166_Z5005EVB size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (9): import 8 sdk 2930266532 0 ST3000DM001-1ER166_Z500MXRB
Dec  6 23:14:01 OrigamiNET kernel: md: import disk8: (sdk) ST3000DM001-1ER166_Z500MXRB size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (10): import 9 sdj 2930266532 0 ST3000DM001-1CH166_W1F4TTZC
Dec  6 23:14:01 OrigamiNET kernel: md: import disk9: (sdj) ST3000DM001-1CH166_W1F4TTZC size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (11): import 10 sdb 2930266532 0 ST3000DM001-1CH166_W1F572BT
Dec  6 23:14:01 OrigamiNET kernel: md: import disk10: (sdb) ST3000DM001-1CH166_W1F572BT size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (12): import 11 sdo 2930266532 0 ST3000DM001-1CH166_W1F28VK2
Dec  6 23:14:01 OrigamiNET kernel: md: import disk11: (sdo) ST3000DM001-1CH166_W1F28VK2 size: 2930266532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (13): import 12 sdc 7814026532 0 ST8000AS0002-1NA17Z_Z840A6S4
Dec  6 23:14:01 OrigamiNET kernel: md: import disk12: (sdc) ST8000AS0002-1NA17Z_Z840A6S4 size: 7814026532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (14): import 13 sdd 7814026532 0 ST8000AS0002-1NA17Z_Z840BT8X
Dec  6 23:14:01 OrigamiNET kernel: md: import disk13: (sdd) ST8000AS0002-1NA17Z_Z840BT8X size: 7814026532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (15): import 14 sde 7814026532 0 ST8000AS0002-1NA17Z_Z840JYK2
Dec  6 23:14:01 OrigamiNET kernel: md: import disk14: (sde) ST8000AS0002-1NA17Z_Z840JYK2 size: 7814026532 
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (16): import 15
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (17): import 16
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (18): import 17
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (19): import 18
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (20): import 19
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (21): import 20
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (22): import 21
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (23): import 22
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (24): import 23
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (25): import 24
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (26): import 25
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (27): import 26
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (28): import 27
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (29): import 28
Dec  6 23:14:01 OrigamiNET kernel: mdcmd (30): import 29
Dec  6 23:14:01 OrigamiNET kernel: md: import_slot: 29 empty
Dec  6 23:14:01 OrigamiNET emhttp: import 30 cache device: sdq
Dec  6 23:14:01 OrigamiNET emhttp: import flash device: sda
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: exception Emask 0x40 SAct 0x800000 SErr 0x880800 action 0x6 frozen
Dec  6 23:14:29 OrigamiNET kernel: ata5: SError: { HostInt 10B8B LinkSeq }
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: cmd 60/00:b8:20:76:10/01:00:00:00:00/40 tag 23 ncq 131072 in
Dec  6 23:14:29 OrigamiNET kernel:         res 40/00:c0:20:14:10/00:00:00:00:00/40 Emask 0x44 (timeout)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:29 OrigamiNET kernel: ata5: hard resetting link
Dec  6 23:14:29 OrigamiNET kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: configured for UDMA/33
Dec  6 23:14:29 OrigamiNET kernel: ata5: EH complete
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: exception Emask 0x50 SAct 0x20 SErr 0x280900 action 0x6 frozen
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Dec  6 23:14:29 OrigamiNET kernel: ata5: SError: { UnrecovData HostInt 10B8B BadCRC }
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: cmd 60/00:28:20:76:10/01:00:00:00:00/40 tag 5 ncq 131072 in
Dec  6 23:14:29 OrigamiNET kernel:         res 40/00:28:20:76:10/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:29 OrigamiNET kernel: ata5: hard resetting link
Dec  6 23:14:29 OrigamiNET kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:29 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: configured for UDMA/33
Dec  6 23:14:29 OrigamiNET kernel: ata5: EH complete
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: exception Emask 0x50 SAct 0x60000000 SErr 0x280900 action 0x6 frozen
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Dec  6 23:14:29 OrigamiNET kernel: ata5: SError: { UnrecovData HostInt 10B8B BadCRC }
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: cmd 60/00:e8:20:7a:10/01:00:00:00:00/40 tag 29 ncq 131072 in
Dec  6 23:14:29 OrigamiNET kernel:         res 40/00:e8:20:7a:10/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: cmd 60/00:f0:20:7b:10/01:00:00:00:00/40 tag 30 ncq 131072 in
Dec  6 23:14:29 OrigamiNET kernel:         res 40/00:e8:20:7a:10/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  6 23:14:29 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:29 OrigamiNET kernel: ata5: hard resetting link
Dec  6 23:14:30 OrigamiNET kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: configured for UDMA/33
Dec  6 23:14:30 OrigamiNET kernel: ata5: EH complete
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: exception Emask 0x50 SAct 0xc000 SErr 0x280900 action 0x6 frozen
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Dec  6 23:14:30 OrigamiNET kernel: ata5: SError: { UnrecovData HostInt 10B8B BadCRC }
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: cmd 60/00:70:20:7d:10/01:00:00:00:00/40 tag 14 ncq 131072 in
Dec  6 23:14:30 OrigamiNET kernel:         res 40/00:70:20:7d:10/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: cmd 60/00:78:20:7e:10/01:00:00:00:00/40 tag 15 ncq 131072 in
Dec  6 23:14:30 OrigamiNET kernel:         res 40/00:70:20:7d:10/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:30 OrigamiNET kernel: ata5: hard resetting link
Dec  6 23:14:30 OrigamiNET kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: configured for UDMA/33
Dec  6 23:14:30 OrigamiNET kernel: ata5: EH complete
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: exception Emask 0x10 SAct 0x40000001 SErr 0x280100 action 0x6 frozen
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Dec  6 23:14:30 OrigamiNET kernel: ata5: SError: { UnrecovData 10B8B BadCRC }
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: cmd 60/00:00:20:81:10/01:00:00:00:00/40 tag 0 ncq 131072 in
Dec  6 23:14:30 OrigamiNET kernel:         res 40/00:f0:20:80:10/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: cmd 60/00:f0:20:80:10/01:00:00:00:00/40 tag 30 ncq 131072 in
Dec  6 23:14:30 OrigamiNET kernel:         res 40/00:f0:20:80:10/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: status: { DRDY }
Dec  6 23:14:30 OrigamiNET kernel: ata5: hard resetting link
Dec  6 23:14:30 OrigamiNET kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec  6 23:14:30 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT0._GTF] (Node ffff88082f523a50), AE_NOT_FOUND (20150930/psparse-542)
Dec  6 23:14:30 OrigamiNET kernel: ata5.00: configured for UDMA/33
Dec  6 23:14:30 OrigamiNET kernel: ata5: EH complete

 

THOUGHTS:

- SATA cable was swapped and is brand new, so I'm ruling that out.

- Bad Port in the IcyDock? 

- Bad Port on the Motherboard?

- Could my RAID card be going faulty?

- I'm thinking the original Drive 5 was OK and that the Port itself might be an issue since both the new and the old drive appear to be having reading problems

- This line obviously jumps out at me:

 

Dec  6 23:14:30 OrigamiNET kernel: ata5.00: exception Emask 0x10 SAct 0x40000001 SErr 0x280100 action 0x6 frozen

Dec  6 23:14:30 OrigamiNET kernel: ata5.00: irq_stat 0x08000000, interface fatal error

 

Any help would be much appreciated!

Link to comment

Dec  6 23:14:30 OrigamiNET kernel: ata5: SError: { UnrecovData 10B8B BadCRC }

 

99% of the time these mean a bad SATA cable, but if you've already replaced it, it can be a bad SATA port.
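If it helps to see which flags each port is throwing, here is a minimal Python sketch that tallies the SError flags per ATA port from syslog lines. The regex is only an assumption based on the line format shown in this thread, and `tally_serror_flags` is a made-up helper name, not part of any unRAID tooling.

```python
import re
from collections import Counter

# Matches libata lines such as:
#   "... kernel: ata5: SError: { UnrecovData HostInt 10B8B BadCRC }"
# The pattern is an assumption based on the log excerpts in this thread.
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ ([^}]*)\}")


def tally_serror_flags(lines):
    """Count (port, flag) pairs across an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match:
            port, flags = match.groups()
            for flag in flags.split():
                counts[(port, flag)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec  6 23:14:29 OrigamiNET kernel: ata5: SError: { HostInt 10B8B LinkSeq }",
        "Dec  6 23:14:29 OrigamiNET kernel: ata5: SError: { UnrecovData HostInt 10B8B BadCRC }",
        "Dec  6 23:14:30 OrigamiNET kernel: ata5: SError: { UnrecovData 10B8B BadCRC }",
    ]
    for (port, flag), n in sorted(tally_serror_flags(sample).items()):
        print(port, flag, n)
```

Feeding it the whole syslog instead of the sample list should show whether the BadCRC hits stay on one port or follow the drive around.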

 

Yeah, I'm looking through the Drive Analysis doc now (https://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues)

Looks like a combo of Drive Interface Issues 1 and 2.

 

I'm also wondering if maybe my Power Supply is too weak. 750 watts. Hmmm

 

UPDATE:

- After running through some PSU calculators, I'm good with the 750watt size.  (http://www.coolermaster.com/power-supply-calculator/)

- I'll probably replace all the SATA cables since I suspect them of being very poor quality and they don't lock.

- Research potential problems with the SUPERMICRO AOC-SASLP-MV8 controller (Is there a popular replacement that's faster / reliable?)

 

 

Link to comment

I don't know about the rest of your issues, but I replaced my AOC-SASLP-MV8 with an AOC-SAS2LP-MV8 and was very happy with the decision.

 

I never did figure out what the deal was with my old motherboard and that card. I had various performance issues: some that have been well documented by other forum members in other threads, and one or two that may have been unique to me. But when I finally gave up and replaced it with the AOC-SAS2LP-MV8, it was like a breath of fresh air for the system.

Link to comment

Thank you for the feedback on this! I'm feeling like this is where the speed bottleneck is currently in my system.

I do wish I'd done some more research before having the knee-jerk reaction of "Crap! Order more drives!", but hey, I needed to grow the array anyway.

 

Now to track down some good SATA cables to swap. SFF-8087 mini-SAS cables also.

Link to comment

- Research potential problems with the SUPERMICRO AOC-SASLP-MV8 controller (Is there a popular replacement that's faster / reliable?)

 

The SASLP is one of the most used controllers in unRAID, and I'm not aware of any issues with it. It is, however, somewhat bandwidth limited: fully loaded, max speed during a parity check/disk rebuild is 80MB/s.
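To put that ceiling in perspective, here is a rough Python estimate of how long one full pass over a disk would take if the whole run were held to that rate. It assumes the sizes in the syslog above are 1 KiB blocks and that the rate stays constant, which a real check will not do, so treat the result as a ballpark only. `hours_to_read` is a hypothetical helper name.

```python
def hours_to_read(size_kib: int, rate_mb_per_s: float) -> float:
    """Hours needed to read size_kib KiB at a constant rate_mb_per_s MB/s."""
    size_bytes = size_kib * 1024
    seconds = size_bytes / (rate_mb_per_s * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    # Sizes copied from the device inventory in the first post
    # (assumed here to be counts of 1 KiB blocks).
    for label, size_kib in (("8TB disk", 7814026532), ("3TB disk", 2930266532)):
        print(f"{label}: {hours_to_read(size_kib, 80):.1f} hours at 80 MB/s")
```

A parity check runs as long as the largest disk in the array, so the 8TB figure is the one that matters here.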

Link to comment

UPDATE:

 

- Replaced my AOC-SASLP-MV8 with a AOC-SAS2LP-MV8

- Replaced all cables

 

The issue with Drive 5 continues.

 

The drive itself appears to be fine, but it intermittently appears and disappears from the BIOS hardware listing when rebooting.

- Moved the SATA cable for Drive 5 to a different port. No change; the drive is still detected only intermittently.

- Swapped drives out just to humor myself and the same behavior continues with the swapped drive.

 

At this point, I suspect that the Icy Dock bay itself is going bad. Bypassing the Icy Dock bay and doing a direct connection to the drive would confirm that.

Just seems so odd that the Icy Dock itself is starting to go bad.

 

Maybe I should just replace the entire case, forgo the Icy Dock bays altogether, and get something that allows for easier motherboard access for cable management.

 

Back at it again. Sigh.

Link to comment

Well that sucks :(

Looks like the Icy Dock MB455SPF-B is no longer manufactured, and the few units for sale are at super mark up prices.

 

Looks like I'm in the market for a new case that can hold 15+ drives and has good cable management.

 

Any suggestions or recommendations?

 

UPDATE:

I might just swap out the defective dock for the newer version: Icy Dock FatCage MB155SP-B

Link to comment

I have been using the Icy Dock FatCage MB155SP-B without any issues.
Link to comment

Quick question - Could dust or a slightly loose connection cause this error:

 

Dec 11 13:36:00 OrigamiNET kernel: ata8.00: supports DRM functions and may not be fully accessible
Dec 11 13:36:00 OrigamiNET kernel: ata8.00: configured for UDMA/33
Dec 11 13:36:00 OrigamiNET kernel: ata8: EH complete
Dec 11 13:36:33 OrigamiNET kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x880000 action 0x6 frozen
Dec 11 13:36:33 OrigamiNET kernel: ata8: SError: { 10B8B LinkSeq }
Dec 11 13:36:33 OrigamiNET kernel: ata8.00: failed command: WRITE DMA EXT
Dec 11 13:36:33 OrigamiNET kernel: ata8.00: cmd 35/00:40:a0:a0:97/00:05:a4:00:00/e0 tag 2 dma 688128 out
Dec 11 13:36:33 OrigamiNET kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 11 13:36:33 OrigamiNET kernel: ata8.00: status: { DRDY }
Dec 11 13:36:33 OrigamiNET kernel: ata8: hard resetting link
Dec 11 13:36:34 OrigamiNET kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

 

I'm just seeing the SError: { 10B8B LinkSeq } issue now instead of the previous { UnrecovData HostInt 10B8B BadCRC }

 

UPDATE:

Attaching full Sys Log in case I'm missing something

origaminet-syslog-20161211-1355.zip

Link to comment

Thanks for taking a look John.

 

I just finished the Disk Rebuild on the drive and the array is back up and running again in protected state.

-The next step is to do a Parity Check with the "Make Corrections to Parity Drive" unchecked.

-Fix the ACPI Exception with this http://lime-technology.com/forum/index.php?topic=45920.0

 

However, I noticed in the Sys Log that the reported errors went from ATA8 to ATA7 about halfway through the rebuild process.

The IOMMU is a new one on me; I'll have to review the provided thread and follow up with you.

 

origaminet-diagnostics-20161211-1856.zip

Link to comment

It looks as though it could be cable related, but it also looks a little like this: http://lime-technology.com/forum/index.php?topic=40683.0

 

Do you have IOMMU enabled? Post your diagnostics zip.

 

I'm not really seeing any of the errors that are mentioned in that thread; well, nothing that's a solid match for my problem.

I'm running the newest firmware for the card also: 4.0.0.1812

 

UPDATE:

I take that back, John. I do believe I am seeing what you're talking about with the IOMMU and Marvell chipset cards.

So far, I'm just seeing this same error going back and forth between ATA7 and ATA8. Each port takes its turn being reported for hours at a time.
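For anyone trying to work out which physical disk sits behind ATA7 or ATA8, here is a small Python sketch. It assumes a Linux sysfs layout where the resolved `/sys/block/sdX` path contains an `ataN` component, which is usually the case for ports handled directly by libata but may not hold for disks behind a SAS HBA. Both function names are made up for illustration.

```python
import os
import re

# An "ataN" path component, e.g. ".../0000:00:1f.2/ata8/host7/...".
ATA_RE = re.compile(r"/(ata\d+)/")


def ata_port_from_sysfs_path(path):
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = ATA_RE.search(path)
    return match.group(1) if match else None


def map_block_devices(sys_block="/sys/block"):
    """Map each sdX name to its ataN port (None when no port is visible)."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        if name.startswith("sd"):
            resolved = os.path.realpath(os.path.join(sys_block, name))
            mapping[name] = ata_port_from_sysfs_path(resolved)
    return mapping


if __name__ == "__main__":
    for device, port in map_block_devices().items():
        print(device, port)
```

Matching the sdX name against the device inventory in the syslog then gives the model and serial number for the port in question.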

 

- Finished another extended SMART check on the drive. Passed

- Parity Check completed and No Errors Found (Array is up and Protected)

 

However, I'm still seeing a few of these popping up every now and then:

Dec 12 06:14:01 OrigamiNET kernel: ata8.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Dec 12 06:14:01 OrigamiNET kernel: ata8.00: irq_stat 0x08000002, interface fatal error
Dec 12 06:14:01 OrigamiNET kernel: ata8: SError: { UnrecovData 10B8B BadCRC }
Dec 12 06:14:01 OrigamiNET kernel: ata8.00: failed command: SMART
Dec 12 06:14:01 OrigamiNET kernel: ata8.00: cmd b0/d1:01:01:4f:c2/00:00:00:00:00/00 tag 4 pio 512 in
Dec 12 06:14:01 OrigamiNET kernel:         res 50/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Dec 12 06:14:01 OrigamiNET kernel: ata8.00: status: { DRDY }
Dec 12 06:14:01 OrigamiNET kernel: ata8: hard resetting link
Dec 12 06:14:02 OrigamiNET kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 12 06:14:02 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec 12 06:14:02 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT3._GTF] (Node ffff88082f523bb8), AE_NOT_FOUND (20150930/psparse-542)
Dec 12 06:14:02 OrigamiNET kernel: ata8.00: supports DRM functions and may not be fully accessible
Dec 12 06:14:02 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec 12 06:14:02 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT3._GTF] (Node ffff88082f523bb8), AE_NOT_FOUND (20150930/psparse-542)
Dec 12 06:14:02 OrigamiNET kernel: ata8.00: supports DRM functions and may not be fully accessible
Dec 12 06:14:02 OrigamiNET kernel: ata8.00: configured for UDMA/33
Dec 12 06:14:02 OrigamiNET kernel: ata8: EH complete

 

 

Dec 12 02:00:47 OrigamiNET kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Dec 12 02:00:47 OrigamiNET kernel: ata7.00: irq_stat 0x08000000, interface fatal error
Dec 12 02:00:47 OrigamiNET kernel: ata7: SError: { UnrecovData 10B8B BadCRC }
Dec 12 02:00:47 OrigamiNET kernel: ata7.00: failed command: READ DMA EXT
Dec 12 02:00:47 OrigamiNET kernel: ata7.00: cmd 25/00:40:00:35:33/00:05:c0:00:00/e0 tag 0 dma 688128 in
Dec 12 02:00:47 OrigamiNET kernel:         res 50/00:00:37:62:16/00:00:75:00:00/e0 Emask 0x10 (ATA bus error)
Dec 12 02:00:47 OrigamiNET kernel: ata7.00: status: { DRDY }
Dec 12 02:00:47 OrigamiNET kernel: ata7: hard resetting link
Dec 12 02:00:48 OrigamiNET kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 12 02:00:48 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec 12 02:00:48 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT2._GTF] (Node ffff88082f523b40), AE_NOT_FOUND (20150930/psparse-542)
Dec 12 02:00:49 OrigamiNET kernel: ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Dec 12 02:00:49 OrigamiNET kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT1.SPT2._GTF] (Node ffff88082f523b40), AE_NOT_FOUND (20150930/psparse-542)
Dec 12 02:00:49 OrigamiNET kernel: ata7.00: configured for UDMA/133
Dec 12 02:00:49 OrigamiNET kernel: ata7: EH complete
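
One difference between the two excerpts is the negotiated link speed: ata8 comes back at 1.5 Gbps (SStatus 113) while ata7 comes back at 6.0 Gbps (SStatus 133). As a minimal sketch, assuming the kernel prints SStatus in hex and that the fields follow the usual SATA layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11), the value can be decoded like this. `decode_sstatus` is a hypothetical helper name.

```python
# SPD field values mapped to the negotiated SATA generation speed.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(hex_text):
    """Split an SStatus value, given as hex text, into its three fields."""
    value = int(hex_text, 16)
    det = value & 0xF          # device detection state
    spd = (value >> 4) & 0xF   # negotiated interface speed
    ipm = (value >> 8) & 0xF   # interface power management state
    return {"det": det, "spd": spd, "ipm": ipm, "speed": SPEEDS.get(spd, "unknown")}


if __name__ == "__main__":
    for raw in ("113", "133"):
        print(raw, decode_sstatus(raw))
```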

Link to comment

I don't know enough about the "Marvell bug" to know whether you are actually affected by it or not. An easy way to test though is to disable IOMMU (a.k.a. Intel VT-d or AMD-Vi) in your BIOS, if you have it enabled. If you're not passing through hardware devices to VMs you don't need to have it enabled. I only use a couple of simple VMs with no pass-through so I live with it disabled. It's worth a try, at any rate.

 

EDIT: I downloaded your diagnostics and then got distracted before I could take a look. The distraction took all day, unfortunately! Now I've had a chance to look, I see that you do indeed have IOMMU enabled. If you can disable it in the BIOS (you might have to tell VMs not to auto-start, first) and then see if the errors in your syslog go away you'll be able to confirm one way or the other.
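
After changing the BIOS setting, one way to double-check from the running system is to look at `/sys/kernel/iommu_groups`. This is a minimal Python sketch that assumes a Linux kernel exposing that directory; on such kernels it holds numbered group directories while the IOMMU is active and is empty or absent otherwise. `iommu_group_count` is a made-up helper name.

```python
import os


def iommu_group_count(root="/sys/kernel/iommu_groups"):
    """Return the number of IOMMU groups listed under root (0 if none)."""
    if not os.path.isdir(root):
        return 0
    return sum(1 for entry in os.listdir(root) if entry.isdigit())


if __name__ == "__main__":
    count = iommu_group_count()
    if count:
        print(f"IOMMU looks active: {count} groups")
    else:
        print("No IOMMU groups found; IOMMU looks disabled")
```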

Link to comment

Well I'll be damned! No errors since the change!

 

It's been about 9 hours with nothing weird at all! I'm heading to bed to give it some more time before declaring all good, but I'm just surprised that it was just that single BIOS adjustment to fix this!

 

Thanks for the second set of eyes on this John! Much appreciated.

 

Link to comment

Thanks for trying that, Dave. The question now is, can you live without IOMMU or is that an inconvenience to you because you want to run more sophisticated VMs and need to pass through hardware devices? If you do need IOMMU then there's a workaround mentioned in that thread, which may or may not work. If it doesn't work then the only solution is to use a different SAS controller, which would be annoying for you since I know you only just bought your current one. Personally, I have no need for pass-through - if I need a computer for a particular purpose I build one of the appropriate spec. I use only very simple VMs so the bug isn't a real problem for me. Now that it's stable, please use your server as you would expect to and report back if there are any issues.

 

Link to comment

Hi John! Looks like I'm still good to go. No errors at all.

As for the IOMMU, I don't currently have a need for it as this server is purely running the Dockers I have listed in my sig. However, if unRAID gave me the ability to set up a HyperSpin Docker instance, I could see the potential need for hardware pass-through for game system emulators, and for reading the USB slots for Bluetooth dongles and game controllers.

 

Thanks again!

Link to comment

And I spoke too soon... sigh.

 

After some extensive hardware testing, I've confirmed that the backplane is the issue here, as it's having problems reading the drive from time to time.

Bypassing the backplane confirms this issue. Unfortunately, the Icy Dock HDD cages are a pain in the ass to track down right now... some sort of shortage it looks like as the pricing is very inflated on the MB155SP-B.

 

In the meantime, is it OK to plug the drive into an external SATA enclosure so I can keep the array protected until I can find a replacement HDD cage?

Basically putting the drive back into the array using an external SATA enclosure since the bay that the drive was in is bad.

 

 

Thanks!

Link to comment

Archived

This topic is now archived and is closed to further replies.
