Failed command & Hard Resetting Link - What is happening?

February 18, 2017

Hi,

I'm getting a lot of "failed command" and "hard resetting link" errors during a parity rebuild. This is happening on both the motherboard SATA ports and the SATA cards. Faulty SATA cable or power supply? I did have a SATA cable die on me this morning...

System: 2x SI-PEX40064 SATA cards and a Sandy Bridge motherboard

Feb 17 18:16:12 kernel: virbr0: port 1(virbr0-nic) entered disabled state

Feb 17 18:17:22 kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Feb 17 18:17:22 kernel: ata14.00: failed command: IDENTIFY DEVICE

Feb 17 18:17:22 kernel: ata14.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 3 pio 512 in

Feb 17 18:17:22 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Feb 17 18:17:22 kernel: ata14.00: status: { DRDY }

Feb 17 18:17:22 kernel: ata14: hard resetting link

Feb 17 18:17:23 kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Feb 17 18:17:23 kernel: ata14.00: configured for UDMA/133

Feb 17 18:17:23 kernel: ata14: EH complete

Feb 17 18:17:44 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Feb 17 18:17:44 kernel: ata7.00: failed command: IDENTIFY DEVICE

Feb 17 18:17:44 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 13 pio 512 in

Feb 17 18:17:44 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Feb 17 18:17:44 kernel: ata7.00: status: { DRDY }

Feb 17 18:17:44 kernel: ata7: hard resetting link

Feb 17 18:17:45 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Feb 17 18:17:45 kernel: ata7.00: configured for UDMA/133

Feb 17 18:17:45 kernel: ata7: EH complete

Feb 17 18:18:07 kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Feb 17 18:18:07 kernel: ata11.00: failed command: SMART

Feb 17 18:18:07 kernel: ata11.00: cmd b0/d1:01:01:4f:c2/00:00:00:00:00/00 tag 16 pio 512 in

Feb 17 18:18:07 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Feb 17 18:18:07 kernel: ata11.00: status: { DRDY }

Feb 17 18:18:07 kernel: ata11: hard resetting link

Feb 17 18:18:08 kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Feb 17 18:18:08 kernel: ata11.00: configured for UDMA/133

Feb 17 18:18:08 kernel: ata11: EH complete

Thanks!

February 18, 2017

SATA cable problems. Three different ports (ata7, ata11 and ata14) show errors in that syslog snippet alone, though there isn't enough information to identify which cables they correspond to. Check (or, preferably, replace) them all.

You mention your power supply too - do you have doubts about it? If so, consider replacing it. Faulty power supplies cause all sorts of obscure problems. A quality power supply with a single (and therefore pretty high current) +12 V rail of adequate capacity is pretty much essential for unRAID.
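
To see at a glance which ports are affected in a longer syslog, the lines could be grouped by ATA port. The following is only a minimal sketch, assuming the log lines have the same shape as the ones posted above; the function name `summarize_ata_errors` and the `SAMPLE` excerpt are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "... kernel: ata14.00: failed command: IDENTIFY DEVICE"
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
# Matches lines such as "... kernel: ata14: hard resetting link"
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize_ata_errors(lines):
    """Return {port: {"failed": [commands], "resets": count}} for syslog lines."""
    summary = defaultdict(lambda: {"failed": [], "resets": 0})
    for line in lines:
        line = line.strip()
        match = FAILED_RE.search(line)
        if match:
            summary[match.group(1)]["failed"].append(match.group(2))
            continue
        match = RESET_RE.search(line)
        if match:
            summary[match.group(1)]["resets"] += 1
    return dict(summary)


# Excerpt taken from the syslog in the first post (hypothetical variable name).
SAMPLE = """\
Feb 17 18:17:22 kernel: ata14.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:22 kernel: ata14: hard resetting link
Feb 17 18:17:44 kernel: ata7.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:44 kernel: ata7: hard resetting link
Feb 17 18:18:07 kernel: ata11.00: failed command: SMART
Feb 17 18:18:07 kernel: ata11: hard resetting link
""".splitlines()

if __name__ == "__main__":
    for port, info in sorted(summarize_ata_errors(SAMPLE).items()):
        print(port, info["failed"], "resets:", info["resets"])
```

This only counts errors per `ataN` port; it does not say which physical drive or cable sits behind a port. On many Linux systems the symlink target of `/sys/block/sdX` includes the `ataN` component, which may help with that mapping, but verify it on your own system.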

February 18, 2017

Author

Hey John,

I did a bit more research, and it appears that if I plug the drives directly into the motherboard, everything works fine (same SATA cable and power cable). If I plug them into the PCIe cards (SI-PEX40064), I periodically get these SMART and IDENTIFY DEVICE errors. Could it be because the drives are WD Green (slow to spin up), or is the card just slow?

February 18, 2017

That card is not ideal because it connects up to four disks to a single PCIe lane, but in normal use you shouldn't really notice the difference, except during a parity check. So use up the motherboard ports first, and put your parity and cache disks on the motherboard.

The error messages are about the controller failing to communicate with the disks and resetting the SATA link, so the first thing to look at is the cables. However, your card is based on a Marvell chip and is therefore susceptible to a somewhat obscure bug, but without further information about your system I can't tell whether it's affected. Post your diagnostics (Tools -> Diagnostics).
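
To put a rough number on the single-lane limit, here is a back-of-the-envelope sketch. It assumes the card links at PCIe 2.0 x1 (5 GT/s with 8b/10b encoding, about 500 MB/s before protocol overhead); that link speed is an assumption, not something stated in this thread, and the helper name `per_disk_bandwidth_mb_s` is made up for illustration.

```python
# Assumption: one PCIe 2.0 lane, 5 GT/s raw, 8b/10b encoding (8 data bits per 10 line bits).
RAW_TRANSFERS_PER_S = 5_000_000_000
PCIE2_X1_MB_S = RAW_TRANSFERS_PER_S * 8 / 10 / 8 / 1_000_000  # bits -> bytes -> MB


def per_disk_bandwidth_mb_s(lane_mb_s, disks):
    """Evenly split one lane's bandwidth across disks that are read at the same time."""
    if disks < 1:
        raise ValueError("disks must be at least 1")
    return lane_mb_s / disks


if __name__ == "__main__":
    # A parity check reads every disk at once, so all four ports compete for the lane.
    for disks in (1, 2, 4):
        share = per_disk_bandwidth_mb_s(PCIE2_X1_MB_S, disks)
        print(f"{disks} disk(s): about {share:.0f} MB/s each, before overhead")
```

Real throughput will be lower than these theoretical figures, but the split shows why the limit mainly matters when every disk on the card is busy at once, as in a parity check.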

February 18, 2017

Author

I went and bought another PEX40064 from a store instead of eBay, and I'm not running into the same issue. My guess is that the eBay cards are either fakes or from the same batch with different firmware. I'll keep an eye on things for the next 48 hours, but so far I haven't seen the timeout errors I got with the previous two cards from eBay.
