boltbell19 Posted February 18, 2017
Hi, I'm getting a lot of "failed command" and "hard resetting link" errors during a parity rebuild. This is happening on both the motherboard SATA ports and the SATA cards. Faulty SATA cable or power supply? I did have a SATA cable die on me this morning...
System: 2x SI-PEX40064 SATA cards and a Sandy Bridge motherboard.
Feb 17 18:16:12 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 17 18:17:22 kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 17 18:17:22 kernel: ata14.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:22 kernel: ata14.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 3 pio 512 in
Feb 17 18:17:22 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 17 18:17:22 kernel: ata14.00: status: { DRDY }
Feb 17 18:17:22 kernel: ata14: hard resetting link
Feb 17 18:17:23 kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 17 18:17:23 kernel: ata14.00: configured for UDMA/133
Feb 17 18:17:23 kernel: ata14: EH complete
Feb 17 18:17:44 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 17 18:17:44 kernel: ata7.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:44 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 13 pio 512 in
Feb 17 18:17:44 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 17 18:17:44 kernel: ata7.00: status: { DRDY }
Feb 17 18:17:44 kernel: ata7: hard resetting link
Feb 17 18:17:45 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 17 18:17:45 kernel: ata7.00: configured for UDMA/133
Feb 17 18:17:45 kernel: ata7: EH complete
Feb 17 18:18:07 kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 17 18:18:07 kernel: ata11.00: failed command: SMART
Feb 17 18:18:07 kernel: ata11.00: cmd b0/d1:01:01:4f:c2/00:00:00:00:00/00 tag 16 pio 512 in
Feb 17 18:18:07 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 17 18:18:07 kernel: ata11.00: status: { DRDY }
Feb 17 18:18:07 kernel: ata11: hard resetting link
Feb 17 18:18:08 kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 17 18:18:08 kernel: ata11.00: configured for UDMA/133
Feb 17 18:18:08 kernel: ata11: EH complete
Thanks!
John_M Posted February 18, 2017
SATA cable problems. Three different ports (ata14, ata7 and ata11) show errors in that syslog snippet alone, though there isn't enough information to identify which disks they correspond to. Check (or, preferably, replace) all the cables. You mention your power supply too. Do you have doubts about it? If so, consider replacing it. Faulty power supplies cause all sorts of obscure problems. A quality power supply with a single (and therefore fairly high-current) +12 V rail of adequate capacity is pretty much essential for unRAID.
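For anyone following along, here is a minimal Python sketch of how the affected ports could be pulled out of a syslog excerpt like the one above. The sample lines are copied from the first post; the function name `summarize` and the two regular expressions are illustrative assumptions, not part of unRAID or any kernel tool, and the patterns only cover the message formats that appear in this thread.

```python
import re
from collections import defaultdict

# Sample lines copied from the syslog excerpt in the first post.
SYSLOG = """\
Feb 17 18:17:22 kernel: ata14.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:22 kernel: ata14: hard resetting link
Feb 17 18:17:44 kernel: ata7.00: failed command: IDENTIFY DEVICE
Feb 17 18:17:44 kernel: ata7: hard resetting link
Feb 17 18:18:07 kernel: ata11.00: failed command: SMART
Feb 17 18:18:07 kernel: ata11: hard resetting link
"""

# Hypothetical patterns matching only the two message types seen in this thread.
FAILED = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
RESET = re.compile(r"kernel: (ata\d+): hard resetting link$")


def summarize(text):
    """Return {port: {"failed": [commands], "resets": count}} for each ATA port."""
    ports = defaultdict(lambda: {"failed": [], "resets": 0})
    for line in text.splitlines():
        line = line.strip()
        match = FAILED.search(line)
        if match:
            ports[match.group(1)]["failed"].append(match.group(2))
            continue
        match = RESET.search(line)
        if match:
            ports[match.group(1)]["resets"] += 1
    return dict(ports)


# Print one summary line per port, ordered by port number.
for port, info in sorted(summarize(SYSLOG).items(), key=lambda kv: int(kv[0][3:])):
    print(port, info["failed"], info["resets"])
```

This only groups errors by port name. Mapping a port such as ata14 to a physical disk still needs the full diagnostics, which the excerpt alone does not provide.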
boltbell19 (Author) Posted February 18, 2017
Hey John, I did a bit more research, and it appears that if I plug the drives directly into the motherboard, everything works fine (same SATA cable and power cable). If I plug them into the PCIe cards (SI-PEX40064), I periodically get these SMART and IDENTIFY DEVICE errors. Could it be because the drives are WD Greens (slow to spin up), or is the card itself just slow?
John_M Posted February 18, 2017
That card is not ideal because it connects up to four disks to a single PCIe lane, but in normal use you shouldn't really notice the difference, except during a parity check. So use up the motherboard ports first, and put your parity and cache disks on the motherboard. The error messages are about the controller failing to communicate with the disks and resetting the SATA link, so the first thing to look at is the cables. However, your card is based on a Marvell chip and is therefore susceptible to a somewhat obscure bug; without further information about your system I can't tell whether it's affected. Post your diagnostics (Tools -> Diagnostics).
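As a rough illustration of why a single shared lane matters mostly during a parity check, when every disk is read at once, the per-disk share can be estimated as below. The PCIe generation is an assumption (the thread does not state it): the sketch uses PCIe 2.0 figures of 5 GT/s with 8b/10b encoding, and it ignores protocol overhead, so real throughput would be somewhat lower. The helper name `per_disk_mb_s` is hypothetical.

```python
# Assumption: a PCIe 2.0 lane, 5 GT/s with 8b/10b encoding (8 data bits per 10 line bits).
# This gives the raw per-direction rate before protocol overhead.
PCIE2_LANE_MB_S = 5e9 * 8 / 10 / 8 / 1e6  # bits/s -> data bits/s -> bytes/s -> MB/s


def per_disk_mb_s(lane_mb_s, disks):
    """Even split of one lane's bandwidth across disks that are all busy at once."""
    return lane_mb_s / disks


print(PCIE2_LANE_MB_S)                    # 500.0
print(per_disk_mb_s(PCIE2_LANE_MB_S, 4))  # 125.0
```

Under these assumptions, four disks read simultaneously get at most about 125 MB/s each, while a single active disk has the whole lane to itself, which is consistent with the difference only showing up during parity operations.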
boltbell19 (Author) Posted February 18, 2017
I went and bought another PEX40064 from a store instead of eBay, and I'm not running into the same issue. My guess is that the eBay cards are either fakes or from a batch with different firmware. I'll keep an eye on things for the next 48 hours, but so far I haven't seen the timeout errors I got with the previous two cards from eBay.