Hard Resetting Link Errors



I'm running 5.0-beta14. Yes, I know, I need to get off beta. I had 4 drives fail over the last month and have been waiting until I get everything up and running correctly again before upgrading.

 

I've attached a full syslog but I've pasted the section with the errors below.

 

Dec 18 18:12:49 Filebox kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:12:49 Filebox kernel: ata9.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:12:49 Filebox kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:12:49 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:12:49 Filebox kernel: ata9.00: status: { DRDY } (Drive related)
Dec 18 18:12:49 Filebox kernel: ata9: hard resetting link (Minor Issues)
Dec 18 18:12:50 Filebox kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:12:50 Filebox kernel: ata9.00: configured for UDMA/133 (Drive related)
Dec 18 18:12:50 Filebox kernel: ata9: EH complete (Drive related)
Dec 18 18:12:50 Filebox emhttp: shcmd (70): mkdir /mnt/user (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (71): /usr/local/sbin/shfs /mnt/user -disks 1022 -o noatime,big_writes,allow_other,default_permissions,use_ino  (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (72): crontab -c /etc/cron.d -d $stuff$> /dev/null (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (73): /usr/local/sbin/emhttp_event disks_mounted (Other emhttp)
Dec 18 18:12:50 Filebox emhttp_event: disks_mounted (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (74): :>/etc/samba/smb-shares.conf (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: Restart SMB... (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (75): killall -HUP smbd (Minor Issues)
Dec 18 18:12:50 Filebox emhttp: shcmd (76): ps axc | grep -q rpc.mountd (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: _shcmd: shcmd (76): exit status: 1 (Other emhttp)
Dec 18 18:12:50 Filebox emhttp: shcmd (77): /usr/local/sbin/emhttp_event svcs_restarted (Other emhttp)
Dec 18 18:12:50 Filebox emhttp_event: svcs_restarted (Other emhttp)
Dec 18 18:13:10 Filebox kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:13:10 Filebox kernel: ata14.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:13:10 Filebox kernel: ata14.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:13:10 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:13:10 Filebox kernel: ata14.00: status: { DRDY } (Drive related)
Dec 18 18:13:10 Filebox kernel: ata14: hard resetting link (Minor Issues)
Dec 18 18:13:11 Filebox kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:13:11 Filebox kernel: ata14.00: configured for UDMA/133 (Drive related)
Dec 18 18:13:11 Filebox kernel: ata14: EH complete (Drive related)
Dec 18 18:13:33 Filebox kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:13:33 Filebox kernel: ata14.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:13:33 Filebox kernel: ata14.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:13:33 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:13:33 Filebox kernel: ata14.00: status: { DRDY } (Drive related)
Dec 18 18:13:33 Filebox kernel: ata14: hard resetting link (Minor Issues)
Dec 18 18:13:33 Filebox kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:13:33 Filebox kernel: ata10.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:13:33 Filebox kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:13:33 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:13:33 Filebox kernel: ata10.00: status: { DRDY } (Drive related)
Dec 18 18:13:33 Filebox kernel: ata10: hard resetting link (Minor Issues)
Dec 18 18:13:33 Filebox kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:13:33 Filebox kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:13:33 Filebox kernel: ata10.00: supports DRM functions and may not be fully accessible (Drive related)
Dec 18 18:13:33 Filebox kernel: ata14.00: configured for UDMA/133 (Drive related)
Dec 18 18:13:33 Filebox kernel: ata14: EH complete (Drive related)
Dec 18 18:13:33 Filebox kernel: ata10.00: supports DRM functions and may not be fully accessible (Drive related)
Dec 18 18:13:33 Filebox kernel: ata10.00: configured for UDMA/133 (Drive related)
Dec 18 18:13:33 Filebox kernel: ata10: EH complete (Drive related)
Dec 18 18:13:54 Filebox kernel: ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:13:54 Filebox kernel: ata13.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:13:54 Filebox kernel: ata13.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:13:54 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:13:54 Filebox kernel: ata13.00: status: { DRDY } (Drive related)
Dec 18 18:13:54 Filebox kernel: ata13: hard resetting link (Minor Issues)
Dec 18 18:13:54 Filebox kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:13:54 Filebox kernel: ata9.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:13:54 Filebox kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:13:54 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:13:54 Filebox kernel: ata9.00: status: { DRDY } (Drive related)
Dec 18 18:13:54 Filebox kernel: ata9: hard resetting link (Minor Issues)
Dec 18 18:13:54 Filebox kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:13:54 Filebox kernel: ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Dec 18 18:13:54 Filebox kernel: ata9.00: configured for UDMA/133 (Drive related)
Dec 18 18:13:54 Filebox kernel: ata9: EH complete (Drive related)
Dec 18 18:13:54 Filebox kernel: ata13.00: configured for UDMA/133 (Drive related)
Dec 18 18:13:54 Filebox kernel: ata13: EH complete (Drive related)
Dec 18 18:14:23 Filebox kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:14:23 Filebox kernel: ata9.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:14:23 Filebox kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:14:23 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:14:23 Filebox kernel: ata9.00: status: { DRDY } (Drive related)
Dec 18 18:14:23 Filebox kernel: ata9: hard resetting link (Minor Issues)
Dec 18 18:14:23 Filebox kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:14:23 Filebox kernel: ata9.00: configured for UDMA/133 (Drive related)
Dec 18 18:14:23 Filebox kernel: ata9: EH complete (Drive related)
Dec 18 18:14:51 Filebox kernel: ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:14:51 Filebox kernel: ata13.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:14:51 Filebox kernel: ata13.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:14:51 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:14:51 Filebox kernel: ata13.00: status: { DRDY } (Drive related)
Dec 18 18:14:51 Filebox kernel: ata13: hard resetting link (Minor Issues)
Dec 18 18:14:51 Filebox kernel: ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Dec 18 18:14:51 Filebox kernel: ata13.00: configured for UDMA/133 (Drive related)
Dec 18 18:14:51 Filebox kernel: ata13: EH complete (Drive related)
Dec 18 18:15:09 Filebox kernel: mdcmd (47): nocheck  (unRAID engine)
Dec 18 18:15:35 Filebox kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:15:35 Filebox kernel: ata14.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:15:35 Filebox kernel: ata14.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:15:35 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:15:35 Filebox kernel: ata14.00: status: { DRDY } (Drive related)
Dec 18 18:15:35 Filebox kernel: ata14: hard resetting link (Minor Issues)
Dec 18 18:15:35 Filebox kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Dec 18 18:15:35 Filebox kernel: ata14.00: configured for UDMA/133 (Drive related)
Dec 18 18:15:35 Filebox kernel: ata14: EH complete (Drive related)
Dec 18 18:15:36 Filebox kernel: md: md_do_sync: got signal, exit... (unRAID engine)
Dec 18 18:15:36 Filebox kernel: md: recovery thread sync completion status: -4 (unRAID engine)
Dec 18 18:16:39 Filebox kernel: mdcmd (48): clear  (unRAID engine)
Dec 18 18:17:08 Filebox kernel: ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Dec 18 18:17:08 Filebox kernel: ata13.00: failed command: IDENTIFY DEVICE (Minor Issues)
Dec 18 18:17:08 Filebox kernel: ata13.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Dec 18 18:17:08 Filebox kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Dec 18 18:17:08 Filebox kernel: ata13.00: status: { DRDY } (Drive related)
Dec 18 18:17:08 Filebox kernel: ata13: hard resetting link (Minor Issues)
Dec 18 18:17:09 Filebox kernel: ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Dec 18 18:17:09 Filebox kernel: ata13.00: configured for UDMA/133 (Drive related)
Dec 18 18:17:09 Filebox kernel: ata13: EH complete (Drive related)
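
The `SStatus` values in the log lines above (133 and 123) can be decoded to check what each link negotiated after its reset. The following is a minimal sketch, assuming the kernel prints the register in hexadecimal and that the fields follow the usual SATA layout (detection in bits 0-3, speed in bits 4-7, power state in bits 8-11). The function name `decode_sstatus` is made up for illustration.

```python
def decode_sstatus(hex_value: str) -> dict:
    """Split an SStatus value (hex string as shown in the syslog) into its fields."""
    value = int(hex_value, 16)
    det = value & 0xF          # bits 0-3: device detection (3 = device present, link established)
    spd = (value >> 4) & 0xF   # bits 4-7: negotiated link speed generation
    ipm = (value >> 8) & 0xF   # bits 8-11: interface power management (1 = active)
    speeds = {0: "no link", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
    return {"det": det, "speed": speeds.get(spd, "unknown"), "ipm": ipm}


for raw in ("133", "123"):
    print(raw, decode_sstatus(raw))
```

Under those assumptions, 133 decodes to a 6.0 Gbps link and 123 to a 3.0 Gbps link, which matches the speeds the kernel reports next to each value. All ports come back up in an active state after the reset, so the links themselves are being re-established each time.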

 

I've been unable to figure out how to map the designations like "ata13.00" to a specific drive, which has made troubleshooting hard.
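
One way to approach this mapping, sketched here under some assumptions: on many Linux kernels the resolved sysfs path of a block device contains the ATA port name (for example `.../ata13/host12/.../block/sdm`), so the port can be read out of that path. Whether the `ataN` component is present depends on the kernel version, and a stock unRAID 5 install may not include Python, so treat this as an illustration of the idea rather than a ready-made tool. The function names are hypothetical.

```python
import os
import re
from typing import Optional

# Matches an "ataN" path component such as ".../ata13/host12/...".
ATA_RE = re.compile(r"/(ata\d+)/")


def ata_port_from_path(sysfs_path: str) -> Optional[str]:
    """Return the ataN component of a resolved sysfs path, or None if absent."""
    match = ATA_RE.search(sysfs_path)
    return match.group(1) if match else None


def map_block_devices(sys_block: str = "/sys/block") -> dict:
    """Map each ATA port name to the sdX devices whose sysfs path contains it."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop devices, md devices, and so on
        real = os.path.realpath(os.path.join(sys_block, name))
        port = ata_port_from_path(real)
        if port:
            mapping.setdefault(port, []).append(name)
    return mapping


if __name__ == "__main__":
    for port, devs in map_block_devices().items():
        print(port, "->", ", ".join("/dev/" + d for d in devs))
```

Once a port such as `ata13` is tied to an `sdX` name, the symlinks under `/dev/disk/by-id` (where available) usually show the model and serial number for that device, which identifies the physical drive.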

 

I recently replaced two PCI SATA RAID cards with two PCI-E cards (Model: SI-PEX40057  Chipset: 88SE9315). I replaced them because I expected the PCI-E cards to be faster than the PCI cards. I don't know whether the new cards are related to the problem.

 

Parity calculation has also been extremely slow (~5 MB/s). I suspect the errors above are the cause, but I don't know for sure.

 

Any suggestions as to the problem or troubleshooting steps would be appreciated.

syslog-2014-12-18.txt


Looks like compatibility issues, with both cards.  The 4 drives failing to identify are the 4 Seagates, attached 2 per card.  They appeared to work initially, but almost as soon as the parity sync began, the errors began, both cards, all 4 drives.

 

According to this syslog, you started the array with a new parity drive at 18:12:22. The 1st error appeared at 18:12:49, the 2nd at 18:13:10, the 3rd at 18:13:33, and the 4th at 18:13:54, each about 20 seconds after the previous one. I don't know what to make of that. At 18:15:09, the parity build was canceled. About 2 minutes later, the syslog ends, with one more error.
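
The timing pattern described above can be checked mechanically. The sketch below pulls the `failed command` lines out of a syslog, counts them per ATA port, and computes the gaps between distinct error times. The sample lines are copied from the excerpt in the first post; the year 2014 is taken from the attachment's filename, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Captures the timestamp and the ataN port of each "failed command" line.
FAILED_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+)\.\d+: failed command"
)

SAMPLE = [
    "Dec 18 18:12:49 Filebox kernel: ata9.00: failed command: IDENTIFY DEVICE (Minor Issues)",
    "Dec 18 18:13:10 Filebox kernel: ata14.00: failed command: IDENTIFY DEVICE (Minor Issues)",
    "Dec 18 18:13:33 Filebox kernel: ata14.00: failed command: IDENTIFY DEVICE (Minor Issues)",
    "Dec 18 18:13:33 Filebox kernel: ata10.00: failed command: IDENTIFY DEVICE (Minor Issues)",
    "Dec 18 18:13:54 Filebox kernel: ata13.00: failed command: IDENTIFY DEVICE (Minor Issues)",
]


def failed_commands(lines, year=2014):
    """Return (timestamp, port) pairs for every failed-command line."""
    events = []
    for line in lines:
        match = FAILED_RE.match(line)
        if match:
            # Syslog timestamps carry no year, so one is supplied explicitly.
            when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, match.group(2)))
    return events


def summarize(events):
    """Count failures per port and list gaps (seconds) between distinct error times."""
    per_port = Counter(port for _, port in events)
    times = sorted({when for when, _ in events})
    gaps = [int((later - earlier).total_seconds()) for earlier, later in zip(times, times[1:])]
    return per_port, gaps


per_port, gaps = summarize(failed_commands(SAMPLE))
print(dict(per_port))
print(gaps)
```

On these five sample lines the gaps come out as 21, 23, and 21 seconds, consistent with the "about 20 seconds apart" observation. Running the same functions over the full syslog file would show whether that interval holds for the later errors too.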

 

I don't know this card.  There are cases where a card works well with UnRAID, but not in multiples.  That is, one card will work, but is not designed to allow 2 of the same card to work together without conflicts.  You might try removing one.

 

There are times to refrain from upgrading until the system is clean, but there are other times when the upgrade may fix the problem.  There's a chance that a later UnRAID release will have better support for your hardware.  I strongly recommend you upgrade your flash drive to the latest, currently v5.0.6, and see if that works better.  If not, then it's possible that the latest v6 will have better support for your cards.  Each later release has a more current Linux kernel, with constantly improving hardware support.


Thanks for the suggestions RobJ.

 

Ok, so I upgraded to 5.0.6. Unfortunately, I'm still getting the same errors as before in the syslog. So I removed one of the SYBA SI-PEX40064 cards, plugging the drives that were in that card into the remaining one. So previously there were two cards with two drives each, now there is one card with four drives. But again, still getting the same errors.

 

I've seen posts from several users who use this card without issue. It's even listed as a "quality SATA controller" on the Designing an unRAID Server wiki page (http://lime-technology.com/wiki/index.php/Designing_an_unRAID_server).

 

Obviously the problems are related to the card in some way since the errors only occur with drives connected to it. Does anyone have any other suggestions? Could it be a compatibility issue with my motherboard?

 

-Thanks

