Bad disk or bad cable?

joelones · November 5, 2012

I am trying to diagnose the following error in my log, but I can't tell whether it's a bad disk or a bad cable. New SFF-8087 cables are on the way; however, I don't have a spare hard drive to test the other possibility.

It's very sporadic: sometimes unRAID boots up and finds the disk in question, and sometimes it doesn't and logs this error:

Nov 5 14:38:45 Clara-Belle kernel: ata4.00: ATA-8: ST31000528AS, CC38, max UDMA/133
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xef)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: failed to set xfermode (err_mask=0x4)
Nov 5 14:38:45 Clara-Belle kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[3]:rc= 0
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xec)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)
Nov 5 14:38:45 Clara-Belle kernel: ata4.00: disabled
Nov 5 14:38:45 Clara-Belle kernel: ata4: hard resetting link
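For anyone reading the log: the two opcodes in the `qc timeout` lines are standard ATA commands (0xEC is IDENTIFY DEVICE, 0xEF is SET FEATURES, which is how the transfer mode gets set), and the err_mask values are bit flags. The short Python sketch below decodes them. It assumes the bit layout of libata's AC_ERR_* flags from the kernel's include/linux/libata.h, so check that header for your kernel version; the helper names `decode_err_mask` and `annotate` are hypothetical and not part of unRAID or any existing tool.

```python
import re
from typing import List, Optional

# ASSUMPTION: bit meanings follow libata's AC_ERR_* flags
# (include/linux/libata.h); verify against your kernel's headers.
AC_ERR_BITS = {
    0x001: "DEV (device reported error)",
    0x002: "HSM (host state machine violation)",
    0x004: "TIMEOUT",
    0x008: "MEDIA (media error)",
    0x010: "ATA_BUS (ATA bus error)",
    0x020: "HOST_BUS (host bus error)",
    0x040: "SYSTEM (system error)",
    0x080: "INVALID (invalid argument)",
    0x100: "OTHER (unknown)",
}

# Standard ATA opcodes that appear in the "qc timeout (cmd 0x..)" lines.
ATA_COMMANDS = {
    0xEC: "IDENTIFY DEVICE",
    0xEF: "SET FEATURES",
}

ERR_MASK_RE = re.compile(r"err_mask=(0x[0-9a-fA-F]+)")
CMD_RE = re.compile(r"qc timeout \(cmd (0x[0-9a-fA-F]+)\)")


def decode_err_mask(mask: int) -> List[str]:
    """Return the names of all AC_ERR_* bits set in mask, lowest bit first."""
    return [name for bit, name in AC_ERR_BITS.items() if mask & bit]


def annotate(line: str) -> Optional[str]:
    """Return a readable note for a libata log line, or None if nothing to decode."""
    m = ERR_MASK_RE.search(line)
    if m:
        names = decode_err_mask(int(m.group(1), 16)) or ["none"]
        return f"err_mask={m.group(1)}: " + ", ".join(names)
    m = CMD_RE.search(line)
    if m:
        opcode = int(m.group(1), 16)
        return f"timed out on {ATA_COMMANDS.get(opcode, 'unknown command')} ({m.group(1)})"
    return None


if __name__ == "__main__":
    # Lines copied from the syslog excerpt above (timestamps trimmed).
    sample = [
        "ata4.00: qc timeout (cmd 0xef)",
        "ata4.00: failed to set xfermode (err_mask=0x4)",
        "ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)",
        "ata4.00: qc timeout (cmd 0xec)",
        "ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
    ]
    for line in sample:
        note = annotate(line)
        if note:
            print(f"{line}\n    -> {note}")
```

Under that assumption, both err_mask=0x4 failures decode to TIMEOUT with no MEDIA bit set: the drive stopped answering IDENTIFY and SET FEATURES rather than reporting a bad sector. On its own that doesn't tell a cable or port fault apart from a drive that isn't responding.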

The SMART raw values are high for Raw_Read_Error_Rate, Seek_Error_Rate, Command_Timeout and Hardware_ECC_Recovered. Please see the attached files.
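On why large raw numbers on a Seagate drive are not necessarily alarming: a widely repeated but unofficial interpretation (not documented by Seagate) is that the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate packs an error count into the upper 16 bits and a total operation count into the lower 32 bits. The sketch below splits a raw value under that assumption; the function name `split_seagate_raw` is hypothetical, and the example value is made up, not taken from the attached report.

```python
from typing import Tuple

RAW_BITS = 48  # ATA SMART attribute raw values are 6 bytes wide


def split_seagate_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit SMART raw value into (error_count, operation_count).

    ASSUMPTION: the unofficial, community-reported Seagate layout where the
    upper 16 bits count errors and the lower 32 bits count operations.
    """
    if not 0 <= raw < (1 << RAW_BITS):
        raise ValueError("SMART raw values are 48 bits wide")
    errors = raw >> 32              # upper 16 bits
    operations = raw & 0xFFFFFFFF   # lower 32 bits
    return errors, operations


if __name__ == "__main__":
    # Hypothetical raw value: large-looking, yet zero errors under this layout.
    print(split_seagate_raw(123456789))  # (0, 123456789)
```

If the upper 16 bits come out as zero, the "high" raw value is just an operation counter under this interpretation; the normalized VALUE and THRESH columns remain the authoritative pass/fail indicators.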

Running unRAID rc5a.

syslog-20121105-144439.txt.zip

smart_output.txt

dgaschk · November 5, 2012

SMART report looks fine. Try a new cable.

bcbgboy13 · November 5, 2012

This hard disk has been run at elevated temperatures in the past (up to 53 °C).

Check your cooling.

joelones · November 8, 2012

I'm beginning to think this is not a cable issue. I moved the drive to another port on the second SFF-8087 cable (I had one free, since I'm using only 7 of the 8 ports), and I still get the timeout problem with the same drive. (I have two SFF-8087 cables running from the SASLP-MV8 to the backplane.)

These are the cables in question: http://www.ebay.com/itm/110931840838?ssPageName=STRK:MEWNX:IT&_trksid=p3984.m1439.l2649

I've also noticed that the activity LEDs on the drives connected to the MV8 sometimes stay solid for long periods even when nothing appears to be accessing them.

I know the MV8 and the backplane are fine, because everything worked before I virtualized unRAID (I documented the issue in greater detail here: http://lime-technology.com/forum/index.php?topic=23417.msg206539#msg206539).

The only remaining suspects I can think of are the drive (which looks fine from the SMART readings, both to me and to others) or some issue caused by passing the MV8 through on my ESXi box. I still think the "Disabling IRQ #16" message I get when I power down the VM is very odd. I'm waiting for Black Friday to pick up another drive...
