Bad disk or bad cable?



I am trying to diagnose the following error in my log, but I can't tell whether it's a bad disk or a bad cable. New SFF-8087 cables are on the way; however, I don't have a spare HD to test the other possibility.

 

It's very sporadic: when unRAID boots up, sometimes it finds the disk in question and sometimes it doesn't, logging this error:

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: ATA-8: ST31000528AS, CC38, max UDMA/133

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xef)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: failed to set xfermode (err_mask=0x4)

Nov  5 14:38:45 Clara-Belle kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[3]:rc= 0

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xec)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)

Nov  5 14:38:45 Clara-Belle kernel: ata4.00: disabled

Nov  5 14:38:45 Clara-Belle kernel: ata4: hard resetting link
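For readers decoding a log like the one above: the `err_mask` values are bit flags from the Linux kernel's libata (`AC_ERR_*` in `include/linux/libata.h`), and the two commands that time out are `0xef` (SET FEATURES) and `0xec` (IDENTIFY DEVICE). The sketch below is one possible way to pull those masks out of a syslog and name the bits. The function names `decode_err_mask` and `scan` are made up for this illustration, and the regular expression assumes lines shaped like the ones quoted in this post.

```python
import re

# Bit flags as defined by the AC_ERR_* enum in the Linux kernel's
# include/linux/libata.h. Only the short names are used here.
AC_ERR_FLAGS = {
    0x001: "DEV",         # device reported an error
    0x002: "HSM",         # host state machine violation
    0x004: "TIMEOUT",     # command timed out
    0x008: "MEDIA",       # media error
    0x010: "ATA_BUS",     # ATA bus error
    0x020: "HOST_BUS",    # host bus error
    0x040: "SYSTEM",      # system error
    0x080: "INVALID",     # invalid argument
    0x100: "OTHER",       # unknown
    0x200: "NODEV_HINT",  # polling device detection hint
    0x400: "NCQ",         # marker for the offending NCQ command
}

# Assumes lines of the form "... ataN[.MM]: <text> err_mask=0x..." as in the post.
ERR_MASK_RE = re.compile(r"(ata\d+(?:\.\d+)?): .*err_mask=(0x[0-9a-fA-F]+)")


def decode_err_mask(mask):
    """Return the names of all AC_ERR_* bits set in mask."""
    return [name for bit, name in AC_ERR_FLAGS.items() if mask & bit]


def scan(lines):
    """Return (port, mask, flag_names) for every line carrying an err_mask."""
    results = []
    for line in lines:
        match = ERR_MASK_RE.search(line)
        if match:
            mask = int(match.group(2), 16)
            results.append((match.group(1), mask, decode_err_mask(mask)))
    return results
```

Run over the excerpt above, `scan` would report `0x4` as TIMEOUT and `0x80` as INVALID for `ata4.00`. A timeout on its own does not say whether the drive, the cable, or the controller is at fault; it only confirms that the device did not answer in time.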

 

SMART values are high for Raw_Read_Error_Rate, Seek_Error_Rate, Command_Timeout and Hardware_ECC_Recovered. Please see the attached files.
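A note on reading those raw numbers: the drive here is a Seagate (ST31000528AS), and it is commonly reported by users, though not documented by the vendor as far as I know, that Seagate packs Seek_Error_Rate (and reportedly Raw_Read_Error_Rate) into a 48-bit raw value, with the error count in the upper 16 bits and the operation count in the lower 32 bits. If that assumption holds, a large raw value can simply mean a large number of operations with zero errors. The helper below is only a sketch of that interpretation; `split_seagate_raw` is a made-up name, and the layout is an assumption, not a specification.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw SMART value into (errors, operations).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation
    count, as commonly reported for Seagate drives. Not vendor-documented.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations
```

To use it, pass the RAW_VALUE column from the `smartctl -A` output for the attribute in question. A result with `errors == 0` would suggest the large number is just a counter of operations rather than a sign of failure.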

 

Running rc5a

syslog-20121105-144439.txt.zip

smart_output.txt


I'm beginning to think that this is not a cable issue, because I tried a port on the second SFF-8087 cable (I had one free, since I'm using 7 of the 8 ports) and I still get the timeout problem with the same drive. (I have two SFF-8087 cables going from the SASLP-MV8 to the backplane.)

 

These are the cables in question: http://www.ebay.com/itm/110931840838?ssPageName=STRK:MEWNX:IT&_trksid=p3984.m1439.l2649

 

Another thing I notice at times is that the activity LEDs on the drives connected to the MV8 stay solid for rather long periods when nothing apparent is accessing them.

 

I know the MV8 and backplane are fine, because they were working fine prior to virtualizing unRAID (I documented the issue in greater detail here: http://lime-technology.com/forum/index.php?topic=23417.msg206539#msg206539).

 

The only things I can think of are the drive (whose SMART readings look fine to me and to others) or some issue caused by passing the MV8 through on my ESXi box. I still contend that the "Disabling IRQ #16" message I receive when I power down the VM is very odd. Waiting on Black Friday to pick up another drive...

 
