Faulty Cable?

Hi Guys,

 

From the Wiki (somewhere, I can't find it now), I think I have a faulty power cable or SATA cable but would like some confirmation before wriggling / replacing things.

 

I am currently running Unraid 4.4.2 on a full Slackware distribution. I had no real issues until recently, when my newest drive started showing errors. Unraid has marked the drive with a red circle.

 

Now, I've run short and long S.M.A.R.T. tests several times, and there are 0 issues. So, I pressed the restore button, did a parity sync and all was fine for a few days.

 

It was after this I noticed that some of my files may have disappeared and I didn't think it was PEBKAC.

 

A few days later, the same issue again. So, I went in the same circle again - and lost some data - again.

 

 

To make it easier, I'll post some stats:

 

Drive:

1TB - ata-ST31000528AS_6VP1PBAY (Disk 3)

 

Smart Report:

 

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:     ST31000528AS

Serial Number:    6VP1PBAY

Firmware Version: CC37

User Capacity:    1,000,204,886,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   8

ATA Standard is:  ATA-8-ACS revision 4

Local Time is:    Fri Feb 19 04:17:18 2010 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 245) Self-test routine in progress...

50% of test remaining.

Total time to complete Offline

data collection: ( 600) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (   1) minutes.

Extended self-test routine

recommended polling time: ( 180) minutes.

Conveyance self-test routine

recommended polling time: (   2) minutes.

SCT capabilities:        (0x103f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       226697282

  3 Spin_Up_Time            0x0003   097   095   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       361

  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   066   060   030    Pre-fail  Always       -       4873256

  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3028

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       164

183 Unknown_Attribute       0x0032   099   099   000    Old_age   Always       -       1

184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       100

189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   071   059   045    Old_age   Always       -       29 (Lifetime Min/Max 27/29)

194 Temperature_Celsius     0x0022   029   041   000    Old_age   Always       -       29 (0 19 0 0)

195 Hardware_ECC_Recovered  0x001a   037   023   000    Old_age   Always       -       226697282

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       144976621079867

241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3589836420

242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       630453533

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Self-test routine in progress 50%      3028         -

# 2  Short offline       Completed without error       00%      2801         -

# 3  Extended offline    Completed without error       00%      2711         -

# 4  Short offline       Completed without error       00%      2699         -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 

(This seems to be a perfect drive?)
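
One way to sanity-check a report like the one above is to pull out the few raw values that usually matter when choosing between "failing disk" and "bad connection". The following is only a rough sketch: the choice of watched attribute IDs (5, 187, 197, 198, 199) is an assumption rather than an official list, the function names are made up for illustration, and the sample rows are copied from the report in this post. All watched values are 0 here, although a zero CRC count does not by itself rule out a loose connector.

```python
import re

# Attribute IDs often watched for media or link trouble.
# This selection is an assumption for illustration, not an official list.
WATCHED_IDS = {5, 187, 197, 198, 199}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def parse_attributes(report):
    """Return {id: (name, raw_value)} for each attribute row found."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(9)))
    return attrs

def nonzero_watched(attrs):
    """Return only the watched attributes whose raw value is not zero."""
    return {
        attr_id: pair
        for attr_id, pair in attrs.items()
        if attr_id in WATCHED_IDS and pair[1] != 0
    }

# Rows copied from the report in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    flagged = nonzero_watched(parse_attributes(SAMPLE))
    print(flagged if flagged else "no watched attribute has a non-zero raw value")
```
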

 

 

 

/var/log/messages:

 

Feb 14 07:49:06 TANK kernel: ata9: hard resetting link

Feb 14 07:49:08 TANK kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

 

Feb 15 04:47:41 TANK kernel:  sdk:md: disk3 read error

Feb 15 04:47:42 TANK kernel: pe read error: 1205131344/3, count: 1

Feb 15 04:47:43 TANK kernel: pe read error: 1205139208/3, count: 1

Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139216/3, count: 1

Feb 15 04:47:43 TANK kernel: <4pe read error: 1205139248/3, count: 1

Feb 15 04:47:43 TANK kernel: <pe read error: 1205139256/3, count: 1

Feb 15 04:47:43 TANK kernel: pe read error: 1205139264/3, count: 1

 

Feb 17 22:27:02 TANK kernel: scsi 9:0:0:0: Direct-Access     ATA      ST31000528AS     CC37 PQ: 0 ANSI: 5

Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] 1953525168 512-byte hardware sectors (1000205 MB)

Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] Write Protect is off

Feb 17 22:27:02 TANK kernel: sd 9:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I had to look up "PEBKAC"  ;D

 

There are two issues here... (three actually, but we'll get to the third)

 

1.  For a disk to be taken off-line, a "write" to it failed.  Typically, this is a hardware issue... It could be the disk, a loose connector (SATA or power), an intermittent backplane or drive tray, or even a flaky drive controller.

 

2. For files to disappear, they were either removed by somebody ... or more likely ... you have a corrupt file-system which needs repair.

 

To determine the reason the drive was taken off-line we would need to see a copy of the syslog from after the failure occurs but BEFORE you next rebooted.  (It might not be too late; it depends on whether you've rebooted since the initial failure.)

So... post a copy of your syslog ... attach it to your next post.
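
If the syslog is large, a quick filter can help find the relevant lines before posting. This is just a minimal sketch, not a replacement for attaching the full log: the patterns are taken from messages quoted in this thread, the category names and the `classify` function are made up for illustration, and the log path on your system is something you would need to supply yourself.

```python
import re

# Patterns based on kernel messages quoted in this thread.
# The category names are hypothetical labels chosen for this sketch.
PATTERNS = {
    "link_reset": re.compile(r"ata\d+: hard resetting link"),
    "identify_failed": re.compile(r"ata\d+\.\d+: failed to IDENTIFY"),
    "md_read_error": re.compile(r"md: disk\d+ read error"),
    "sector_read_error": re.compile(r"read error: \d+/\d+"),
}

def classify(lines):
    """Group log lines by the first-listed patterns they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits

# Lines copied from the first post in this thread.
SAMPLE_LOG = [
    "Feb 14 07:49:06 TANK kernel: ata9: hard resetting link",
    "Feb 14 07:49:08 TANK kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Feb 15 04:47:41 TANK kernel:  sdk:md: disk3 read error",
    "Feb 15 04:47:42 TANK kernel: pe read error: 1205131344/3, count: 1",
]

if __name__ == "__main__":
    for name, matched in classify(SAMPLE_LOG).items():
        print(f"{name}: {len(matched)}")
```

To run it against a real log, pass an open file instead of the sample list, for example `classify(open(path))` with `path` set to wherever your system keeps the syslog.
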

 

If the physical on-disk file-system suffered some corruption when the "write" to the drive failed it would have still written the correct parity information.  You might have been able to un-do the corruption by rebuilding the data on the failed drive after possibly re-seating the connectors, etc.  Instead, you elected to throw away the existing parity, set a new drive configuration, and rebuild parity from the data drives (including the possibly corrupt file-system on the disk where the "write" error occurred.)

 

So... from now on, unless explicitly advised by an experienced member of this forum to press the button labeled "restore", don't.  Pressing it is PEBKAC in most cases.  Do not press it unless it is part of the "trust-my-parity" procedure as described in the wiki or you are removing a disk from the array and will not replace it...

 

If your disk that had the "red" icon had actually failed, by pressing the button labeled "restore" you would have erased its prior contents from parity and there is no way to get it back.  It does not restore data, but sets an initial disk configuration.
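
To illustrate the difference between the two paths, here is a toy sketch. It assumes single-disk XOR parity (the usual description of how this kind of parity works) and uses three made-up 3-byte "disks"; the function names and byte values are invented for the example and have nothing to do with the real driver.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR across equally sized blocks (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(parity, surviving):
    """Reconstruct the one missing block from parity plus the other blocks."""
    return xor_parity([parity, *surviving])

# Made-up contents for three tiny data disks.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0f\x0e\x0d"
disk3 = b"\xaa\xbb\xcc"  # the disk that later drops off-line

# Parity as it stood while all writes were still consistent.
parity = xor_parity([disk1, disk2, disk3])

# Path A: keep the old parity and rebuild disk3 from it.
recovered = rebuild(parity, [disk1, disk2])

# Path B: disk3 holds a damaged byte, and parity is recomputed from the
# data disks as they are now. The damage becomes part of the new parity.
corrupt3 = b"\xaa\x00\xcc"
new_parity = xor_parity([disk1, disk2, corrupt3])
after_resync = rebuild(new_parity, [disk1, disk2])

if __name__ == "__main__":
    print("rebuild from old parity gives original:", recovered == disk3)
    print("rebuild after re-sync gives corrupt copy:", after_resync == corrupt3)
```

In path A the old parity still describes the intended contents, so the rebuild returns them. In path B the new parity faithfully describes the damaged disk, so nothing remains to recover the original from.
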

 

Now, hopefully it is just an intermittent connection... and you have some file-system corruption.  As you use the corrupted file-system, it is possible for it to lose track of files.

 

Once you stop the array, power down, and verify the cables for tightness, power back up and use the procedure described here in the wiki to check the file system on the disk.  Odds are good it will need repair. 

  http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

 

Oh yes, read here about the evils of the button labeled "Restore"

http://lime-technology.com/forum/index.php?topic=1833.msg12918#msg12918

 

In the future, always use the button labeled as "Start" to start the array.  If the drive goes off-line again, use the procedure described here: http://lime-technology.com/wiki/index.php?title=FAQ#How_do_I_recover_from_a_hard_disk_failure.3F

  • Author

Yep, PEBKAC it is then :-)

 

 

I think you're right on the data rebuild process. Basically, I had moved files (which would have gone to disk3 as it had the most free space), found an error, pressed restore (even though it said disk contents are not affected), and did a parity sync.

 

Lesson learned.

 

 

I'll check the cables out on the weekend.

 

 

As for the error message(s), dmesg said it was unable to identify the interface. This is the same message in the Wiki:

 

"ata7: hard resetting link

ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

ata7.00: qc timeout (cmd 0xec)

ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)

ata7.00: revalidation failed (errno=-5)

ata7: failed to recover some devices, retrying in 5 secs"

 

 

Archived

This topic is now archived and is closed to further replies.
