Parity Drive suddenly errors and stops


Hi there.

 

I recently purchased the Pro version to add more drives to the array, and it ran 3 HDDs + parity just fine. Suddenly the parity drive turned red and got disabled. I attached the syslog.

Here is the SMART status; I ran a couple of tests:

 

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Device Model:    ST3000DM001-9YN166

Serial Number:    Z1F0E70H

Firmware Version: CC4C

User Capacity:    3,000,592,982,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 4

Local Time is:    Tue May  7 00:38:13 2013 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)  The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  1)  minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  2)  minutes.
SCT capabilities:              (0x3085) SCT Status supported.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always   -           64856304
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           1012
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   066   060   030    Pre-fail  Always   -           17199648463
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           2557
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           419
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   094   094   000    Old_age   Always   -           6
190 Airflow_Temperature_Cel 0x0022   061   049   045    Old_age   Always   -           39 (Min/Max 21/39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           47
193 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always   -           15448
194 Temperature_Celsius     0x0022   039   051   000    Old_age   Always   -           39 (0 16 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           138
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           12940736464244
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           227063173794212
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           174954067296341

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%             2549  -
# 2  Extended offline    Completed without error        00%                5  -

 

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Could it be that this is just caused by a crappy cable or the power supply not having enough power? It's a rather old computer with a PCI SATA expansion card (4 ports). Thank you.

syslog.txt

Link to comment

I'd reseat (unplug/replug) both the power and data cables to the drive -- better yet, replace the data cable with a new locking cable.

 

Hard to say whether the power supply could be an issue, since you didn't post its specs; but with only 4 drives it's generally unlikely.

 

Reseat the cables; then see if the status changes. In addition, post the specifics of your system (motherboard, CPU, memory, PSU specs, and the make/model of the add-in controller card).
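
For anyone reading a SMART report like the one above, here is a rough Python sketch of how the attribute rows can be sorted into "probably the link" versus "probably the platters". The grouping is an assumption based on the usual rule of thumb (CRC errors tend to come from the cable or connector, while reallocated or pending sectors point at the disk surface); the function name `classify` and the two attribute sets are made up for illustration and are not part of smartctl.

```python
import re

# Assumption: a non-zero raw value here usually points at the SATA link
# (cable, connector, controller port) rather than the disk surface.
LINK_ATTRS = {"UDMA_CRC_Error_Count"}
# Assumption: a non-zero raw value here usually points at media trouble.
MEDIA_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
               "Offline_Uncorrectable", "Reported_Uncorrect"}

# One row of the smartctl attribute table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def classify(smart_text):
    """Return the non-zero link-related and media-related raw values."""
    findings = {"link": {}, "media": {}}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        name, raw = m.group(2), int(m.group(3))
        if raw == 0:
            continue
        if name in LINK_ATTRS:
            findings["link"][name] = raw
        elif name in MEDIA_ATTRS:
            findings["media"][name] = raw
    return findings

# Three rows taken from the report posted above.
sample = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      138
"""
result = classify(sample)
print(result)
```

With the posted values, only the CRC counter comes back non-zero, which is consistent with trying the cable first. Treat it as a hint, not a diagnosis.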

 

Link to comment

Intel Pentium 4 with 2.35 GHz

OEM motherboard MS-6583 from MSI

2 GB DDR1 RAM

250W Power supply

S-ATA Raid Controller, 4 channel with Silicon Image 3114 chip

3x 3TB hard drive

1x 2TB hard drive

 

What is the best procedure to detect which drive is which? Do I disconnect one at a time and start the server to see which one is missing? The three 3TB HDDs are the same model and I have no record :-( Although this time I will create one^^

Link to comment

As Joe noted, UnRAID displays the serial number for each drive on the Web GUI ... so you can note the serial number for each; then look on the drives and determine which is which [you'll likely have to remove the drives to read the serial number].
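
If you have console access, the same serial-to-device mapping can usually be read without the Web GUI. Below is a minimal Python sketch, assuming a Linux system where udev populates `/dev/disk/by-id` with names of the form `ata-<model>_<serial>`; the helper names `serial_map` and `live_serial_map` are invented for this example, and the device node in the demo is hypothetical.

```python
import os

def serial_map(names, resolve):
    """Map serial -> (model, device node) from /dev/disk/by-id style names."""
    mapping = {}
    for name in names:
        # Skip partition links and non-ATA aliases (wwn-*, scsi-*, ...).
        if not name.startswith("ata-") or "-part" in name:
            continue
        # The serial is whatever follows the last underscore.
        model, _, serial = name[len("ata-"):].rpartition("_")
        mapping[serial] = (model, resolve(name))
    return mapping

def live_serial_map(by_id="/dev/disk/by-id"):
    """Same mapping, read from the running system (assumes by_id exists)."""
    return serial_map(
        os.listdir(by_id),
        lambda n: os.path.realpath(os.path.join(by_id, n)),
    )

# Offline demo: the serial is from the SMART report above,
# the device node /dev/sdb is a hypothetical placeholder.
demo_names = [
    "ata-ST3000DM001-9YN166_Z1F0E70H",
    "ata-ST3000DM001-9YN166_Z1F0E70H-part1",
    "wwn-0x5000c5004e0e70aa",
]
demo = serial_map(demo_names, lambda n: "/dev/sdb")
print(demo)
```

You still need to read the label on each physical drive once, but after that the serial tells you which device node (and which cable) belongs to which disk.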

 

Link to comment

OK, I noted everything and changed the SATA cable of the parity drive (it was different from the other 3 so I swapped it). Rebuilding parity now.

 

Any notes on the hardware or is everything ok?

The power supply might be a bit on the small side for 4 drives and an older, power-hungry Pentium.
Link to comment

Agree the PSU is a bit small -- if you get random issues when the drives are spinning up or during parity checks, this is probably the first thing I'd change [go with a good 80+ certified 400W unit if you replace it ... you don't want too large a unit, as then you're running outside of its most efficient operating range].

 

You probably don't have any choice on this system, but PCI SATA cards are much slower than motherboard ports or PCIe x4 (or x8) cards.    Something to remember when you eventually decide to upgrade the motherboard/CPU.    The performance difference probably doesn't matter for most purposes; but parity checks will take appreciably longer than they would on a faster controller.
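
To put a rough number on "appreciably longer": a plain 32-bit/33 MHz PCI bus has a theoretical peak of about 133 MB/s, shared by everything on the bus. The Python sketch below assumes all four drives hang off the PCI card and are read at the same time during a parity check, so each one gets at most a quarter of that. It ignores protocol overhead and the fact that the 2TB drive finishes early, so take it as a ballpark only.

```python
PCI_BUS_MB_S = 133.0    # theoretical peak of 32-bit/33 MHz PCI, shared by the whole bus
DRIVES_ON_CARD = 4      # assumption: every array drive sits on the PCI card
PARITY_SIZE_TB = 3.0    # size of the parity drive in this thread

def parity_check_hours(bus_mb_s, drives, size_tb):
    """Best-case hours to read size_tb from each drive over a shared bus."""
    per_drive_mb_s = bus_mb_s / drives    # the bus is split evenly between drives
    size_mb = size_tb * 1_000_000         # decimal TB -> MB
    return size_mb / per_drive_mb_s / 3600

hours = parity_check_hours(PCI_BUS_MB_S, DRIVES_ON_CARD, PARITY_SIZE_TB)
print(f"best case on shared PCI: about {hours:.0f} hours")
```

That works out to roughly a day per parity check even in the best case, and a drive on a motherboard port or PCIe card would not be capped by the shared bus in the same way.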

 

Link to comment

Thanks, I know. I started with this old system, knowing that it would some day reach its capacity. But I am really impressed how much unRAID can make out of such an old system - it runs very smoothly :-)

 

When the third drive is filled I will eventually invest in a completely new system, but until that day I am happy with it (hoping it was the SATA cable, not the PSU :-P )

Link to comment

Sounds like it was just the cable. For only 4 drives, your PSU is probably fine -- UnRAID doesn't tax the CPU much, so it won't hit its rated power anyway. I suspect if you measured your power consumption with a Kill-a-Watt you'd find it's not much over 100W. If the PSU was going to be a problem, you'd notice it during boot or during spin-ups when doing a parity check.
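
A back-of-the-envelope check of that spin-up worry, as a small Python sketch. All three inputs are illustrative assumptions, not measurements from this system: about 2 A on the 12 V rail per 3.5" drive while spinning up, and about 100 W for the rest of the machine. Plug in figures from your own drive datasheets and a wall meter if you have them.

```python
def peak_load_watts(n_drives, spinup_amps_12v, base_watts):
    """Worst case: every drive spins up at once on top of the base load."""
    return base_watts + n_drives * spinup_amps_12v * 12.0

def headroom(psu_watts, load_watts):
    """Watts left over at peak; a negative value means the PSU is undersized."""
    return psu_watts - load_watts

# Illustrative assumptions only (not measured on this system):
load = peak_load_watts(n_drives=4, spinup_amps_12v=2.0, base_watts=100.0)
margin = headroom(250.0, load)
print(f"peak ~{load:.0f} W, margin ~{margin:.0f} W on a 250 W unit")
```

Total wattage is only part of the story, since an older PSU may deliver a limited share of its rating on the 12 V rail, which is why trouble would show up during spin-up first.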

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
