Upgraded parity drive and can't sync parity



Hello. I upgraded the parity drive on my unRAID server (version 5.0-rc8a), going from a 2TB drive to a 3TB. It detected my 3TB WD Red drive. When I try to do a parity sync it gets to about 7 or 8% (1929779 writes) and then it shows errors (128). The light beside my parity drive in the dashboard is red.

 

I've tried starting the array and running the parity sync again, but the same thing happens. The log shows this:

 

Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222544/0, count: 1

Dec 7 16:44:20 Tower kernel: md: disk0 write error

Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222552/0, count: 1

Dec 7 16:44:20 Tower kernel: md: disk0 write error

Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222560/0, count: 1

Dec 7 16:44:20 Tower kernel: md: disk0 write error

Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222568/0, count: 1

Dec 7 16:44:20 Tower kernel: md: recovery thread sync completion status: -4

Dec 7 16:44:20 Tower kernel: md: recovery thread woken up ...

Dec 7 16:44:20 Tower kernel: md: recovery thread has nothing to resync
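
For anyone who wants to summarize a log like this, here is a minimal Python sketch that counts the handle_stripe write errors and converts the first failing sector into an approximate position on the disk. It assumes the number before the slash is a sector offset in 512-byte units and the number after it is the disk slot (0 being parity); neither is confirmed in this thread. The function name and the regular expression are my own, and the capacity used is the 3,000,592,982,016 bytes that smartctl reports for this drive.

```python
import re

# Matches lines such as:
#   "... kernel: handle_stripe write error: 477222544/0, count: 1"
STRIPE_ERR = re.compile(r"handle_stripe write error: (\d+)/(\d+), count: (\d+)")

SECTOR_BYTES = 512  # assumption: sectors are reported in 512-byte units


def summarize_write_errors(log_text, disk_bytes):
    """Return (error_count, first_sector, fraction_of_disk), or None if no errors."""
    sectors = [int(m.group(1)) for m in STRIPE_ERR.finditer(log_text)]
    if not sectors:
        return None
    first = min(sectors)
    return len(sectors), first, first * SECTOR_BYTES / disk_bytes


sample = """\
Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222544/0, count: 1
Dec 7 16:44:20 Tower kernel: md: disk0 write error
Dec 7 16:44:20 Tower kernel: handle_stripe write error: 477222552/0, count: 1
"""

count, first, fraction = summarize_write_errors(sample, 3_000_592_982_016)
print(f"{count} stripe write errors, first at sector {first} ({fraction:.1%} into the disk)")
```

Under those assumptions the first failing sector sits roughly 8% into the drive, which is consistent with the sync stopping at about 7 or 8%.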

 

I have 11 2TB drives and 1 3TB parity drive.

 

Please help.

 

 


Your parity drive has stopped responding... The "writes" to it are failing.

 

Either

 

1. The drive died an early death

or

2. A cable to it is loose and is no longer making connection

or

3. A cable to it is intermittent, and is no longer electrically connected

or

4. A drive tray/backplane is intermittent and the drive is no longer connected to the drive controller through it.

or

5. Your power supply is inadequate for all the drives connected to it, and the drive being written to is unable to write properly because of improper voltages to it.

 

With 12 drives, what specific make/model power supply are you using?
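
As a rough illustration of point 5, here is a minimal Python sketch of a worst-case 12 V budget. Every number in it is an assumption: the 2 A per drive at spin-up is only a placeholder rule of thumb (check the drive's datasheet), the 40 A rail is hypothetical and is not the rating of any particular power supply, and the function names and the 80% margin are my own.

```python
def peak_12v_amps(drive_count, spinup_amps_per_drive=2.0, other_12v_amps=0.0):
    """Worst-case 12 V draw if every drive spins up at the same moment."""
    return drive_count * spinup_amps_per_drive + other_12v_amps


def has_headroom(rail_amps, drive_count, margin=0.8, **kwargs):
    """True if the estimated peak stays under `margin` of the rail rating."""
    return peak_12v_amps(drive_count, **kwargs) <= rail_amps * margin


# 12 drives at the assumed 2 A each is 24 A for the drives alone.
print(peak_12v_amps(12))

# Hypothetical 40 A rail, first with drives only, then with an assumed
# 10 A of other 12 V load (CPU, fans, motherboard).
print(has_headroom(40, 12))
print(has_headroom(40, 12, other_12v_amps=10.0))
```

The point is only that the drives' combined spin-up current has to be compared against the 12 V rail rating, not against the supply's total wattage.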

 

Joe L.

 


Hi Joe. I appreciate the help.

 

1. The Drive is brand new

2. I checked the SATA cables to all drives and they all seem tight

3. I checked the power to all my drives, they seem fine

4. The parity drive is connected directly to the motherboard (via SATA)

5. I am using a Corsair TX650W power supply.

 

I rebooted and it did not detect my parity drive or drive 2. I powered down, secured all cables, rebooted, and all was fine. I tried to sync parity again and got errors again. See picture.

 

I am thinking of taking the 3TB drive out of another computer and trying to sync parity. Is that a reasonable next step? The 3TB drive in my other computer is not as "good" as the WD Red though.

[Attached image: redparity.jpg]


Did you preclear the drives?

 

I have two WD Reds performing like a charm, but any drive might be a dud. This is why you have to exercise it HARD when you first use it; that will make sure the bad ones show themselves.

 

If you preclear then the bad stuff will show before the drive is in the array.

 

If you put a drive in the array without preclearing it, it still plays a part in the overall safety of your data. With unRAID your array will be fine when one drive fails.

 

A drive that has not been precleared has a higher chance of failing, and if another drive fails at that point you will lose data.
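
As I understand it, the practical value of a burn-in comes from comparing SMART counters before and after the drive has been stressed. Here is a minimal Python sketch of that comparison, assuming the raw values have already been pulled out of two smartctl reports into dictionaries. The function name and the choice of watched attributes are mine, not something preclear defines.

```python
# Attributes whose raw value should stay flat on a healthy new drive.
# This watch list is an assumption for illustration, not an official one.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def smart_regressions(before, after, watched=WATCHED):
    """Return {attribute: (old_raw, new_raw)} for watched counters that grew."""
    grown = {}
    for name in watched:
        old, new = before.get(name, 0), after.get(name, 0)
        if new > old:
            grown[name] = (old, new)
    return grown


# Hypothetical snapshots taken before and after stressing a drive.
before = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0}
after = {"Reallocated_Sector_Ct": 12, "Current_Pending_Sector": 0}
print(smart_regressions(before, after))
```

Any non-empty result on a brand new drive would be a reason to return it before it goes into the array.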

 


I think this is it. It was in the /tmp folder.

 

root@Tower:/tmp# cat smart_start_sda
Disk: /dev/sda
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T1321793
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Dec  8 14:09:31 2012 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (40320) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   227   227   021    Pre-fail  Always       -       3641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   133   133   140    Pre-fail  Always   FAILING_NOW 1967
  7 Seek_Error_Rate         0x002e   001   001   000    Old_age   Always       -       3166
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       6
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   125   118   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   171   171   000    Old_age   Always       -       29
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
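
For reference, the FAILED verdict above follows from the attribute table: Reallocated_Sector_Ct has a normalized VALUE of 133, which is below its THRESH of 140, after only 6 power-on hours. Here is a minimal Python sketch of that check, assuming the ten-column layout shown in this smartctl 5.40 output. The function name is my own, and attributes with a zero threshold are skipped on the assumption that they are informational only.

```python
def failing_attributes(smart_text):
    """Return names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not (fields[3].isdigit() and fields[5].isdigit()):
            continue
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing


sample = """\
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   133   133   140    Pre-fail  Always   FAILING_NOW 1967
  7 Seek_Error_Rate         0x002e   001   001   000    Old_age   Always       -       3166
"""

print(failing_attributes(sample))
```

The input text could come from a saved report like the one above or from `smartctl -A /dev/sda`.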
