Adding disks and moving files


Hi,

 

I've had my unraid box running for a couple of months now. Currently it has 2 x 3TB drives installed (1 for data and 1 that will be parity but is as yet unassigned) and a 250GB cache drive.

 

The original plan was to replace my Qnap NAS, so over the last couple of months I've been moving data over from the Qnap box to the unraid machine. I'm now at the point where I've moved over all of the data that I want to move, but the data has almost filled the 3TB data drive in the unraid box (approx. 100GB free).

 

The two 1.5TB disks from the Qnap box are now undergoing clearing in the unraid machine, with the intention of adding them to the array once that process is complete.

 

My question is - once the drives are added to the array, what is the best method to redistribute the data that is on the existing 3TB drive?

 

I have the allocation method for my shares set as high water, with a split level of 2 or 3 depending on what the share contains. None of the shares are currently set to use the cache drive; at the moment it's purely being used for hosting plugins (sabnzbd etc).

 

Am I best off moving data in chunks from /mnt/disk1/[share] to the cache drive and then moving it from there to /mnt/user/[share]? Would that then distribute files across all of the drives? Is there a best order in which to move files? Largest first, newest first, alphabetical?

 

Has anybody else gone through a similar process? Does a script exist that might do the work for me?

 

Also at what point should I activate parity? (I know I really should have done so before now, but with moving data around on and off I never got round to it). Should I activate it, and build parity, before adding the extra drives and moving data around, or should I leave it until the data is where I want it?

 

Thanks for any help and advice

 

Peter

Link to comment

If you are pre-clearing the disks that used to house your data and you have no backups, or don't relish the idea of restoring from backup, you should enable parity immediately.

Exactly, you are WAY past the point where you should have established parity.

 

I hope you have perfect disks, because if you do not, you may have lost some data.

 

Run "smartctl" reports on all your disks NOW, then assign the parity disk, calculate parity, then run smart reports again.  Just hope that there are no additional new un-readable sectors identified during the parity calc process.  If there are, odds were not in your favor.

 

Joe L.

Link to comment

I'm interested in this exact same topic as I want to pull out a drive in favor of a smaller count, higher capacity drive array. I have ~400GB on a 1TB drive that I need to distribute back onto the other 3 drives. I could manually move the files off, but that would take a significant amount of time...

Link to comment

Yeah, as I said, I know I should have enabled parity before now. As it stands now, the 2 drives from the Qnap have just started the 3rd of 3 preclear cycles. I've run a smartctl -H for all of the drives and they all came back as passed. I'll wait for the preclears to finish before I run any more in depth tests.

 

Should I enable parity now? Or wait until tomorrow when the clears have finished and enable parity once I've added the other drives to the array?

 

What about the question of moving files around once the new drives are in the array? Does anybody have any input on that?

 

Peter

Link to comment

smartctl -H does not inform you if you have re-allocated sectors, or sectors pending re-allocation.  It only tells you if a given "normalized" attribute is lower than its affiliated failure threshold.
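To make the threshold comparison concrete, here is a minimal sketch that lists every attribute whose normalized VALUE is at or below its THRESH. It is only an approximation of the comparison described above: the PASSED/FAILED verdict itself is reported by the drive, not computed by a script like this. The function name `below_threshold` is made up for illustration, and the column positions assume the attribute table layout that `smartctl -a` prints in this thread.

```shell
# Hypothetical helper: reads `smartctl -a` (or -A) output on standard input and
# prints the name of each attribute whose normalized VALUE ($4) is at or below
# its THRESH ($6). Requiring a hex FLAG in $3 skips the error-log lines, which
# also start with numbers.
below_threshold() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ && $4 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && ($4 + 0) <= ($6 + 0) { print $2 }'
}

# Usage (as root, with sdX replaced by a real device):
#   smartctl -a /dev/sdX | below_threshold
```

An empty result only means no normalized value has reached its threshold. It says nothing about the raw counts of reallocated or pending sectors.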

 

Run

smartctl -a /dev/sdX

on all your drives now (and again later, after parity is calculated). Then assign a parity drive and calculate parity.

 

Compare the re-allocated sectors and sectors pending re-allocation.  If you are very lucky, there will be no additional sectors pending re-allocation after parity is calculated.
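One way to make that before/after comparison less error-prone is to save just the relevant raw counts to a file each time and diff the two files. The sketch below is an illustration only: the function name `sector_counts` and the file names in the usage comments are invented, and the field positions assume the attribute table layout shown by `smartctl -a` in this thread.

```shell
# Hypothetical helper: reads `smartctl -a` output on standard input and prints
# "NAME RAW_VALUE" for attributes 5, 196 and 197 (reallocated sectors,
# reallocation events, sectors pending re-allocation).
sector_counts() {
    awk '$3 ~ /^0x/ && ($1 == 5 || $1 == 196 || $1 == 197) { print $2, $10 }'
}

# Usage (file names are placeholders, sdX is a real device):
#   smartctl -a /dev/sdX | sector_counts > before.txt
#   ...assign the parity drive and let the parity calculation finish...
#   smartctl -a /dev/sdX | sector_counts > after.txt
#   diff before.txt after.txt
```

If `diff` prints nothing, the counts did not change during the parity calculation.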

 

Or, don't run the reports... since there is little to do at this time regardless to fix anything unless you have backup copies of critical files elsewhere and checksums to use to determine corruption.

 

Joe L.

Link to comment

Here is a great example of a disk that passes the -H test but is horribly bad in terms of sectors reallocated and sectors pending re-allocation.

It passes the -H test:

[code]root@Tower2:~# smartctl -H /dev/sdl
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED[/code]

 

The -a option shows the true "health" of the drive.  Every time I read this drive (preclear or otherwise) additional un-readable sectors show up.

[code]root@Tower2:~# smartctl -a /dev/sdl

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   019   019   005    Pre-fail  Always       -       1631
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2052
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       11[/code]

[code]root@Tower2:~# smartctl -a /dev/sdl
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML0220F30XGTPD
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 23 09:14:10 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (22457) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   016    Pre-fail  Always       -       589827
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   140   140   024    Pre-fail  Always       -       415 (Average 370)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   019   019   005    Pre-fail  Always       -       1631
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       8538
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       25
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       29
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 23/42)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2052
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       11
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 29 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 29 occurred at disk power-on lifetime: 8377 hours (349 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f f1 47 0c 06

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 47 0c 40 08   2d+04:57:55.208  READ FPDMA QUEUED
  60 80 00 00 47 0c 40 08   2d+04:57:55.207  READ FPDMA QUEUED
  60 80 00 80 46 0c 40 08   2d+04:57:55.207  READ FPDMA QUEUED
  60 80 00 00 46 0c 40 08   2d+04:57:55.206  READ FPDMA QUEUED
  60 80 00 80 45 0c 40 08   2d+04:57:55.205  READ FPDMA QUEUED

Error 28 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 30 50 60 53 0f

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 00 60 53 40 08   2d+04:37:32.123  READ FPDMA QUEUED
  60 80 00 80 5f 53 40 08   2d+04:37:32.122  READ FPDMA QUEUED
  60 80 00 00 5f 53 40 08   2d+04:37:32.121  READ FPDMA QUEUED
  60 80 00 80 5e 53 40 08   2d+04:37:32.120  READ FPDMA QUEUED
  60 80 00 00 5e 53 40 08   2d+04:37:32.119  READ FPDMA QUEUED

Error 27 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 10 f0 b0 a1 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 b0 a1 40 08   2d+04:32:53.172  READ FPDMA QUEUED
  60 80 00 00 b0 a1 40 08   2d+04:32:53.171  READ FPDMA QUEUED
  60 80 00 80 af a1 40 08   2d+04:32:53.171  READ FPDMA QUEUED
  60 80 00 00 af a1 40 08   2d+04:32:53.170  READ FPDMA QUEUED
  60 80 00 80 ae a1 40 08   2d+04:32:53.169  READ FPDMA QUEUED

Error 26 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 48 b8 6f 0a 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 6f 0a 40 08   2d+04:31:23.146  READ FPDMA QUEUED
  60 80 00 00 6f 0a 40 08   2d+04:31:23.145  READ FPDMA QUEUED
  60 80 00 80 6e 0a 40 08   2d+04:31:23.144  READ FPDMA QUEUED
  60 80 00 00 6e 0a 40 08   2d+04:31:23.143  READ FPDMA QUEUED
  60 80 00 80 6d 0a 40 08   2d+04:31:23.142  READ FPDMA QUEUED

Error 25 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 70 90 11 fd 09

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 11 fd 40 08   2d+04:24:18.295  READ FPDMA QUEUED
  60 80 00 00 11 fd 40 08   2d+04:24:18.294  READ FPDMA QUEUED
  60 80 00 80 10 fd 40 08   2d+04:24:18.283  READ FPDMA QUEUED
  60 80 00 00 10 fd 40 08   2d+04:24:18.282  READ FPDMA QUEUED
  60 80 00 80 0f fd 40 08   2d+04:24:18.261  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.[/code]

Link to comment

Joe,

 

I ran the command for all of the drives in my system - the not yet allocated parity, the current data, the 2 new drives and my cache drive.

 

All of them reported 0 in the RAW_VALUE column for all of the attributes that you highlighted.

 

Should I enable parity now and let that build whilst the clear finishes on the other 2 drives?

 

What is the procedure for enabling parity? Stop the array, assign the parity disk, start the array? Will that automatically kick off a parity build or is there some other action that I have to take?

 

Peter

Link to comment

Joe,

 

I ran the command for all of the drives in my system - the not yet allocated parity, the current data, the 2 new drives and my cache drive.

 

All of them reported 0 in the RAW_VALUE column for all of the attributes that you highlighted.

Excellent. You should go out and buy a lottery ticket (while your luck holds out). Let's hope they are zero again once parity is calculated.

 

Should I enable parity now and let that build whilst the clear finishes on the other 2 drives?

Yes.

What is the procedure for enabling parity? Stop the array, assign the parity disk, start the array? Will that automatically kick off a parity build or is there some other action that I have to take?

That is all it takes.  Let it run to completion.  Depending on the size of your disks and your hardware it could take a while.  See here for examples:

http://lime-technology.com/wiki/index.php/User_Benchmarks
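For a rough feel of what "a while" means, the arithmetic is simply disk size divided by sustained throughput. The sketch below is a back-of-the-envelope estimate, not a measurement: the function name `parity_hours` is invented, and the 100 MB/s figure is purely an assumed average. Real speeds depend on the drives, the controller and the slowest disk in the array.

```shell
# Hypothetical helper: estimate the hours needed to process a disk once.
#   $1 = disk size in bytes
#   $2 = assumed average throughput in MB/s (1 MB = 10^6 bytes)
parity_hours() {
    awk -v bytes="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", bytes / (mbps * 1000000) / 3600 }'
}

# A 3TB parity disk at an assumed 100 MB/s:
#   parity_hours 3000000000000 100    # prints 8.3
```

Halving the assumed throughput doubles the estimate, so treat the result as an order-of-magnitude guide and check the benchmarks page for numbers from comparable hardware.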

 

Joe L.

Link to comment

Thanks for the advice Joe.

 

The clear has now finished on both of the drives, the report at the end for sectors etc was all 0's for both of them.

 

What I'm planning is, when I get home from work, to stop the array, add the new drives and assign parity, then start the array up and let it do its thing building parity.

 

Once that's done then I'll come back to the issue of moving files around.

 

Is it safe for stuff to be written to the array whilst parity is being built? Or do I need to stop downloads and limit access in that time?

 

Does that sound like a good plan?

 

I would kick it all off now, but I'm connecting to the unraid web interface through an ssh tunnel from work (using the openssh plugin) and I'm not sure if stopping the array kills the plugin and thus my ssh access.

 

Peter

Link to comment
