butlerpeter Posted October 22, 2012

Hi, I've had my unRAID box running for a couple of months now. Currently it has 2 x 3TB drives installed (one for data and one that will be parity but is as yet unassigned) and a 250GB cache drive.

The original plan was to replace my Qnap NAS, so over the last couple of months I've been moving data over from the Qnap box to the unRAID machine. I'm now at the point where I've moved over all of the data that I want to move, but the data has almost filled the 3TB data drive in the unRAID box (approx. 100GB free). The two 1.5TB disks from the Qnap box are now undergoing clearing in the unRAID machine, with the intention of adding them to the array once that process is complete.

My question is: once the drives are added to the array, what is the best method to redistribute the data that is on the existing 3TB drive? I have the allocation method for my shares set to high-water, with a split level of 2 or 3 depending on what the share contains. None of the shares are currently set to use the cache drive; it's purely being used at the moment for hosting plugins (SABnzbd etc.).

Am I best off moving data in chunks from /mnt/disk1/[share] to the cache drive and then moving from there to /mnt/user/[share]? Would that then distribute files across all of the drives? Is there a best order in which to move files - largest first, newest first, alphabetical? Has anybody else gone through a similar process? Does a script exist that might do the work for me?

Also, at what point should I activate parity? (I know I really should have done so before now, but with moving data around on and off I never got round to it.) Should I activate it, and build parity, before adding the extra drives and moving data around, or should I leave it until the data is where I want it?

Thanks for any help and advice,
Peter
sureguy Posted October 22, 2012

If you are pre-clearing the disks that used to house your data and you have no backups, or don't relish the idea of restoring from backup, you should enable parity immediately.
Joe L. Posted October 22, 2012

> If you are pre-clearing the disks that used to house your data and you have no backups, or don't relish the idea of restoring from backup, you should enable parity immediately.

Exactly. You are WAY past the point where you should have established parity. I hope you have perfect disks, because if you do not, you may have lost some data.

Run "smartctl" reports on all your disks NOW, then assign the parity disk, calculate parity, then run SMART reports again. Just hope that there are no additional new un-readable sectors identified during the parity-calc process. If there are, the odds were not in your favor.

Joe L.
dave Posted October 23, 2012

I'm interested in this exact same topic, as I want to pull out a drive in favor of a smaller-count, higher-capacity drive array. I have ~400GB on a 1TB drive that I need to distribute back onto the other 3 drives. I could manually move the files off, but that would take a significant amount of time...
butlerpeter Posted October 23, 2012 (Author)

> Exactly, you are WAY past the point where you should have established parity. [...] Run "smartctl" reports on all your disks NOW, then assign the parity disk, calculate parity, then run smart reports again.

Yeah, as I said, I know I should have enabled parity before now. As it stands, the two drives from the Qnap have just started the third of three preclear cycles. I've run a smartctl -H for all of the drives and they all came back as PASSED. I'll wait for the preclears to finish before I run any more in-depth tests.

Should I enable parity now? Or wait until tomorrow, when the clears have finished, and enable parity once I've added the other drives to the array?

What about the question of moving files around once the new drives are in the array? Does anybody have any input on that?

Peter
Joe L. Posted October 23, 2012

smartctl -H does not inform you if you have re-allocated sectors, or sectors pending re-allocation. It only tells you if a given "normalized" attribute is lower than its affiliated failure threshold.

Run smartctl -a /dev/sdX on all your drives now (and then again later, after parity is calculated). Then assign a parity drive and calculate parity. Compare the re-allocated sectors and sectors pending re-allocation. If you are very lucky, there will be no additional sectors pending re-allocation after parity is calculated.

Or don't run the reports, since there is little to do at this time regardless to fix anything, unless you have backup copies of critical files elsewhere and checksums to use to determine corruption.

Joe L.
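[Editor's note] The before/after comparison Joe describes can be scripted. A minimal sketch, not from this thread: the `extract_attrs` helper, the device list, and the /boot file names are illustrative assumptions to adapt to your own system.

```shell
# Snapshot SMART attributes 5, 196 and 197 so the "before parity" and
# "after parity" reports can be diffed. extract_attrs is a hypothetical
# helper, not an unRAID or smartmontools tool.
extract_attrs() {
    # Reads a `smartctl -a` report on stdin and prints "ID NAME RAW_VALUE"
    # for the reallocation-related attributes. The `$2 ~ /_/` guard keeps
    # only real attribute rows (their names, like Reallocated_Sector_Ct,
    # contain underscores) and skips look-alike rows elsewhere in the
    # report, such as the selective self-test span table.
    awk '($1 == 5 || $1 == 196 || $1 == 197) && $2 ~ /_/ { print $1, $2, $NF }'
}

# Typical use (commented out; needs root and your real /dev/sdX names):
# for d in sda sdb sdc; do
#     smartctl -a "/dev/$d" | extract_attrs > "/boot/smart-$d-before.txt"
# done
# ...assign parity, let it build, repeat with -after.txt, then diff
# each before/after pair and hope for no changes.

# Demo on a sample excerpt in the format smartctl prints:
extract_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   019   019   005    Pre-fail  Always       -       1631
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2052
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       11
EOF
# prints:
# 5 Reallocated_Sector_Ct 1631
# 196 Reallocated_Event_Count 2052
# 197 Current_Pending_Sector 11
```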
Joe L. Posted October 23, 2012

Here is a great example of a disk that passes the -H test but is horribly bad in sectors reallocated and pending re-allocation. It passes the -H test:

[code]
root@Tower2:~# smartctl -H /dev/sdl
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[/code]

The -a option shows the true "health" of the drive. Every time I read this drive (preclear or otherwise) additional un-readable sectors show up:

[code]
root@Tower2:~# smartctl -a /dev/sdl
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   019   019   005    Pre-fail  Always       -       1631
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2052
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       11
[/code]

The full report:

[code]
root@Tower2:~# smartctl -a /dev/sdl
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML0220F30XGTPD
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 23 09:14:10 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (22457) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   016    Pre-fail  Always       -       589827
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   140   140   024    Pre-fail  Always       -       415 (Average 370)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   019   019   005    Pre-fail  Always       -       1631
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       8538
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       25
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       29
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 23/42)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2052
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       11
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 29 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 29 occurred at disk power-on lifetime: 8377 hours (349 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f f1 47 0c 06

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 47 0c 40 08   2d+04:57:55.208  READ FPDMA QUEUED
  60 80 00 00 47 0c 40 08   2d+04:57:55.207  READ FPDMA QUEUED
  60 80 00 80 46 0c 40 08   2d+04:57:55.207  READ FPDMA QUEUED
  60 80 00 00 46 0c 40 08   2d+04:57:55.206  READ FPDMA QUEUED
  60 80 00 80 45 0c 40 08   2d+04:57:55.205  READ FPDMA QUEUED

Error 28 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 30 50 60 53 0f

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 00 60 53 40 08   2d+04:37:32.123  READ FPDMA QUEUED
  60 80 00 80 5f 53 40 08   2d+04:37:32.122  READ FPDMA QUEUED
  60 80 00 00 5f 53 40 08   2d+04:37:32.121  READ FPDMA QUEUED
  60 80 00 80 5e 53 40 08   2d+04:37:32.120  READ FPDMA QUEUED
  60 80 00 00 5e 53 40 08   2d+04:37:32.119  READ FPDMA QUEUED

Error 27 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 10 f0 b0 a1 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 b0 a1 40 08   2d+04:32:53.172  READ FPDMA QUEUED
  60 80 00 00 b0 a1 40 08   2d+04:32:53.171  READ FPDMA QUEUED
  60 80 00 80 af a1 40 08   2d+04:32:53.171  READ FPDMA QUEUED
  60 80 00 00 af a1 40 08   2d+04:32:53.170  READ FPDMA QUEUED
  60 80 00 80 ae a1 40 08   2d+04:32:53.169  READ FPDMA QUEUED

Error 26 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 48 b8 6f 0a 0d

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 6f 0a 40 08   2d+04:31:23.146  READ FPDMA QUEUED
  60 80 00 00 6f 0a 40 08   2d+04:31:23.145  READ FPDMA QUEUED
  60 80 00 80 6e 0a 40 08   2d+04:31:23.144  READ FPDMA QUEUED
  60 80 00 00 6e 0a 40 08   2d+04:31:23.143  READ FPDMA QUEUED
  60 80 00 80 6d 0a 40 08   2d+04:31:23.142  READ FPDMA QUEUED

Error 25 occurred at disk power-on lifetime: 8376 hours (349 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 70 90 11 fd 09

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 80 00 80 11 fd 40 08   2d+04:24:18.295  READ FPDMA QUEUED
  60 80 00 00 11 fd 40 08   2d+04:24:18.294  READ FPDMA QUEUED
  60 80 00 80 10 fd 40 08   2d+04:24:18.283  READ FPDMA QUEUED
  60 80 00 00 10 fd 40 08   2d+04:24:18.282  READ FPDMA QUEUED
  60 80 00 80 0f fd 40 08   2d+04:24:18.261  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/code]
butlerpeter Posted October 23, 2012 (Author)

Joe, I ran the command for all of the drives in my system - the not-yet-assigned parity drive, the current data drive, the 2 new drives and my cache drive. All of them reported 0 in the RAW_VALUE column for all of the attributes that you highlighted.

Should I enable parity now and let that build whilst the clear finishes on the other 2 drives?

What is the procedure for enabling parity? Stop the array, assign the parity disk, start the array? Will that automatically kick off a parity build, or is there some other action that I have to take?

Peter
Joe L. Posted October 23, 2012

> Joe, I ran the command for all of the drives in my system [...] All of them reported 0 in the RAW_VALUE column for all of the attributes that you highlighted.

Excellent. You should go out and buy a lottery ticket (while your luck holds out). Let's hope they are zero again once parity is calculated.

> Should I enable parity now and let that build whilst the clear finishes on the other 2 drives?

Yes.

> What is the procedure for enabling parity? Stop the array, assign the parity disk, start the array? Will that automatically kick off a parity build or is there some other action that I have to take?

That is all it takes. Let it run to completion. Depending on the size of your disks and your hardware it could take a while. See here for examples: http://lime-technology.com/wiki/index.php/User_Benchmarks

Joe L.
butlerpeter Posted October 24, 2012 (Author)

Thanks for the advice, Joe. The clear has now finished on both of the drives; the report at the end for sectors etc. was all 0's for both of them.

What I'm planning is, when I get home from work: stop the array, add the new drives and assign parity, then start the array up and let it do its thing building parity. Once that's done I'll come back to the issue of moving files around.

Is it safe for stuff to be written to the array whilst parity is being built? Or do I need to stop downloads and limit access in that time? Does that sound like a good plan?

I would kick it all off now, but I'm connecting to the unRAID web interface through an ssh tunnel from work (using the openssh plugin) and I'm not sure if stopping the array kills the plugin and thus my ssh access.

Peter
butlerpeter Posted October 26, 2012 (Author)

OK, the drives are added to the array, the parity build is done, and all of the smartctl stats mentioned are tickety-boo. Thanks for the help and advice.

Back to my original question: what is the best way of redistributing files across the disks?
int13h Posted October 26, 2012

I'd say by setting the allocation method to "Most-free" on the user share in question. Then copy data from the disk shares to your cache. The mover will then redistribute the data among all your disks.
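[Editor's note] The disk-to-cache step int13h describes can be sketched with rsync. This is an illustration, not a tested unRAID script: the `move_share` helper and the share name "Media" are assumptions, so substitute your own paths, and keep both ends on disk/cache paths (as here) rather than mixing /mnt/user into the same copy.

```shell
# Move a share's contents off a nearly-full disk onto the cache drive so
# the mover can redistribute them. move_share is a hypothetical helper;
# the unRAID paths in the commented call are examples only.
move_share() {
    src="$1"    # e.g. /mnt/disk1/Media (the nearly-full data disk)
    dst="$2"    # e.g. /mnt/cache/Media (the same share on the cache drive)
    mkdir -p "$dst"
    # -a preserves attributes; --remove-source-files deletes each source
    # file only after rsync has transferred it to the destination.
    rsync -a --remove-source-files "$src"/ "$dst"/
    # rsync leaves the emptied directory tree behind; prune it, keeping
    # the top-level share directory itself.
    find "$src" -mindepth 1 -depth -type d -empty -delete
}

# Real use (commented out -- paths are assumptions):
# move_share /mnt/disk1/Media /mnt/cache/Media

# Self-contained demo on temporary directories:
demo="$(mktemp -d)"
mkdir -p "$demo/src/sub"
echo "hello" > "$demo/src/sub/file.txt"
move_share "$demo/src" "$demo/dst"
ls "$demo/dst/sub"    # prints: file.txt
```

With the share's allocation method set to Most-free, the mover's next run would then spread the cached files across the data disks, filling the emptiest disk first.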