LVLAaron Posted January 26, 2017 I'm in a bit of a bind. I installed two new 4TB drives in my system and am preclearing them, which will take ages. Overnight, a 1TB drive in the system died :o :o :'( Can I kill the preclear process and proceed as normal after step "5. verifies the signature", when it starts the post-read? Put another way, can I kill the preclear process DURING the post-read and add the drive to my array?
trurl Posted January 26, 2017 Saying a drive "died" is a little vague; others say similar things without specific evidence, only to discover there is a different problem. How did you determine the disk "died"? If you are going to use a disk for a rebuild, it does not need to be clear, since it is going to be completely overwritten with the data calculated from parity. unRAID only needs a clear disk when adding it to a new slot, so that parity remains valid. If the SMART report for the new drive looks OK, you might as well go ahead and use it for the rebuild. The rebuild will wind up testing it more anyway, and you can check its SMART report again afterwards and also run a non-correcting parity check to see whether the rebuild went OK.
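To illustrate why a rebuild target does not need to be cleared, here is a minimal sketch assuming simple XOR-based single parity. The disk contents, sizes, and the helper name `xor_blocks` are made up for illustration; this is not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, shrunk to 4 bytes each for illustration.
data_disks = [b"\x10\x20\x30\x40", b"\x0a\x0b\x0c\x0d", b"\xff\x00\xff\x00"]
parity = xor_blocks(data_disks)

# Pretend disk index 1 dropped out. The replacement's prior contents are
# never read during the rebuild, so they do not matter.
replacement_before = b"\xde\xad\xbe\xef"
survivors = [disk for i, disk in enumerate(data_disks) if i != 1]

# Every byte of the missing disk is recomputed from the survivors plus parity.
rebuilt = xor_blocks(survivors + [parity])
```

Under these assumptions `rebuilt` equals the original contents of the missing disk regardless of what `replacement_before` held, which is the point being made about rebuilds overwriting the whole disk.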
LVLAaron Posted January 26, 2017 Author My parity drive is 1.5TB today. One of those new 4TB drives was going to replace it. If I replace the "failed" 1TB drive with a 4TB drive... I know the parity drive needs to be the biggest drive.
* Can I replace the 1TB drive with a new 4TB drive? Will unRAID throw any errors about adding that in there? How will it handle the extra 3TB? Will that become available once I upgrade the parity drive? I'll start a new thread for this in a relevant area.
* My question remains: can I kill the preclear during the post-read? It's almost done and I don't want to wait another 12 hours to replace this drive.
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright © 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: /10:0:0:
Product: 0
Compliance: SPC-5
User Capacity: 600,332,565,813,390,450 bytes [600 PB]
Logical block size: 774843950 bytes
Physical block size: 3166222336 bytes
Lowest aligned LBA: 12346
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
trurl Posted January 26, 2017 Your questions suggest you don't yet have a clear enough idea of how to proceed without losing data, so don't do anything at all except follow our specific instructions. SMART output like what you posted more often just means the drive has lost its connection. There is a way to replace the parity drive with the new drive and replace the problem drive with the old parity drive, but before we get into that more complicated procedure, I would like to check the rest of your system to make sure it has a good chance of succeeding. Stop the preclear. Immediately after you do so, shut down and check the connections on the disk, being very careful not to disturb any other disk connections. Leave the new disks connected as well so we can take a look at them. Then reboot, go to Tools - Diagnostics, and post the complete diagnostics zip so we have more complete information before recommending how to proceed.
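The size reasoning behind the two options (rebuild directly onto the new disk, or make the new disk parity and reuse the old parity disk as the data replacement) can be sketched roughly as follows. The function name, the return strings, and the rule that a data disk may not be larger than parity are stated here as assumptions for illustration, not as unRAID's own logic.

```python
def replacement_plan(failed_tb, parity_tb, new_tb):
    """Pick a replacement approach from disk sizes in TB (hypothetical helper)."""
    if new_tb < failed_tb:
        # A rebuild target must hold at least as much as the disk it replaces.
        return "too small"
    if new_tb <= parity_tb:
        # Assumed rule: a data disk may not exceed the parity disk in size.
        return "direct rebuild"
    # New disk is bigger than parity: it takes over as parity, and the old
    # parity disk becomes the replacement for the failed data disk.
    return "parity swap"

# Sizes from this thread: failed 1TB data disk, 1.5TB parity, new 4TB disk.
plan_for_thread = replacement_plan(1.0, 1.5, 4.0)
```

With the sizes mentioned in the thread this sketch lands on the more complicated swap procedure, while a matching 1TB spare would be a plain rebuild.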
LVLAaron Posted January 26, 2017 Author I read through the swap-disable procedure and am comfortable with it. (I have backups.) I have another 1TB drive that I could use to just replace the bad one at this point. It is an old 2.5-inch drive and probably close to death anyway. Does the preclear help me at all with the swap-disable? I.e., will I still be waiting for unRAID to clear the drive in that process?
trurl Posted January 26, 2017 No, swap-disable doesn't require a clear disk. And if you rebuild to that old 2.5-inch disk, of course, you won't need swap-disable anyway. But since I don't have any information about that disk, I have to wonder whether it is reliable enough to use. It may be that you can just rebuild the disk onto itself and not bother with swap-disable, so I would still like to see your diagnostics after you try to correct the connections to the "dead" disk, to see whether it is truly dead and to make sure you don't have any other issues before proceeding.
LVLAaron Posted January 26, 2017 Author
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright © 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: /10:0:0:
Product: 0
Compliance: SPC-5
User Capacity: 600,332,565,813,390,450 bytes [600 PB]
Logical block size: 774843950 bytes
Physical block size: 3166222336 bytes
Lowest aligned LBA: 12346
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
Attachment: tower-diagnostics-20170126-1035.zip
JorgeB Posted January 26, 2017 All devices on the Marvell controller are having timeout issues, and disk7 ended up dropping offline. Those controllers have issues when VT-d is enabled; are you using it?
LVLAaron Posted January 26, 2017 Author Is that the "enable virtualization" piece in the BIOS? EDIT: I did just install that controller yesterday so I could add a drive and start removing old small drives from the array.
LVLAaron Posted January 26, 2017 Author The zeroing phase on my two new 4TB drives is about 80 percent complete and will be done in a few hours. I have an identical 1TB drive going through preclear on another box to inspect it, just in case.
trurl Posted January 26, 2017 "I did just install that controller yesterday so I could add a drive and start removing old small drives from the array." It might make more sense to replace the smaller drives with the larger ones instead of adding larger drives and removing smaller ones. It would certainly be a lot simpler.
LVLAaron Posted January 26, 2017 Author I was in a bind. I was out of ports, so I had to remove a drive to make room for my two new 4TB disks. FWIW: the two new drives are running preclear on the Marvell controller. Maybe it's just swamped, and that little 2.5-inch drive showing as failed isn't good at recovering or surviving a crappy controller?
trurl Posted January 26, 2017 Maybe you don't really need the extra ports if you reconsider the approach. In case you don't know what I mean by replacing instead of adding and removing: if you replace a drive with a larger drive, unRAID will rebuild its contents onto the new drive using the parity calculation. If instead you add a larger drive to a new slot, you will have to clear the new disk. Then you would have to copy or move the smaller disk's files to the larger disk, which might be prone to mistakes. Then you would have to remove the smaller disk and rebuild parity. So, as you can see, it is much simpler to replace than to add and remove, and replacing requires fewer ports. If your goal is to replace several smaller disks with one larger disk, there are some ways to do that also, but even then you would start by replacing one of the smaller disks with the larger one.
LVLAaron Posted January 26, 2017 Author Yep, gotcha. With the replace procedure, do I have to wait for the drive to be cleared/zeroed, or is the array "online" and just rebuilding from parity? If that's the case, I don't need to let the new drives finish their preclear, and I can just power off, see what's up with this failed drive, and save half of a nervous day.
trurl Posted January 26, 2017 As I said at the beginning: "If you are going to use a disk for a rebuild, it is not required for it to be clear since it is going to be completely overwritten with the data calculated from parity. unRAID only needs a clear disk when adding it to a new slot so parity will remain valid." The array will be online during the rebuild, but if you use it, the rebuild and any reads/writes you do will compete with each other to some extent. Also, the latest versions of unRAID will clear a disk (*edit* without taking the array offline) when you add it to a new slot, so it is not really even required to preclear it then. Even when unRAID doesn't require a clear disk, people often preclear a new disk just to test it. There are other ways to test a disk, including putting it in another computer and running the drive manufacturer's tests.
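The reason a cleared disk keeps parity valid when it goes into a new slot can be shown with a small sketch, again assuming plain XOR single parity. The byte values and the helper name `xor_parity` are hypothetical.

```python
def xor_parity(disks):
    """Compute XOR parity across equal-length byte strings."""
    parity = bytearray(len(disks[0]))
    for disk in disks:
        for i, byte in enumerate(disk):
            parity[i] ^= byte
    return bytes(parity)

# Two hypothetical existing data disks, 3 bytes each, and their parity.
existing = [b"\x12\x34\x56", b"\xab\xcd\xef"]
parity = xor_parity(existing)

cleared_disk = bytes(3)           # all zeros, as after a clear or preclear
uncleared_disk = b"\x01\x00\x00"  # any leftover non-zero data

# XOR with zero changes nothing, so the old parity still matches.
parity_valid_with_cleared = xor_parity(existing + [cleared_disk]) == parity
# A single non-zero byte is enough to make the old parity wrong.
parity_valid_with_uncleared = xor_parity(existing + [uncleared_disk]) == parity
```

This is why adding to a new slot needs zeros (or a parity rebuild), while a replacement in an existing slot does not.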
LVLAaron Posted January 26, 2017 Author Gotcha. Just wanted to double-check. My preclear on these drives was mainly a burn-in process. Going to stop the preclear and figure out this drive. Thanks guys!
LVLAaron Posted January 26, 2017 Author Rebuilding on a matching 1TB drive. Sigh.
LVLAaron Posted January 26, 2017 Author One more question about drive swaps. My old drives are all ReiserFS. With the swap, will the new replacements be XFS?
trurl Posted January 26, 2017 Rebuilds are bit-for-bit identical, and the filesystem is just part of those bits, so a rebuild always reproduces the same filesystem that was on the original disk.
trurl Posted January 26, 2017 So if you want to move to XFS at some point, you will need an empty XFS disk to copy files to. There is a sticky in this subforum that discusses converting to XFS. The main thing to be aware of before considering anything is that changing a disk's filesystem will format it. You might take a look at that rather long thread and start at the end rather than the beginning, since I think the ideas are better consolidated by that point.
LVLAaron Posted January 26, 2017 Author Cool. I'm seeing that with this drive that's rebuilding. I'm fine with installing new drives after a preclear and migrating data myself. No issues there. Thanks again.
LVLAaron Posted January 26, 2017 Author Yanked the "bad" drive and installed the new one on a different controller. Rebuilt parity and am now rebuilding parity on my new 4TB drive. Took the failed drive out and put it into a test machine to run an extended SMART self-test, which I know doesn't mean much unless the drive is under duress in the first place, but it passed. Thoughts?
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST1000LM010-9YH146
Serial Number:    Z101EKV4
LU WWN Device Id: 5 000c50 0352ee328
Firmware Version: CC9F
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jan 26 16:45:49 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 667) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 265) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   126   099   006    Pre-fail  Always       -       1184067072
  3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   094   094   020    Old_age   Always       -       6471
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       7519609
  9 Power_On_Hours          0x0032   100   098   000    Old_age   Always       -       163
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       68
183 Runtime_Bad_Block       0x0032   088   088   000    Old_age   Always       -       12
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   051   045    Old_age   Always       -       31 (Min/Max 22/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6851
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31 (0 10 0 0 0)
195 Hardware_ECC_Recovered  0x001a   033   031   000    Old_age   Always       -       1184067072
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5468 (198 77 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3708218876
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1622016792

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       163         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
JorgeB Posted January 26, 2017 Disk looks fine; it dropped offline because of the controller timeouts.
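For anyone who wants to skim a report like the one above programmatically, here is a rough sketch that pulls raw values out of the attribute table and lists any non-zero ones from a short watch list. The choice of attributes in `WATCHED` and the function names are assumptions for illustration, not an official health check, and the sample rows are copied from the report in this thread.

```python
# Attributes commonly glanced at first; this selection is an assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

def parse_attributes(report):
    """Return {attribute_name: raw_value} from smartctl attribute table rows."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values

def nonzero_watched(report):
    """Return the watched attributes whose raw value is not zero."""
    attrs = parse_attributes(report)
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) != 0}

sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

flagged = nonzero_watched(sample)
```

For the rows taken from this drive, `flagged` comes back empty, which is consistent with the disk itself not being the cause of the drop.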
LVLAaron Posted January 26, 2017 Author That's scary. Are the Marvell controllers crappy? I can get it out of there; I used it because it's SATA 3 and I had it lying around.
JorgeB Posted January 26, 2017 They usually work fine with VT-d disabled; with it enabled it is hit and miss, mostly miss.