
Failed Drive // Moving from pre-cleared thread.



I'm in a bit of a bind.

 

Installed 2x new 4TB drives in my system and am preclearing them, which will take ages. Overnight I ran into an issue where a 1TB drive in the system died  :o :o :o :o :'(

 

Can I kill the preclear process and proceed as normal after step "5. verifies the signature", when it starts the post-read?

 

Put another way, can I kill the preclear process DURING the post-read and add the drive to my array?

Saying a drive "died" is a little vague; others say similar things without specific evidence, only to discover there is a different problem. How did you determine the disk "died"?

 

If you are going to use a disk for a rebuild, it is not required for it to be clear since it is going to be completely overwritten with the data calculated from parity. unRAID only needs a clear disk when adding it to a new slot so parity will remain valid.
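A quick way to see why that is: unRAID's single parity is essentially a bit-for-bit XOR across all the data disks, and XOR-ing in a block of zeros changes nothing, so an all-zero (clear) disk can be added to a new slot without invalidating parity, while a replacement disk's existing contents don't matter because the rebuild overwrites them anyway. A minimal sketch with made-up byte values, just to illustrate the arithmetic:

# Parity 1 in unRAID is a byte-wise XOR across the data disks.
d1=0xA5; d2=0x3C                      # example bytes from two existing data disks
parity=$(( d1 ^ d2 ))
printf 'parity before adding a clear disk: %#x\n' "$parity"

# A cleared disk contributes only zeros, so parity is unchanged:
new_disk=0x00
printf 'parity after adding a clear disk:  %#x\n' $(( parity ^ new_disk ))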

 

If the SMART for the new drive looks OK then you might as well go ahead and use it for the rebuild. The rebuild will wind up testing it more anyway, and you can check its SMART again afterwards and also do a non-correcting parity check to see if the rebuild went OK.
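For example (a minimal sketch; /dev/sdX is just a placeholder for whatever device name the new drive gets):

# Full SMART report for the new drive -- run this before the rebuild and
# again afterwards so the two can be compared.
smartctl -a /dev/sdX

# Attributes worth comparing before/after:
#   5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable, 199 UDMA_CRC_Error_Count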

 


My parity drive is 1.5TB today. One of those new 4TB drives was going to be used to replace that.

 

Parity today is 1.5TB

 

If I replace the "failed" 1TB drive with a 4TB drive... I know the parity drive needs to be the biggest drive.

 

 

* Can I replace the 1TB drive with a new 4TB drive? Will unRAID throw any errors about adding that in there? How will it handle the extra 3TB? Will that become available once I upgrade the parity drive? (I'll start a new thread for this in a relevant area.)

 

* My question remains: can I kill the preclear during the post-read? It's almost done and I don't want to wait another 12 hours to replace this drive.

 

 

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)

Copyright © 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Vendor:              /10:0:0:

Product:              0

Compliance:          SPC-5

User Capacity:        600,332,565,813,390,450 bytes [600 PB]

Logical block size:  774843950 bytes

Physical block size:  3166222336 bytes

Lowest aligned LBA:  12346

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

>> Terminate command early due to bad response to IEC mode page

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
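For what it's worth, once the drive is reachable again it can be re-queried directly. A minimal sketch, with /dev/sdX standing in for the real device name:

# Garbage vendor/capacity values like the above usually mean the device has
# dropped off the bus rather than that it is mechanically dead.
smartctl -a /dev/sdX

# The error message itself suggests forcing the query if it still refuses:
smartctl -a -T permissive /dev/sdX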

Your questions sound as if you don't have a good enough idea how to proceed without losing data, so don't do anything at all except follow our specific instructions.

 

SMART output like what you posted more often just means the drive has lost its connection than that it has actually failed. There is a way to replace the parity drive with the new drive, and replace the problem drive with the old parity drive, but before we get into that more complicated procedure, I would like to check the rest of your system to make sure that procedure has a good chance of succeeding.

 

Stop the preclear. Immediately after you do so, shut down and check the connections on that disk, being very careful that you don't disturb any other disk connections. Leave the new disks connected as well so we can take a look at them.

 

Then reboot and go to Tools - Diagnostics and post the complete diagnostics zip so we can have more complete information before recommending how to proceed.
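A rough sketch of doing that from the console, assuming the preclear is the classic preclear_disk.sh script running in a screen session (the preclear plugin has its own stop control, so adjust accordingly; <PID> is a placeholder):

# See whether any preclear is still running
ps aux | grep -i '[p]reclear'

# Stop it by process ID before shutting down
kill <PID>

# Recent unRAID 6 releases can also build the same diagnostics zip from the
# shell (equivalent to Tools - Diagnostics); it is written to /boot/logs:
diagnostics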


I read through the swap-disable procedure and am comfortable with that. (I have backups)

 

I have another 1TB drive that I could use to just swap out the bad one at this point. It is old and is a 2.5-inch drive; it's probably close to death anyway.

 

 

Does the preclear help me at all with the swap-disable? I.e., will I still be waiting for unRAID to clear the drive in that process?

No, swap-disable doesn't require a clear disk. And if you rebuild to that old 2.5-inch disk, of course you won't need swap-disable anyway. But since I don't have any information about that disk, I have to wonder whether it is reliable enough to use.

 

It may be that you can just rebuild the disk to itself and not bother with swap-disable. So I would still like to see your diagnostics after you try to correct the connections to the "dead" disk, to see whether it is truly dead and to make sure you don't have any other issues before proceeding.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)

Copyright © 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

 

=== START OF INFORMATION SECTION ===

Vendor:              /10:0:0:

Product:              0

Compliance:          SPC-5

User Capacity:        600,332,565,813,390,450 bytes [600 PB]

Logical block size:  774843950 bytes

Physical block size:  3166222336 bytes

Lowest aligned LBA:  12346

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

>> Terminate command early due to bad response to IEC mode page

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

tower-diagnostics-20170126-1035.zip


All devices on the Marvell controller are having timeout issues; disk7 ended up dropping offline.

 

Those controllers have issues when VT-d is enabled. Are you using it?
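A quick way to check from the unRAID console whether VT-d (the I/O side of virtualization, which is usually a separate BIOS setting from plain VT-x) is actually active; this is a generic Linux check, not anything unRAID-specific:

# Kernel messages about the IOMMU / DMA remapping (VT-d) being enabled
dmesg | grep -i -e DMAR -e IOMMU

# Boot parameters -- look for intel_iommu=on or similar
cat /proc/cmdline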

 

Is that the "enable virtualization" piece in the BIOS?

 

EDIT: I did just install that controller yesterday so I could add a drive and start removing old small drives from the array.


I was in a bind. I was out of ports and had to remove a drive so I had room to install my 2 new 4TB disks.

 

FWIW: The 2 new drives are running preclear on the Marvell controller. Maybe it's just swamped, and that little 2.5-inch drive showing as failed isn't good at recovering itself or surviving a crappy controller?
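For reference, a quick way to see whether the controller rather than the drive itself is the thing struggling is to look for ATA timeouts and link resets in the syslog; the strings below are the usual Linux kernel messages, not anything specific to this system:

# Typical kernel messages when a controller or link is misbehaving
grep -i -e 'timeout' -e 'frozen' -e 'hard resetting link' /var/log/syslog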

Maybe you don't really need the extra ports if you reconsider the approach.

 


It might make more sense to replace the smaller drives with the larger ones instead of adding larger drives and removing smaller ones.

 

It would certainly be a lot simpler.

In case you don't know what I mean: if you replace a drive with a larger drive, unRAID will rebuild its contents onto the new drive using the parity calculation.

 

If instead you add a larger drive to a new slot, you will have to clear the new disk. Then you would have to copy or move the smaller disk's files to the larger disk, which might be prone to mistakes. Then you would have to remove the smaller disk and rebuild parity.

 

So, as you can see, it is much simpler to replace than to add and remove, and replacing requires fewer ports than adding and removing.

 

If your goal is to replace several smaller disks with one larger disk, there are some ways to do that also, but even then you would start by replacing one of the smaller disks with the larger one.


Yep. I gotcha. With the replace procedure, do I have to wait for the drive to be cleared/zeroed or is the array "online" and just rebuilding from parity?

 

If that's the case, I don't need to let the new drives finish their preclear; I can just power off, see what's up with this failed drive, and save half of a nervous day.

As I said at the beginning:

If you are going to use a disk for a rebuild, it is not required for it to be clear since it is going to be completely overwritten with the data calculated from parity. unRAID only needs a clear disk when adding it to a new slot so parity will remain valid.

The array will be online during the rebuild, but if you use it, the performance of the rebuild and the performance of any reads/writes you do will work against each other to some extent.

 

And the latest versions of unRAID will clear a disk (edit: without taking the array offline) when you add it to a new slot, so it is not really even required to preclear it then. Even when unRAID doesn't require a clear disk, people often preclear a new disk just for the purpose of testing it. There are other ways to test a disk, including putting it in another computer and running the drive manufacturer's tests.
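For example, one of those other ways is the drive's own built-in SMART self-test; a minimal sketch, with /dev/sdX again standing in for the real device:

# Start the drive's extended (long) self-test; smartctl prints an estimated duration
smartctl -t long /dev/sdX

# When it finishes, read back the self-test log and re-check the attribute table
smartctl -l selftest /dev/sdX
smartctl -A /dev/sdX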

 


So if you are wanting to go to XFS at some point, you will need an empty XFS disk to copy files to. There is a sticky in this subforum which discusses converting to XFS.

 

The main thing you need to be aware of before considering anything is that changing a disk's filesystem will format it.

 

You might take a look at that rather long thread and start at the end rather than the beginning since I think the ideas are better consolidated by that point.
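The core of that conversion, boiled down to a sketch (disk numbers are placeholders; the sticky describes the full and safer step-by-step procedure):

# Copy everything from the old-filesystem disk to the empty XFS disk,
# preserving permissions and extended attributes
rsync -avPX /mnt/disk1/ /mnt/disk2/

# Optional dry-run checksum pass to verify the copy before the old disk
# is reformatted or removed
rsync -avnc /mnt/disk1/ /mnt/disk2/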


Yanked the "bad" drive and installed the new one on a different controller. Rebuilt parity and am now rebuilding parity onto my new 4TB drive.

 

 

Took the failed drive out and put it into a test machine to run an extended SMART report, which I know doesn't mean much unless the drive is under duress in the first place, but it passed. Thoughts?

 

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST1000LM010-9YH146
Serial Number:    Z101EKV4
LU WWN Device Id: 5 000c50 0352ee328
Firmware Version: CC9F
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jan 26 16:45:49 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
				was completed without error.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		(  667) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 265) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x103b)	SCT Status supported.
				SCT Error Recovery Control supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   126   099   006    Pre-fail  Always       -       1184067072
  3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   094   094   020    Old_age   Always       -       6471
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       7519609
  9 Power_On_Hours          0x0032   100   098   000    Old_age   Always       -       163
10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       68
183 Runtime_Bad_Block       0x0032   088   088   000    Old_age   Always       -       12
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   051   045    Old_age   Always       -       31 (Min/Max 22/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6851
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31 (0 10 0 0 0)
195 Hardware_ECC_Recovered  0x001a   033   031   000    Old_age   Always       -       1184067072
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5468 (198 77 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3708218876
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1622016792

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       163         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

