Hit my first hard drive failure

May 5, 2011

Well, it finally happened... I lost a drive somehow. It passed preclear, but blew up as soon as unRAID went to write to it. I am going to follow the steps here:

http://lime-technology.com/wiki/index.php?title=Replacing_a_Data_Drive

But first I wanted to check: should I preclear the replacement disk? I am nervous about running too long without a functioning array.

May 5, 2011

The drive passed multiple cycles of preclear and then failed as soon as you started using it? That is definitely a first as far as I know.

It is always a good idea to preclear the replacement disk, but it isn't strictly required. If you are worried about your array and can stand to live without it for a time, then you could preclear the replacement disk on another computer (booted from an unRAID flash drive) while your server remains off.

May 5, 2011

Author

I'll be honest, I only preclear one cycle... guess I need to start doing more. Good idea to use a different PC; I need to do a firmware upgrade anyway.

May 5, 2011

A drive can fail at any time. I did some research a while back and found heated debates between people saying doing a preclear type activity was useful versus unnecessary. But most agreed that doing it over and over caused unnecessary wear and tear due to the stressful activity. I personally do single preclear cycles as a result. I found one bad drive in the preclear process.

May 6, 2011

A drive can fail at any time. I did some research a while back and found heated debates between people saying doing a preclear type activity was useful versus unnecessary. But most agreed that doing it over and over caused unnecessary wear and tear due to the stressful activity. I personally do single preclear cycles as a result. I found one bad drive in the preclear process.

As I understand it, preclear is a sequential read, followed by a sequential write, followed by another sequential read. That is far from 'stressful'. Random reads and writes are stressful, but preclear seems to be pretty benign. Maybe you know something about preclear that I don't?

May 6, 2011

I find it very stressful.

I keep checking back every five minutes to see what it is doing, even though I know I still have 10+ hours to go. And why do I keep reading the email updates when I already know the status from looking at the console?!

It is going to drive me to drink!

Oh, you meant stressful to the drives.

May 6, 2011

Ha!

My solution is to drink myself into a stupor so that I pass out for the entire duration of the preclear.

(kidding)

May 7, 2011

Did you leave the disk in after preclearing it, or have you had your server open to move it to another slot, or done anything else that could have affected your cabling? Are you certain the disk has failed and you are not looking at a loose power cable?

I'm not as knowledgeable as these other guys here, but if you are in doubt, maybe you could move the failed drive to the computer you are going to use to preclear the new drive, and try a preclear on the failed drive there. If it works in the other computer, you will know you have a problem with loose cables (and will have saved yourself the grief of having the very same thing happen all over again).

May 7, 2011

As I understand it, preclear is a sequential read, followed by a sequential write, followed by another sequential read. That is far from 'stressful'. Random reads and writes are stressful, but preclear seems to be pretty benign. Maybe you know something about preclear that I don't?

Close... but not quite.

The pre-clear read and post-clear read are not sequential.

They start at sector zero and proceed as follows. The "default" read block size is set by the "units" reported by fdisk -l. For most large disks, it is read in 8 Meg "units".

For each set of 200 "units" on the disk:

Read a random block from somewhere on the disk. (since it is random, the block is unlikely to be in the buffer cache, thus this forces a seek to it)

Read the first sector on the disk, bypassing the buffer cache. This forces a seek to the first track on the disk.

Read a second random block from somewhere on the disk, again forcing another seek.

Read the last block on the disk, bypassing the buffer cache. This forces a seek to the last track on the disk.

Read a third random block on the disk. (again, since it is random, the block is unlikely to be in the buffer cache, thus this forces a seek to it)

Read the next 200 "units" on the disk (the unit start point is advanced linearly until the end of the disk is reached).

The whole process is intended to keep the disk heads constantly seeking all over the disk from the starting sector, to the last sector, and everywhere in between.
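The loop above can be modelled as a generator of read requests, which makes the seek pattern easy to see. This is a simplified sketch under stated assumptions, not the preclear script itself: offsets are counted in whole "units", the first-sector and last-block reads are each treated as one unit, cache bypassing is not modelled, and the name `read_schedule` is hypothetical.

```python
import random
from typing import Iterator, Optional, Tuple

Op = Tuple[str, int, int]  # (kind, start_unit, length_in_units)


def read_schedule(total_units: int, units_per_pass: int = 200,
                  seed: Optional[int] = None) -> Iterator[Op]:
    """Yield read requests in the order the described read phase issues them.

    Simplified model (assumption): offsets are in whole units, the
    first-sector and last-block reads count as one unit each, and the
    buffer-cache bypass is not modelled.
    """
    rng = random.Random(seed)
    start = 0
    while start < total_units:
        yield ("random", rng.randrange(total_units), 1)  # first random block
        yield ("first", 0, 1)                            # seek to start of disk
        yield ("random", rng.randrange(total_units), 1)  # second random block
        yield ("last", total_units - 1, 1)               # seek to end of disk
        yield ("random", rng.randrange(total_units), 1)  # third random block
        length = min(units_per_pass, total_units - start)
        yield ("linear", start, length)                  # next run of units
        start += length


if __name__ == "__main__":
    for op in read_schedule(total_units=1000, seed=1):
        print(op)
```

Every linear chunk is preceded by five scattered single-unit reads, which is why the heads spend so much of the read phases travelling across the platter.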

It is highly stressful compared to a simple linear read of sectors. For a 2TB disk, an 8 Meg "unit" size, and the default of 200 blocks per read, the above loop will cause the disk to seek to the first and last sectors 1,250,000 times, seek to 3,750,000 random sectors, and also seek to each of the sets of "blocks" read in turn from the start to the end of the disk (linearly, but not quite sequentially).
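Taking the loop exactly as described (a decimal 2 TB disk, 8 MB units, 200 units per pass), the arithmetic gives 250,000 units and 1,250 passes, i.e. about 1,250 reads each of the first and last blocks and 3,750 random reads. That is a factor of 1,000 below the totals quoted above, so either the description simplifies what the script really does or those totals are off; the sketch below follows only the description. The default `unit_bytes` of 16065 × 512 = 8,225,280 bytes is an assumption based on the cylinder size older versions of `fdisk -l` typically report, and `read_phase_counts` is a hypothetical helper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def read_phase_counts(disk_bytes: int, unit_bytes: int = 16065 * 512,
                      units_per_pass: int = 200) -> dict:
    """Count seeks in one read phase for the loop as described (simplified)."""
    units = ceil_div(disk_bytes, unit_bytes)   # units needed to cover the disk
    passes = ceil_div(units, units_per_pass)   # one loop iteration per 200 units
    return {
        "units": units,
        "passes": passes,
        "first_reads": passes,       # one read of the first sector per pass
        "last_reads": passes,        # one read of the last block per pass
        "random_reads": 3 * passes,  # three random reads per pass
    }


if __name__ == "__main__":
    print(read_phase_counts(2 * 10**12, unit_bytes=8 * 10**6))
    print(read_phase_counts(2 * 10**12))
```

With 8 MB units this reports 1,250 passes; with the assumed 8,225,280-byte cylinder it reports 243,153 units and 1,216 passes. Either way the count applies to each read phase, so a full cycle (pre-read plus post-read) doubles it.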

May 7, 2011

Wow, thanks for sharing this info, Joe L. I am a bit confused about preclear cycles and what a good starting point is. Originally I thought 1, then was steered toward 2. Now there are reports of preclear failing on Hitachi 2TB drives after the 3rd or 4th cycle...

First, what do YOU personally do on a brand new drive?

I would like to know your thoughts on the following: would you say that preclearing is more stressful on a disk than, say, a parity check against that drive? Or, for that matter, is anything unRAID does more stressful than preclear?

May 7, 2011

A drive can fail at any time. I did some research a while back and found heated debates between people saying doing a preclear type activity was useful versus unnecessary. But most agreed that doing it over and over caused unnecessary wear and tear due to the stressful activity. I personally do single preclear cycles as a result. I found one bad drive in the preclear process.

As I understand it, preclear is a sequential read, followed by a sequential write, followed by another sequential read. That is far from 'stressful'. Random reads and writes are stressful, but preclear seems to be pretty benign. Maybe you know something about preclear that I don't?

How long would it take, in the normal course of using a drive, to accomplish the equivalent of reading every sector twice and writing every sector once? How about 8 full reads and 4 complete writes? Running many preclear cycles on a disk may be the equivalent of years (or even a lifetime's worth) of activity on a typical disk! That's more what I meant when I used the word "stressful". Is this like taking a 200,000 mile test drive before deciding whether to keep a car?
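A quick way to put numbers on that question: each full cycle reads the whole disk twice and writes it once, so 4 cycles on a 2 TB disk means 16 TB read and 8 TB written. The sketch below turns that into "days of normal use" at a chosen daily transfer rate. The function names and the 20 GB/day and 10 GB/day rates are hypothetical assumptions for illustration; real home-server workloads vary widely.

```python
def cycle_io_tb(capacity_tb: float, cycles: int) -> tuple:
    """(TB read, TB written) for full preclear cycles.

    Assumes each cycle = full pre-read + full write + full post-read.
    """
    return 2 * capacity_tb * cycles, capacity_tb * cycles


def days_at_rate(total_tb: float, gb_per_day: float) -> float:
    """Days a given daily transfer rate would need to move total_tb."""
    return total_tb * 1000 / gb_per_day


if __name__ == "__main__":
    read_tb, written_tb = cycle_io_tb(capacity_tb=2, cycles=4)
    # 20 GB/day read and 10 GB/day written are hypothetical example rates.
    print(read_tb, written_tb,
          days_at_rate(read_tb, 20), days_at_rate(written_tb, 10))
```

Under those assumed rates, 4 cycles equal about 800 days of reads and 800 days of writes; the count of head seeks, not just bytes moved, is a separate question.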

May 7, 2011

That's a bit of news. I had assumed it was sequential.

I was going to run 2 passes on my old recycled disks and 3 on the new drive I was putting into my unraid, per other forum suggestions.

With that bit of news, I'll stop with one. Maybe I'll zero the drive with the WD tools for new disks before I preclear it.

I normally zero or low-level format all new drives I purchase to make sure they don't die the first time I use them.

May 7, 2011

Thanks for the explanation Joe! Is the zeroing phase a sequential write? I often use the -n option to skip the preread when I'm burning in a server on known-good disks. I figure one pass of write and one of post-read is enough to verify that a drive bay, SATA controller, etc. is working properly. Is there some other preclear modifier that would accomplish this same task without stressing the disks so much? Or should I just not use the preclear script for this purpose at all?

Brian, I see your point now, I didn't realize that the read phases of preclear were so intensive. Before I always figured that more cycles are better, but now I'm thinking that I should implement a hard cap of 2 cycles per disk. Your thoughts?

May 7, 2011

Brian, I see your point now, I didn't realize that the read phases of preclear were so intensive. Before I always figured that more cycles are better, but now I'm thinking that I should implement a hard cap of 2 cycles per disk. Your thoughts?

I think 1 is enough. I don't like putting a disk in service that has not had every sector written.

But if you want to go with 2, go for it, though I definitely see reduced value with each additional cycle.

This is all in the realm of opinion. There is no real data to help us make this decision.

May 7, 2011

My thought on hard drives, especially new ones: I do not trust them until they have proven themselves stable.

May 7, 2011

Author

Something seems to have gone wrong. I replaced the old drive with a new Seagate. It spun up and assigned my old parity drive as my new disk 6 with a blue ball, and assigned the new drive as parity with a red ball. In the logs I have this:

May 7 15:26:09 Tower kernel: md: import disk0: [8,144] (sdj) ST2000DL003-9VT1 5YD2AG1H size: 1953514552
May 7 15:26:09 Tower kernel: md: disk0 wrong
May 7 15:26:09 Tower kernel: md: import disk1: [8,176] (sdl) SAMSUNG HD204UI S2HGJDWZ807150 size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk2: [8,160] (sdk) SAMSUNG HD204UI S2HGJDWZ807152 size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk3: [8,192] (sdm) SAMSUNG HD204UI S2HGJDWZ807155 size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk4: [8,80] (sdf) SAMSUNG HD204UI S2HGJDWZ807151 size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk5: [8,32] (sdc) Hitachi HDS5C302 ML0220F30A30PD size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk6: [8,48] (sdd) SAMSUNG HD103SJ S246JD2Z931515 size: 976762552
May 7 15:26:09 Tower kernel: md: disk6 replaced
May 7 15:26:09 Tower kernel: md: import disk7: [8,64] (sde) Hitachi HDS5C302 ML0220F309HPMD size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk8: [8,112] (sdh) Hitachi HDS5C302 ML0220F30A1DZD size: 1953514552
May 7 15:26:09 Tower kernel: md: import disk9: [8,128] (sdi) WDC WD20EARS-00M WD-WMAZA1148665 size: 1953514552
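When comparing slot assignments before and after a drive swap, it can help to pull the md import lines out of the syslog and list which slots the driver flagged. The sketch below is a hypothetical helper (`parse_md_import` is not part of unRAID); its regular expressions are written against the lines posted above only, and other unRAID versions may format them differently. The `size` field appears to be in 1 KiB blocks (1953514552 × 1024 bytes is about 2 TB), which is also an assumption.

```python
import re
from typing import Dict, Iterable

# Patterns written against the syslog lines posted above (assumption: other
# unRAID versions may word these lines differently).
IMPORT_RE = re.compile(
    r"md: import disk(\d+): \[(\d+),(\d+)\] \((\w+)\) (.+?) size: (\d+)")
STATUS_RE = re.compile(r"md: disk(\d+) (\w+)\s*$")


def parse_md_import(lines: Iterable[str]) -> Dict[int, dict]:
    """Map each md slot number to its device, model/serial, size and flags."""
    slots: Dict[int, dict] = {}
    for line in lines:
        m = IMPORT_RE.search(line)
        if m:
            slots[int(m.group(1))] = {
                "dev": m.group(4),           # kernel device name, e.g. "sdj"
                "id": m.group(5),            # model and serial as logged
                "size_kb": int(m.group(6)),  # appears to be 1 KiB blocks
                "flags": [],
            }
            continue
        s = STATUS_RE.search(line)
        if s and int(s.group(1)) in slots:
            # Status word the driver logged for this slot ("wrong", "replaced", ...)
            slots[int(s.group(1))]["flags"].append(s.group(2))
    return slots
```

Fed the lines above, it flags disk0 as `wrong` and disk6 as `replaced`, and shows disk6 at 976762552 blocks, roughly half the size of every other member.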

Can I just swap them back? I thought it was just supposed to assign the new drive to the missing slot and rebuild the array from there.

May 7, 2011

Author

Well, I was brave and swapped everything back to the way it was supposed to be, and it seems to be rebuilding now. Question, though: shouldn't it format the drive before rebuilding the array? It still says unformatted.

May 7, 2011

Thanks for the explanation Joe! Is the zeroing phase a sequential write?

Yes, the write is sequential.

I often use the -n option to skip the preread when I'm burning in a server on known-good disks.

I would not skip the initial read, because that is where any unreadable sectors are identified, so they can then be reallocated in the "write" phase. Without it, the first time a sector is read is in the post-read, and at that point the process has no opportunity to reallocate it. I suppose you could check whether any sectors are pending reallocation at the end of the first cycle and, if there are, run another cycle.
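The "check for pending sectors, then decide on another cycle" idea can be sketched by reading SMART attribute 197 (`Current_Pending_Sector`) from the text that `smartctl -A /dev/sdX` prints. This is a minimal sketch assuming the usual smartctl attribute table layout, with the raw value in the tenth column; `pending_sectors` and `needs_another_cycle` are hypothetical helpers, not part of the preclear script.

```python
from typing import Optional


def pending_sectors(smart_attrs: str) -> Optional[int]:
    """Raw Current_Pending_Sector (attribute 197) from `smartctl -A` text."""
    for line in smart_attrs.splitlines():
        fields = line.split()
        # Assumed layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None  # attribute not reported by this drive


def needs_another_cycle(smart_attrs: str) -> bool:
    """Suggest another cycle if sectors are still waiting to be reallocated."""
    pending = pending_sectors(smart_attrs)
    return pending is not None and pending > 0
```

The text could come from `subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout`, with `/dev/sdX` replaced by the disk being cleared.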

I figure one pass of write and one of post-read is enough to verify that a drive bay, SATA controller, etc. is working properly. Is there some other preclear modifier that would accomplish this same task without stressing the disks so much? Or should I just not use the preclear script for this purpose at all?

Preclear performs two tasks:

1. Zero the disk while it is unassigned, so it does not take the unRAID array offline while it clears the drive.

2. Get you past the early part of the "bathtub curve" of failures.
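Why zeroing matters for the first task: unRAID's (single) parity is a bitwise XOR across the data disks, and XOR with zero leaves a value unchanged, so an all-zero disk can join the array without parity having to be recomputed. The toy sketch below uses tiny byte strings as stand-ins for whole disks; `parity` is a hypothetical helper, not unRAID code.

```python
from functools import reduce
from typing import List


def parity(disks: List[bytes]) -> bytes:
    """Byte-wise XOR across equally sized disk images (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


if __name__ == "__main__":
    d1, d2 = bytes([1, 2, 3]), bytes([4, 5, 6])
    p = parity([d1, d2])
    print(parity([d1, d2, bytes(3)]) == p)          # zeroed disk: parity still valid
    print(parity([d1, d2, bytes([9, 0, 0])]) == p)  # non-zero disk: parity now wrong
```

This prints `True` then `False`: only a disk that is entirely zero can be added without touching parity, which is why a disk that has not been precleared has to be cleared by unRAID itself before it joins the array.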

Brian, I see your point now, I didn't realize that the read phases of preclear were so intensive. Before I always figured that more cycles are better, but now I'm thinking that I should implement a hard cap of 2 cycles per disk. Your thoughts?

I'd rather know the bearings are working. I do not believe the drive is worked any harder than in a heavily used database, as far as how frequently the disk heads seek. They are made to handle it.

When would you rather handle the RMA process: before you use the disk for your data, or after it fails in a few weeks? Or during unRAID's own clearing of the drive, if you do not pre-clear at all?

I typically do each drive about three times before adding it to the array.

Joe L.

May 8, 2011

Author

So I'm still confused... the array rebuild finished and the drive still says unformatted. Do I just format it? What happened to the data that was on it? I thought when I rebuilt the array it was supposed to restore that drive with whatever it had before...

May 8, 2011

So I'm still confused... the array rebuild finished and the drive still says unformatted. Do I just format it? What happened to the data that was on it? I thought when I rebuilt the array it was supposed to restore that drive with whatever it had before...

It did.

You have exactly the same access to your data as you had with the failed disk. You had an "unformatted" disk before, and the rebuild reconstructed exactly the same data onto the replacement (so it is still "unformatted").
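The rebuild has no notion of file systems: it recomputes the missing disk's bytes by XOR-ing parity with every surviving disk, so whatever was on the failed disk, valid or garbled, comes back byte for byte. A toy sketch with short byte strings standing in for disks; `xor_blocks` and `rebuild_missing` are hypothetical helpers illustrating single-parity reconstruction, not unRAID code.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR across equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity_block: bytes, surviving: List[bytes]) -> bytes:
    """Recover the missing disk's bytes from parity and the surviving disks."""
    return xor_blocks([parity_block, *surviving])


if __name__ == "__main__":
    lost = b"\x00\xde\xad\x00"       # stand-in for an unmountable (garbled) disk
    others = [b"ABCD", b"wxyz"]
    p = xor_blocks([lost, *others])  # parity as it stood before the failure
    print(rebuild_missing(p, others) == lost)  # True: the garbage comes back too
```

That is why a rebuild cannot repair a damaged file system; it can only restore the disk to the state parity last recorded.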

To unRAID, "unformatted" simply means the disk cannot be mounted as a reiserfs file system. You might try using the reiserfsck command to see if there is anything on the disk remotely resembling a file system. It might just have some corruption, or, depending on your prior actions, it might really be empty.

Even "empty" disks have had data recovered... You will need to do a bit more work: first, a syslog is in order, then run

reiserfsck --check /dev/mdX

(where X = the disk involved) as described here in the wiki: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

and here:

http://lime-technology.com/wiki/index.php?title=FAQ#How_can_I_undelete_files_from_an_unRAID_disk.3F

Joe L.
