Keeping a Hot Spare cool [and maybe Automated?]



Currently I have 12 Seagate 1TB drives in my array and I have two spare drives.  One is currently in an external eSATA enclosure as a hot spare.  The other sits on top of the unRAID enclosure, waiting.  I have room inside the box for two more drives, but the location is not optimal as far as airflow goes, though drives in there would certainly run under 50°C.  What I would LIKE to do is mount the two spare drives in there, but somehow make sure they were ALWAYS spun down until called into action.  I would consider them "temporary" spares: new drives would replace the failed ones, the files would be moved from the "temporary" spares to the new drives, and the "temporary" spares would be erased and spun down again.

 

I figured a cron job that would periodically spin down the drive would do... until it was put into action :o ... So, as a spare the drive would not be formatted and wouldn't be mounted, but once it was being put into the array it would be mounted, and the cron job could test for that.  I'm rambling... anyone listening? ;D
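Something along these lines is roughly what I had in mind (just a sketch; /dev/sdm is a made-up device letter for the spare, and the once-an-hour schedule is arbitrary):

# crontab entry: spin the spare down every hour unless one of its partitions is mounted
0 * * * *  grep -q '^/dev/sdm' /proc/mounts || /sbin/hdparm -y /dev/sdm >/dev/null 2>&1

(hdparm -y puts the drive into standby immediately; adjust the path to hdparm if it lives elsewhere on your system.)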

 

Also, I think this COULD be automated in the background... the switching to the hot spare, I mean...

 

Any comments?  Am I dreaming?

 


Also, I think this COULD be automated in the background... the switching to the hot spare, I mean...

 

Any comments?  Am I dreaming?

 

I'm not sure I understand your question 100%.

 

Are you trying to automate the moving of data to disks outside the array... or are you trying to automate the replacing/rebuilding of a failed disk...

 

Provided there is some way to know (some kind of hook or status check that can be done) that a disk has been marked as not usable and in need of replacement, I'm sure it COULD be done.  I don't know enough about that part of the operating system to be able to say for sure; maybe someone else knows.
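If, for example, unRAID's customized /proc/mdstat reports per-disk status as rdevStatus.N=... lines (I think it does, but someone should confirm that on the current release before building on it), a cron'd check could be as crude as this sketch:

#!/bin/bash
# crude "hook": log a warning if unRAID has marked any array disk as disabled (red-balled)
if grep -q "=DISK_DSBL" /proc/mdstat ; then
    logger -t spare-watch "unRAID has disabled a disk - manual attention needed"
fi

A script like that could then kick off whatever automated response you decide you really want.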

 

My questions are: Do you want to automate the process?  What happens if the disk gets red-balled by accident and there is really nothing wrong with the drive?  Do you want your server to go through the long process of transferring all that data for no reason (in the case of transferring data outside the protected array to temporary disks that are always there)?  Do you want your server to automatically take the long process of rebuilding a red-balled disk onto an unformatted, non-precleared drive for no reason (in the case of automatic failed-drive replacement)?

 

Those are some of the first thoughts that came to my mind.  Maybe if you went into further detail, we (as a community) MIGHT be able to figure something out for you.

 

Cheers,

Matt


You can set the spin down yourself (let the drive do its own cron'ning), with a command in your go script similar to this:

hdparm  -S2  /dev/sda

 

Change sda to whatever is suitable.  I recommend -S2 for cases like this (it sets the drive's idle spin-down timeout to 10 seconds), so that it will spin down almost immediately on boot, and if it ever does spin up, back down it will go.  Make sure that -S uses a capital S.

 

I discussed this originally here:  http://lime-technology.com/forum/index.php?topic=1006

 

The only problem is that it depends on your knowing the device ID the kernel gives the drive, and that can change.  I would check the device IDs any time you make an important change to the hardware, or update to a newer version of the kernel or unRAID.
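If your system has the udev /dev/disk/by-id symlinks (I believe recent releases do, but check yours), one way around that is to resolve the drive by its model/serial in the go script instead of hard-coding the device letter.  The id string below is only a made-up placeholder; use whatever ls -l /dev/disk/by-id shows for your spare:

SPARE=$(readlink -f /dev/disk/by-id/ata-ST31000333AS_9XX99999)
[ -b "$SPARE" ] && hdparm -S2 $SPARE

That way the spin-down command follows the physical drive even if the kernel shuffles the sdX assignments.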


I have a general comment about hot spares that you may be interested in.

 

Background: I'm running two servers at home: an unRAID, for movies, and an OpenSolaris/ZFS for music, pictures, and "back-up" of work and other documents. (There are two servers, rather than just one, partly out of curiosity, partly to have dual parity/raidz2, and partly for certain ZFS-specific things like snapshots, checksums/scrubbing, and more ...)

 

I had the unRAID server first, and when I set up the OpenSolaris/ZFS server I went for all the bells and whistles: hot spare, cache, compression, automated snapshots, and so on. I have now dialled the feature usage back a little bit. In particular, I have found that the few hiccups the OpenSolaris/ZFS server has had would best have been dealt with by hand, and that having a hot spare automatically put into use is not my preferred course of action in most cases. My point is that I suspect hot spares are a great idea for intensively used data centers with strict uptime requirements, and perhaps less so for sporadically used home servers where the occasional downtime is part of the hobby.


My point is that I suspect hot spares are a great idea for intensively used data centers with strict uptime requirements, and perhaps less so for sporadically used home servers where the occasional downtime is part of the hobby.

 

What I would love to see is a "warm spare" feature that allows you to rebuild the failed drive onto a spare drive without stopping the array.

I.e., something where you click a button to rebuild the drive (considering they are all spun up and being used already).

 

I don't see this happening before other features. Still I see it as an interesting idea.

 


My point is that I suspect hot spares are a great idea for intensively used data centers with strict uptime requirements, and perhaps less so for sporadically used home servers where the occasional downtime is part of the hobby.

 

What I would love to see is a "warm spare" feature that allows you to rebuild the failed drive onto a spare drive without stopping the array.

I.e., something where you click a button to rebuild the drive (considering they are all spun up and being used already).

 

I don't see this happening before other features. Still I see it as an interesting idea.

 

I've done exactly that, but manually, when I was trying to resolve the problem I had with an intermittent "Y" splitter on my array.  I copied all the files from the "failed" drive to a "spare".

 

Here's what I did.

 

1.  I used preclear_disk.sh on a spare disk.  I'll call it /dev/hdz for the following examples:

preclear_disk.sh /dev/hdz

 

Part of what preclear_disk.sh does is to create a partition sized exactly as unRAID would if you were to add it to your array.  This "preclear" can be done at any time after you physically install the drive in your server.

 

2. Create a reiser file-system on the first partition of the precleared disk (note: the argument given to mkreiserfs is /dev/hdz1, not /dev/hdz):

mkreiserfs -q /dev/hdz1

 

This step can also be done at any time prior to a disk failure.  At this time, the disk has a valid reiserfs file-system.

 

The next two steps can be done when you need to copy files to it from a "virtual" disk that has failed.

3. Create a mount point for the new disk (basically, create an empty directory):

mkdir /tmp/spare_disk

 

 

4. Mount the disk on the mount point:

mount -t reiserfs /dev/hdz1 /tmp/spare_disk

 

5. Lastly, copy the files from the failed disk to the "spare" disk.  There are two different approaches to this.  One makes a bit-for-bit copy of the entire drive, empty space included; the other copies only the files (which would probably be a bit quicker, especially if the disk is not full).

 

A bit-for-bit copy should be compatible with the existing parity calculations if the spare is the same size drive.  The mdX would be md1 through md15, depending on which "virtual" disk is being copied.

The of=/dev/hdz1 would be the same partition name as used when making the file-system.

dd if=/dev/mdX bs=2048k of=/dev/hdz1

 

The second method would have the advantage of "defragging" any files as they are copied.  It would be a simple copy command from the old disk to the new:

cp -av /mnt/diskX/. /tmp/spare_disk

 

Once either copy is complete, you can un-mount the temp drive and do as you require to add it to the array.

cd /

umount /dev/hdz1

Then stop the array, un-assign the failed drive, and assign the temp drive in its place using the unMENU maintenance web-page...

 

Then,

For the first "dd" method you can probably use the "Trust My Parity" procedure as described in the wiki.  Let the parity check complete, as it might find differences if the two drives were not exactly the same size.

 

Otherwise, you probably need to use the "Restore" button to initialize a new configuration and have unRAID do a full parity initial calc on the full array.

 

If you attempt this, be certain of the device names and syntax when typing the commands.  You would not want to copy from the wrong disk, nor would you want to write to the wrong disk.

 

Joe L.


A bit-for-bit copy should be compatible with the existing parity calculations if the spare is the same size drive.  The mdX would be md1 through md15, depending on which "virtual" disk is being copied.

The of=/dev/hdz1 would be the same partition name as used when making the file-system.

dd if=/dev/mdX bs=2048k of=/dev/hdz1

...

Then,

For the first "dd" method you can probably use the "Trust My Parity" procedure as described in the wiki.  Let the parity check complete, as it might find differences if the two drives were not exactly the same size.

 

This is not a safe way to do this. Any write to /dev/mdX will leave the superblock out of date with respect to the copy.

I've read that even a read-only umount writes to the superblock.

 

Better to do an rsync or cp -a to the new disk and recalculate parity.

At least then you have the benefit of defragging.
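If rsync is installed on your server, something like this would do it (same placeholder names as in the example above; note the trailing slash on the source so the contents are copied rather than the directory itself):

rsync -av /mnt/diskX/ /tmp/spare_disk/

cp -a /mnt/diskX/. /tmp/spare_disk accomplishes the same thing with just the standard tools.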

 

FWIW, I've done this in the past with EXT2 and EXT3.  I've gotten away with it, but I've also lost files from it during the mount because it had to be fsck'ed.

 

 

Usually what I do is just stop the array, reassign the "failed" device to another slot that I know has a good drive.

 

