SME Storage

Hot Spare functionality?


Dear Unraiders and Lime Tech,

 

As a storage professional I work with storage on a daily basis, so I went looking for hot spare functionality in Unraid and did not find it.

(Feel free to correct me if I am wrong.)

 

 

Definitions:

Hot Spare: In case of a disk failure a hot spare disk is automatically added to the array and triggers the normal data/parity rebuild.

Global Hot Spare: Hot spare disk to be used by the complete array.

Local Hot Spare: Hot Spare intended for one single disk pool.

 

The question is: does Unraid need hot spare functionality?

Just remember why we use NAS storage: we like to have some level of hardware redundancy for our data.

Hot spares can be an addition to the overall package of hardware redundancy and, when used, raise the redundancy level.

There are many different (hardware) redundancy solutions you can add to your Unraid system. It is just a matter of how far you want to take it and how important your data is to you.

The slider always moves between cost on one side and the highest possible data redundancy on the other.

The benefit a hot spare feature would bring is that it is fully automatic from the moment a hot spare has been made available.

Another benefit is that the time your array runs in degraded mode is reduced.

Some posts expressed worry that in case unraid encounters a bad SATA connection a hot spare will kick in. (when available)

Exactly what I would want! First priority is the health of the array.

Other posts have mentioned the problem of providing a hot spare for disk arrays with different disk capacities.

Well, that is true. With disks of different capacities it is a bit more work to arrive at the spare disk size you need.

In this case it is easier to use disks of the same capacity for every drive pool.
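To make the mixed-capacity point concrete, here is a minimal sketch of how a spare could be matched to a failed disk. It assumes the usual Unraid replacement constraint (a replacement must be at least as large as the disk it replaces and no larger than the parity disk). The function name `pick_spare` and the sizes in TB are hypothetical and purely illustrative; nothing like this exists in Unraid today.

```python
from typing import Optional, Sequence


def pick_spare(
    failed_size_tb: float,
    parity_size_tb: float,
    spare_sizes_tb: Sequence[float],
) -> Optional[float]:
    """Return the smallest usable spare size, or None if no spare fits.

    Assumption: a spare is usable when it is at least as large as the
    failed disk and not larger than the parity disk.
    """
    candidates = [
        size for size in spare_sizes_tb
        if failed_size_tb <= size <= parity_size_tb
    ]
    # Prefer the smallest fitting spare so larger spares stay available
    # for larger disks that might fail later.
    return min(candidates) if candidates else None
```

With equal-sized disks any spare fits any slot, which is why a single global spare is simplest in that case. With mixed sizes, one spare as large as the parity disk covers every data disk, at a higher cost per spare.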

 

If the hot spare feature were optional, one could choose whether to use it.

A choice to have a global or local hot spare would complete the whole.
 

Would it not be a very peaceful thought, when I am at work, that I do not have to worry about my Unraid system at home because it has a hot spare available?

Proactivity is the definition of being in control.

 

Cheers,

 

Marcel

 

 

 

 

 

43 minutes ago, SME Storage said:

Exactly what I would want! First priority is the health of the array.

 

A traditional RAID has a much larger need for hot spare support because a traditional RAID that loses one disk more than the number of parity disks will suffer a 100% data loss.

 

unRAID doesn't stripe the data - every single data disk has a separate file system. So an unRAID system with one parity drive that loses two disks will lose 1 or 2 data disks (depending on whether one of the failed disks was the parity) - the other data disks will continue to supply valid file content.

 

An unRAID system with dual parity will lose the data from 1, 2 or 3 data disks if 3 disks fail at the same time. The other disks will continue to supply valid file content.
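The loss counts above follow one simple rule, sketched here in Python. This is only an illustration of the reasoning in this post, not anything from the unRAID code base; both function names are made up for the example.

```python
def unraid_data_disks_lost(failed_data: int, failed_parity: int, parity_count: int) -> int:
    """Data disks whose content is lost in a non-striped, parity-protected array.

    Failed data disks can be rebuilt only while the total number of failed
    disks does not exceed the number of parity disks. Beyond that, every
    failed data disk is lost, but the surviving data disks stay readable.
    """
    if failed_data + failed_parity <= parity_count:
        return 0
    return failed_data


def striped_data_disks_lost(total_disks: int, failed_disks: int, parity_count: int) -> int:
    """Disks' worth of content lost in a striped array.

    Because every file is spread over all disks, one failure more than
    the parity count takes the whole array with it.
    """
    return total_disks if failed_disks > parity_count else 0
```

For example, `unraid_data_disks_lost(1, 1, 1)` covers the single-parity case where one of the two failed disks is the parity disk, and `unraid_data_disks_lost(3, 0, 2)` covers three failed data disks under dual parity.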

 

Since a traditional RAID requires you to read back every single file from backup sources if you lose one disk too many, it's quite obvious why the recommendation is to have hot spare support. Parity is about availability (not a replacement for backup), and a full restore of all files from the backup is very far from the availability goals.

 

51 minutes ago, SME Storage said:

A choice to have a global or local hot spare would complete the whole.

 

Note that unRAID supports a single parity-protected array. Besides the array, you can use BTRFS mirroring - commonly used to get redundancy for a cache pool - especially if the cache pool is used to store VMs.


Thank you very much for your post.

 

At work I manage almost one petabyte of data. The RAID solution we use is based on traditional RAID (raid-4-DP). The file system is striped over multiple RAID arrays.

This allows the use of large file systems. Each RAID array can deal with losing 2 data drives simultaneously before the array would go down. Hot spares are a must here.

 

 

Thank you for your explanation of Unraid and how it works. I could not have put it better in writing.

I am a little worried that I have not been able to find the right words to get the message across.

Personally I have no doubts about Unraid's RAID and file system concept. I feel Unraid is doing a splendid job!

A double-parity array together with a mirrored cache is a good enough solution in most cases.

In general, data protection has different levels: parity is one, double parity is another, mirroring is one, and backup is also an important one.

 

Hypothetically speaking, with regard to the risk of multi-disk failure:

It might seem over the top at first, but if a hot spare can be assigned automatically when a disk has died on you, one saves time and thereby reduces the chance of a multi-disk failure even further.

And that is not even considering a scenario where a replacement disk is not at hand and still has to be purchased before it can be added to the array.
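As a rough illustration of this argument, the sketch below estimates the chance that at least one more disk fails while the array is degraded. It assumes independent failures at a constant rate derived from an annualized failure rate (AFR). That is a simplification: real failures tend to be correlated (same batch, same enclosure, rebuild stress), so the result is better read as a way to compare a short and a long degraded window than as a prediction. The function name and all inputs are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24


def p_additional_failure(remaining_disks: int, afr: float, window_hours: float) -> float:
    """Probability that at least one of the remaining disks fails in the window.

    Assumptions: independent disks, constant failure rate, and `afr` is the
    probability that a single disk fails within one year (0 <= afr < 1).
    """
    # Convert the annual failure probability into an hourly rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-remaining_disks * rate_per_hour * window_hours)
```

Comparing, say, `p_additional_failure(10, afr, 12)` (spare kicks in immediately, rebuild only) with `p_additional_failure(10, afr, 120)` (waiting for a replacement disk to be delivered) for the same `afr` shows how the exposure grows with the length of the degraded window.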

 

Hope that I was able to defend my hot spare opinion this time.

 

Regards,

 

Marcel

 


My recommendation is that you consider a backup of all files you really care about, stored on a separate machine. This makes sure the files survive if the PSU of the main server breaks and fries everything. And preferably the backup server stores the backup offline, so a virus or hacker can't erase everything.

 

If you do have a backup and lose more disks than the parity can handle, then you can restore just the specific disk(s) that lost data while still having access to the data on the other data disks. The main trick is to have some form of crontab job that keeps track of which files were stored on which disk, so you don't restore duplicates. This isn't an issue if the backup is made from disk shares, but if you back up user shares the backup software will not see which disk contained the different files.
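A minimal sketch of such a tracking job, in Python, could look like the following. It assumes the data disks are mounted as separate directories (on unRAID typically `/mnt/disk1`, `/mnt/disk2`, ...); the function names and the CSV layout are made up for this example, and the script only records file locations, it does not back anything up.

```python
import csv
import os
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def iter_disk_files(disk_roots: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (disk_name, path_relative_to_disk) for every file on each disk."""
    for root in disk_roots:
        root_path = Path(root)
        for dirpath, _dirnames, filenames in os.walk(root_path):
            for name in filenames:
                full = Path(dirpath) / name
                yield root_path.name, full.relative_to(root_path).as_posix()


def write_manifest(disk_roots: Iterable[str], manifest_path: str) -> int:
    """Write a CSV manifest of which file lives on which disk; return the row count."""
    count = 0
    # surrogateescape keeps odd file names from aborting the whole run.
    with open(manifest_path, "w", newline="", encoding="utf-8",
              errors="surrogateescape") as fh:
        writer = csv.writer(fh)
        writer.writerow(["disk", "relative_path"])
        for disk, rel in iter_disk_files(disk_roots):
            writer.writerow([disk, rel])
            count += 1
    return count
```

Run from cron, for example nightly, with the manifest written to the backup machine rather than to the array itself. After a loss you can filter the manifest by the failed disk's name to get the list of paths to restore.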

 

Most of the people we read about on this forum or other RAID forums who fail to recover despite dual parity have failed the very first rule of running a RAID system. They have not been running any supervision where all disk surfaces are regularly verified and where any problem is reported in a way that the owner/administrator will see within 24 hours at the latest. So they think their system is well even when it is running with one or more disks broken. Only several months later, when one disk too many fails, do they log in and notice the catastrophic failure. Or if they notice that one or more disks are emulated and start a rebuild, they find that one or more of the remaining disks have unrecoverable read errors that were never noticed because the disks haven't been scanned regularly.

 

So in the end - if you lose three or more disks, the most probable cause is failed supervision. Just failing to notice the issue with the first disk. And then the second. And then the third.

 

And the much less likely but still plausible cause is some form of catastrophic event (temperatures, impact, supply voltages, ...) that is likely to have hurt every disk in the machine.

 

There are quite a number of unRAID users who think it takes too much time to set up mail notifications. Or think that one mail per night from their unRAID system is just irrelevant spam. Quite a number of these users will show up in the support forum when it's too late to protect/recover all of their data. These are also often the people who think parity replaces the need for backup.


You need to decide which factor is your primary concern: data durability (avoiding data loss) or data availability. As mentioned, backups dramatically improve data durability. But if you are after data availability, you'll need to handle all the hardware factors: power supplies (as mentioned), memory (ECC and DIMM fail/sparing), cooling, and probably networking (LACP, etc.).



SME Storage said:


Some posts expressed worry that in case unraid encounters a bad SATA connection a hot spare will kick in. (when available)
Exactly what I would want! First priority is the health of the array.



The sparing process can be scripted. As a subject matter expert with vast experience, you should find this straightforward. Perl and Python are available in the Nerd Tools. This may allow you to worry less while working.

However, I am not sure it would be "hot", as the array must shut down to reassign the drive. You could implement NetApp's maintenance garage function to test the drive and then resume or fail it.

Edited by c3
