Hot Spare functionality?



Dear Unraiders and Lime Tech,

 

As a storage professional, I work with storage on a daily basis. That is why I went looking for hot spare functionality in Unraid, and I did not find it.

(Feel free to correct me if I am wrong.)

 

 

Definitions:

Hot Spare: In case of a disk failure a hot spare disk is automatically added to the array and triggers the normal data/parity rebuild.

Global Hot Spare: Hot spare disk to be used by the complete array.

Local Hot Spare: Hot Spare intended for one single disk pool.

 

The question is: does Unraid need hot spare functionality?

Just remember why we use NAS storage: we like to have some level of hardware redundancy for our data.

Hot spares can be an addition to the overall package of hardware redundancy and, when used, raise the redundancy level.

There are many different (hardware) redundancy solutions you can add to your Unraid system. It is just a matter of how far you want to take it and how important your data is to you.

The slider always moves between cost on one side and the highest possible data redundancy on the other.

One benefit of a hot spare feature is that it works fully automatically from the moment a hot spare has been made available.

Another benefit is that the time your array runs in degraded mode is reduced.

Some posts expressed worry that a hot spare (when available) would kick in whenever Unraid encounters a bad SATA connection.

Exactly what I would want! First priority is the health of the array.

Other posts have mentioned the problem of providing a hot spare for disk arrays with different disk capacities.

Well, that is true. When you use disks of different capacities, it takes a bit more work to arrive at the spare size you need.

In this case it is easier to use disks of the same capacity within each drive pool.
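
To make the capacity question concrete, here is a minimal sketch of how a spare could be matched to a failed disk. It is not an Unraid feature or API: the `Spare` class, the `pick_spare` function and the pool names are hypothetical. It assumes a spare must be at least as large as the failed disk and no larger than the parity disk, and it prefers a local spare over a global one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Spare:
    name: str                    # hypothetical label, e.g. "spare1"
    size_tb: float               # capacity in TB
    pool: Optional[str] = None   # None = global spare, otherwise the pool it is reserved for


def pick_spare(failed_size_tb: float, failed_pool: str,
               parity_size_tb: float, spares: Sequence[Spare]) -> Optional[Spare]:
    """Return the best-fitting spare for a failed disk, or None if nothing fits."""
    candidates = [
        s for s in spares
        if s.size_tb >= failed_size_tb          # must hold everything the failed disk held
        and s.size_tb <= parity_size_tb         # assumed rule: not larger than parity
        and (s.pool is None or s.pool == failed_pool)
    ]
    # Prefer a local spare (pool set) over a global one, then the smallest that fits.
    return min(candidates, key=lambda s: (s.pool is None, s.size_tb), default=None)
```

With same-size disks every spare fits every slot. With mixed sizes, the spare has to be sized for the largest data disk it is supposed to cover.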

 

If the hot spare feature is optional, everyone can choose whether or not to use it.

A choice to have a global or local hot spare would complete the whole.
 

Would it not be a comforting thought that, while I am at work, I do not have to worry about my Unraid system at home because it has a hot spare available?

Proactivity is the definition of being in control.

 

Cheers,

 

Marcel

 

 

 

 

 

Link to comment
43 minutes ago, SME Storage said:

Exactly what I would want! First priority is the health of the array.

 

A traditional RAID has a much larger need for hot spare support because a traditional RAID that loses one disk more than the number of parity disks will suffer a 100% data loss.

 

unRAID doesn't stripe the data - every single data disk has a separate file system. So an unRAID system with one parity drive that loses two disks will lose 1 or 2 data disks (depending on whether one of the failed disks was the parity) - the other data disks will continue to supply valid file content.

An unRAID system with dual parity will lose the data from 1, 2 or 3 data disks if 3 disks fail at the same time. The other disks will continue to supply valid file content.

Since a traditional RAID requires you to read back every single file from backup sources if you lose one disk too many, it's quite obvious why the recommendation is to have hot spare support. Parity is about availability (not a replacement for backup), and a full restore of all files from the backup is very far from the availability goals.
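
The arithmetic above can be written down as a small sketch. The function names are made up for illustration, and the model is deliberately simple: disks either fail completely or not at all, and "striped" stands for any traditional parity RAID where every file spans all disks.

```python
def data_disks_lost_unraid(failed_data: int, failed_parity: int, parity_count: int) -> int:
    """Data disks whose content is lost in an unRAID-style array (no striping)."""
    surviving_parity = parity_count - failed_parity
    if failed_data <= surviving_parity:
        return 0            # every failed data disk can still be rebuilt from parity
    return failed_data      # only the failed data disks are lost, the rest stay readable


def data_disks_lost_striped(failed: int, parity_count: int, data_disks: int) -> int:
    """Data disks' worth of content lost in a striped parity RAID."""
    return 0 if failed <= parity_count else data_disks   # one disk too many = everything


# Single parity, two failures: 1 or 2 data disks lost, depending on whether parity was one of them.
print(data_disks_lost_unraid(failed_data=1, failed_parity=1, parity_count=1))
print(data_disks_lost_unraid(failed_data=2, failed_parity=0, parity_count=1))
# The same two failures in a striped single-parity array with 6 data disks.
print(data_disks_lost_striped(failed=2, parity_count=1, data_disks=6))
```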

 

51 minutes ago, SME Storage said:

A choice to have a global or local hot spare would complete the whole.

 

Note that unRAID supports a single parity-protected array. Besides the array, you can use BTRFS mirroring - commonly used to get redundancy for a cache pool, especially if the cache pool is used to store VMs.

Link to comment

Thank you very much for your post.

 

At work I manage almost one petabyte of data. The RAID solution we use is based on traditional RAID (RAID-4-DP). The file system is striped over multiple RAID arrays.

This allows the use of large file systems. Each RAID array can lose 2 data drives simultaneously before it goes down. Hot spares are a must here.

 

 

Thank you for your explanation of Unraid and how it works. I could not have put it better in writing.

I am a little worried that I have not been able to find the right words to get my message across.

Personally I have no doubts about Unraid's RAID and file system concept. I feel Unraid is doing a splendid job!

A double-parity array together with a mirrored cache is a good enough solution in most cases.

In general, data protection comes in different levels: parity is one, double parity is another, mirroring is one, and backup is also an important one.

 

Hypothetically speaking, with regard to the risk of multi-disk failure:

It might seem over the top at first, but if a hot spare could be assigned automatically when a disk dies, you would save time and thereby reduce the chance of a multi-disk failure even further.

And that is not even considering the scenario where no replacement disk is on hand and one still has to be purchased before it can be added to the array.
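
As a rough illustration of why a shorter degraded window matters, the sketch below estimates the chance that at least one surviving disk fails before the rebuild is done. It assumes independent failures at a constant rate, which real disks in one chassis do not strictly follow. The function name and the numbers in the usage lines are hypothetical.

```python
import math


def p_second_failure(surviving_disks: int, annual_failure_rate: float,
                     degraded_days: float) -> float:
    """Probability that at least one surviving disk fails while the array is degraded."""
    if not 0.0 <= annual_failure_rate < 1.0:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    # Constant hazard rate that yields the given annual failure rate.
    hazard_per_year = -math.log(1.0 - annual_failure_rate)
    # Chance that a single disk fails within the degraded window.
    p_one = 1.0 - math.exp(-hazard_per_year * degraded_days / 365.0)
    # Chance that at least one of the surviving disks fails.
    return 1.0 - (1.0 - p_one) ** surviving_disks


# Hypothetical comparison: spare starts rebuilding at once (about 1 day degraded)
# versus ordering a disk first (about 7 days degraded), 7 surviving disks, 2% AFR.
print(p_second_failure(7, 0.02, 1))
print(p_second_failure(7, 0.02, 7))
```

The absolute numbers are small either way. The point is that the risk grows with both the number of disks and the time spent degraded, and a hot spare only shortens the second factor.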

 

I hope I was able to make my case for hot spares this time.

 

Regards,

 

Marcel

 

Link to comment

My recommendation is that you consider a backup of all files you really care about, stored on a separate machine. This makes sure the files survive if the PSU of the main server breaks and fries everything. And preferably the backup server stores the backup offline, so a virus or hacker can't erase everything.

If you do have a backup and lose more disks than the parity can handle, then you can restore just the specific disk(s) that lost data while still having access to the data on the other data disks. The main trick is to have some form of crontab job that keeps track of which files were stored on which disk, so you don't restore duplicates. This isn't an issue if the backup is made from disk shares, but if you back up user shares, the backup software will not see which disk contained which files.
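
A minimal sketch of such a tracking job, assuming the data disks are mounted as `disk1`, `disk2`, ... under one mount root (on Unraid that would be `/mnt`). The function name and the manifest layout are made up for illustration. It walks every file on every run, so on a large array it is slow and will spin up all disks.

```python
import csv
from pathlib import Path


def write_manifest(mount_root: Path, manifest: Path, pattern: str = "disk[0-9]*") -> int:
    """Record which file lives on which data disk; return the number of files listed."""
    rows = 0
    with manifest.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["disk", "relative_path", "size_bytes"])
        for disk in sorted(mount_root.glob(pattern)):
            if not disk.is_dir():
                continue
            for path in sorted(disk.rglob("*")):
                if path.is_file():
                    # Store the path relative to the disk so it matches the user share layout.
                    writer.writerow([disk.name,
                                     path.relative_to(disk).as_posix(),
                                     path.stat().st_size])
                    rows += 1
    return rows
```

Run from cron once a night, for example as `write_manifest(Path("/mnt"), Path("/boot/disk-manifest.csv"))` (the output path is only an example), and keep the manifest somewhere that survives the array. After losing a disk, filtering the manifest on the `disk` column gives the list of files to pull back from the backup.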

 

Most of the people we read about on this forum or other RAID forums who fail to recover despite dual parity have failed the very first rule of running a RAID system. They have not been running any supervision where all disk surfaces are regularly verified and where any problem is reported in a way that the owner/administrator will see within 24 hours at the latest. So they think their system is well even when it is running with one or more disks broken. Only several months later, when one disk too many fails, do they log in and notice the catastrophic failure. Or if they notice that one or more disks are emulated and start a rebuild, they find that one or more of the remaining disks have unrecoverable read errors that were never noticed because the disks haven't been scanned regularly.

 

So in the end - if you lose three or more disks, the most probable cause is failed supervision. Just failing to notice the issue with the first disk. And then the second. And then the third.

 

And the much less likely but still plausible cause is some form of catastrophic event (temperatures, impact, supply voltages, ...) that is likely to have hurt every disk in the machine.

 

There are quite a number of unRAID users who think it takes too much time to set up mail notifications. Or who think that one mail per night from their unRAID system is just irrelevant spam. Quite a number of these users will show up in the support forum when it's too late to protect/recover all of their data. These are also often the people who think parity replaces the need for backup.

Link to comment

You need to decide which factor is your primary concern: data durability (avoiding data loss) or data availability. As mentioned, backups dramatically improve data durability. But if you are after data availability, you'll need to handle all the hardware factors: power supplies (as mentioned), memory (ECC and DIMM fail/sparing), cooling, and probably networking (LACP, etc).



SME Storage


Some posts expressed worry that a hot spare (when available) would kick in whenever Unraid encounters a bad SATA connection.
Exactly what I would want! First priority is the health of the array.



The sparing process can be scripted. For a subject matter expert with your vast experience, this will be straightforward. Perl and Python are available in the Nerd Tools. This may allow you to worry less while working.

However, I am not sure it would be "hot", as the array must be shut down to reassign the drive. You could implement NetApp's maintenance garage function to test the drive and then resume or fail it.
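
A sketch of the "test first, then resume or fail" decision such a script would need. It is only loosely inspired by the maintenance garage idea and is not NetApp's actual logic. The names are hypothetical, the drive test is passed in as a callable, and nothing here talks to Unraid itself.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    RESUME = "resume"                          # drive tested fine, e.g. only a bad cable
    REPLACE_WITH_SPARE = "replace_with_spare"  # drive is bad and a spare is on hand
    ALERT_ONLY = "alert_only"                  # drive is bad but there is nothing to swap in


def decide(drive_passes_test: Callable[[], bool], spare_available: bool,
           attempts: int = 2) -> Action:
    """Test a suspect drive up to `attempts` times before giving up on it."""
    for _ in range(attempts):
        if drive_passes_test():
            return Action.RESUME
    return Action.REPLACE_WITH_SPARE if spare_available else Action.ALERT_ONLY
```

Testing before sparing is what keeps a flaky SATA connection from consuming the spare and triggering a full rebuild for a disk that is actually healthy.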
Link to comment
  • 1 year later...

Of course it is possible to add functionality yourself by scripting it. On the other hand, custom functionality will at some point work against you: it means additional maintenance and testing with every new Unraid version.

You might like custom modding, and that is fine, but for now I would like to address the high-level design.

For example, follow the KISS principle by implementing out-of-the-box solutions.
Hot spare functionality might be a game changer in regard to Unraid's design. I have no doubt.

It would be an additional protection layer for securing data availability.

I like to think that it is only a matter of time before LimeTech adds hot spare functionality. It is the logical next step in Unraid's evolution.

Look around and see what is happening in today's homes with IoT. We see more and more automation and simplicity being added to our lives.

Automation is the way forward. The difference between pro-activity and reactivity.

 

Link to comment

Actually, I was surprised Unraid did not have this time-tested feature. I personally think a hot spare capability would still be of benefit in Unraid. Even though you only lose one disk of data in Unraid, it is also about the risk of losing another disk. Once one disk is gone you can rebuild it, but not if a second one goes too. The likelihood of more than one disk dying simultaneously? Well, it is higher the more disks you have. And yes, it does happen. One advantage of a hot spare, particularly for smaller builds, is that you could do away with the negative performance impact of dual parity and still have cover to reduce the risk of a second disk dying, a risk that is heightened once one disk has died, e.g. due to the extra heat from constantly having to reconstruct the failed disk's data from parity.

 

It's a really great feature and unraid is the first redundant system I've ever seen that doesn't have it.

Link to comment
  • 3 years later...
  • 2 months later...
On 6/13/2018 at 5:26 PM, SME Storage said:

Thank you for your explanation of Unraid and how it works. I could not have put it better in writing.

I am a little worried that I have not been able to find the right words to get my message across.

I agree on both points, @pwm's explanation of Unraid and how it works was very nice.

 

However, my tl;dr on this is: if I'm accountable for managing [1+ TB of data] OR [0+ b of critical data], I would deploy a hot spare as a routine part of my business continuity plan.

Do any docker containers, apps or user scripts provide hot spare functionality for the array or a cache pool?

 

Link to comment
  • 2 weeks later...
On 6/10/2018 at 6:26 AM, SME Storage said:

Dear Unraiders and Lime Tech,

 


 

 

 

 

Sonic exe

Because failing slices from RAID 1 or RAID 5 volumes are automatically replaced and resynchronized by hot spares in the event of a failure, hot spares offer protection against the failure of hardware.

Link to comment
7 hours ago, VanGogh said:

Because failing slices from RAID 1 or RAID 5 volumes are automatically replaced and resynchronized by hot spares in the event of a failure, hot spares offer protection against the failure of hardware.

Unraid doesn't use RAID1 or RAID5 for the parity array.

 

Or are you a bot/spammer account?

Link to comment
