High Availability unRaid


tr0910


Windows Server 2012 has some high-availability capabilities if you use multiple servers with clustering, but there's nothing similar for UnRAID, although some clustering technologies available for Linux can provide high availability.

 

You can do things that reduce the likelihood of failure on an UnRAID server, e.g. redundant power supplies, enterprise-grade cooling to keep drive temps as low as possible, enterprise-grade hard drives, a very long run-time UPS, etc. But ANY system can fail, and UnRAID is no exception. Without a real-time cluster designed for automated failover, you can't achieve the goals of "high availability" systems.

 

A pair of systems with those features, coupled with automatic replication at scheduled intervals, would provide a "very good" availability system, but it still wouldn't have the automated failover of a true high-availability system. I suspect a cache script could be written that would automatically write all newly cached data to both the local system and the replicated system, so the replication would be nearly automatic and could run at more frequent intervals than might otherwise be feasible.

 

Link to comment

I know that I'm pushing into frontiers that have not been explored heavily by unRaid users, but would like to know where the limits really are.  All other things you mentioned are easily possible and we can do them now. 

 

I would be happy enough with a simple way to turn the backup server into the live server in 15 minutes or less, and by a tech averse noob.  Presently one is called Tower1 and the other is called Tower2.  This in itself would cause problems as we map drives to Tower1, and not Tower2.  Among other things, Tower2 would have to come to life masquerading as Tower1 in order for things to continue to work....

 

(Presently I replicate the servers daily at 1am via an IPMI wakeup and an rsync script, and we could live with losing the few hours of data written since the last sync.)
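
For readers who want to set up something similar, the nightly routine described here (wake the backup over IPMI, rsync, power it down again) could be sketched roughly as follows. This is not the poster's actual script. The BMC address, credentials, share path and the `root@Tower2` destination are placeholders, and it assumes `ipmitool` and `rsync` over SSH are available on the machine that runs it.

```python
import subprocess

# All values below are hypothetical placeholders, not taken from the thread.
BMC_HOST = "192.168.1.50"        # IPMI/BMC address of the backup server
BMC_USER = "admin"
BMC_PASSWORD = "changeme"
SOURCE = "/mnt/user/"            # data to replicate from the primary
DEST = "root@Tower2:/mnt/user/"  # same path on the backup server


def ipmi_command(action):
    """Build an ipmitool chassis power command, e.g. 'on', 'soft', 'status'."""
    return ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
            "-U", BMC_USER, "-P", BMC_PASSWORD,
            "chassis", "power", action]


def rsync_command(source, dest, dry_run=False):
    """Build an rsync command that mirrors source to dest over SSH."""
    cmd = ["rsync", "-a", "--delete", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    return cmd + [source, dest]


def nightly_sync(run=subprocess.run):
    """Wake the backup, replicate, then ask it to shut down again."""
    run(ipmi_command("on"), check=True)
    # A real script would wait here until the backup answers on SSH.
    run(rsync_command(SOURCE, DEST), check=True)
    run(ipmi_command("soft"), check=True)
```

Note that `--delete` removes files on the backup that no longer exist on the primary, so running `rsync_command(SOURCE, DEST, dry_run=True)` by hand first is a sensible precaution.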

Link to comment

I would be happy enough with a simple way to turn the backup server into the live server in 15 minutes or less, and by a tech averse noob.

 

If the "tech averse noob" can do the following, it could easily be done in ~5 minutes:

 

Assumption:  The main server is called Tower;

                      the backup server is called TowerBack

                      (Obviously these can be whatever you choose to call them)

 

=>  Shut down the "broken" server.  Note:  This, of course, assumes there is some mechanism to KNOW when there's a broken server -- an e-mail notification; buzzer; etc.  There are several scripts floating around the forum to do these kind of things, based on drive temperature, SMART data, etc.    The shutdown can be as simple as pressing the power button if it's physically convenient.

 

=>  Load the Web GUI for the backup server;  click on "Settings", "Identification", and change the name of the server From TowerBack to Tower (i.e. delete 4 characters).

 

=>  Reboot the backup server.

 

Done  :)

 

Obviously if you do this, the reset procedure would require first renaming the backup server to its normal name BEFORE you rebooted the main server.
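
As a rough illustration of the "mechanism to KNOW" mentioned above, and of a way to confirm that the renamed backup is answering afterwards, a small check like the one below could be run from any other machine on the LAN. It is only a sketch: the function name is made up, it assumes the shares are served over SMB on TCP port 445, and it only tests that the port accepts a connection, not that the array is healthy.

```python
import socket


def smb_reachable(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

Calling `smb_reachable("Tower")` every few minutes and sending an e-mail when it returns False a few times in a row would cover the notification part; after the rename and reboot, the same call shows when the name Tower is being served again.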

 

Link to comment

I can confirm this seems to work, but you can have issues such as:

 

1. The network name is reluctant to load for a few minutes, either via //tower from Windows or from the browser.

2. All your shares must already be duplicated on both servers.

3. The tech-averse noob must be able to do the necessary things.

 

Items 2 and 3 are to be expected, but item 1 can cause some noobs to panic....

Link to comment

1. The network name is reluctant to load for a few minutes, either via //tower from Windows or from the browser.

 

I think this has more to do with master browser elections. If you make unRAID the master browser, this issue should go away:

 

Open the web GUI, go to the Settings tab, then SMB, and change "local master" to "yes". Obviously you would need to ensure this is set on both your primary and your secondary unRAID server.
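
In Samba terms this GUI switch presumably ends up as the `local master` parameter in the `[global]` section of smb.conf. One rough way to compare the two servers is to read that value from a copy of each box's config. The sketch below uses a small hand-rolled parser with made-up function names; where unRAID keeps its generated Samba config is not covered here.

```python
def read_global_option(conf_text, option, default=None):
    """Return the value of `option` in the [global] section of smb.conf text."""
    wanted = "".join(option.split()).lower()
    section = None
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            key, _, val = line.partition("=")
            # Samba ignores case and whitespace in parameter names.
            if "".join(key.split()).lower() == wanted:
                value = val.strip()
    return value


def is_local_master(conf_text):
    """True if 'local master' is enabled (Samba's own default is yes)."""
    value = read_global_option(conf_text, "local master", default="yes")
    return value.lower() in ("yes", "true", "on", "1")
```

Running `is_local_master()` on the config text from both servers shows at a glance whether they are set the same way.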

 

 

Link to comment

Yes, the main server is set to be the master browser.  The backup server is not.  Wonder if I could just leave them both as master?  The backup is only running from 1am-3am daily then it shuts off.

 

Requiring a change of IP addresses would put us outside the "noob friendly" zone.

Link to comment

Requiring a change of IP addresses would put us outside the "noob friendly" zone.

 

Definitely agree.    As long as both servers use DHCP, and the backup server is rebooted after changing its ID (as I noted above), then all users should have no issue connecting as long as they're accessing the server with fully qualified path names.    I think that's about as far as you can go with the "noob friendly" requirement  :)

Link to comment

I guess it depends on what direction Tom wants to take this product. I view unraid as a great HOME media server solution. I don't see companies like Netflix or Hulu using unraid to store and serve their data, and with the money they make, features like mixed disks just don't mean as much to them as they do to the hobbyist.

 

I stream from my server all the time and with three kids I'm fairly confident in saying something is streaming almost all day long. That being said, High Availability wouldn't be a feature I'd pay extra for and honestly would rather Tom spend time on other features.

 

At the end of the day unraid is a media server solution that we happen to also use for other personal files and documents. None of my media is so crucial that I need to be able to access it 24/7, so in the event of a failure I could wait to fix things and get up and running again. It would be a minor nuisance, but not crucial.

Link to comment

Definitely agree with Influencer's comments. High availability is important for commercial concerns, but nowhere near as much for a home environment ... and UnRAID is much more focused on that market than the commercial world.

 

I DO want "good" availability => for that I consider a UPS mandatory;  high-quality disk drives (NAS or Enterprise class drives);  and of course good backups.    But I certainly don't need fail-over clusters; etc.    A good backup server (as you've already got) is PLENTY.

 

Besides, Mr. Murphy (he of Murphy's Law fame) almost guarantees that if you have a good, solid, backup server available your system will never fail anyway  8)  8)    It's those that don't have backups that need to be concerned with Mr. Murphy's exploits !!

 

Link to comment

Besides, Mr. Murphy (he of Murphy's Law fame) almost guarantees that if you have a good, solid, backup server available your system will never fail anyway  8)  8)    It's those that don't have backups that need to be concerned with Mr. Murphy's exploits !!

 

Soooo what I hear you saying is, I only need to FOOL Mr. Murphy into THINKING that I have good, solid, backup??  Got it!!!  I'll build a fake server right away  8)

Link to comment

It depends what you really mean by high availability and what you want to happen in a failover scenario.

 

Given that one of the main 'drawbacks' of unraid (not really a drawback, it's by design, but it seems to upset people a lot) is write performance with respect to parity, HA would only slow this down further, as you would have to replicate the data. Presuming you want this to happen in real time, you'd have to wait for each operation to commit across both servers 'to be sure'.

 

Likewise network failover. This can be done in real time with the same IP etc., but do you want to only ever interact with one server at a time, or would you prefer your connections to be load balanced across both? Unraid could use ctdb to make this happen for samba and nfs, though there are other considerations that would have to be addressed.

 

Generally in HA scenarios you also need 3 cluster members to provide a quorum and a tie break in the event of a failure. If you only have two machines, there are many failure states that can lead to both machines thinking they're the primary and acting accordingly. You then have split brain.
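
To make the quorum point concrete, here is a tiny sketch of the usual majority rule (the function name is just for illustration). A node, or group of nodes, should only act as primary if it can see a strict majority of the cluster.

```python
def has_quorum(reachable_members, cluster_size):
    """A partition may act as primary only if it sees a strict majority."""
    return reachable_members > cluster_size // 2


# Two-node cluster with the link between the nodes cut: each node sees only
# itself (1 of 2), so neither holds a majority. If both carry on as primary
# anyway, that is split brain.
two_node_sides = [has_quorum(1, 2), has_quorum(1, 2)]

# Three-node cluster split 2/1: only the side with two members keeps quorum,
# so exactly one side is allowed to continue.
three_node_sides = [has_quorum(2, 3), has_quorum(1, 3)]
```

With two machines the only safe outcome of a broken link is that neither side takes over automatically, which is why a third member (or some other tie breaker) is normally required.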

 

Lots to consider. For a home solution like unraid it would seem overkill and, in general, it's quite a tricky thing to get right full stop for a full enterprise style active / active instant failover. Though it is doable.

 

If you want the more basic warm spare option then that can be done now with ease :

 

- Build two servers

- Treat one as the primary

- Periodically (as often or as little as you like, likely dependent on churn rate) rsync the data from the primary to the warm spare.

- Though I'm sure you can rsync the unraid config over as well to a large degree, I suspect you would really have to repeat any management operations done on the primary server on the warm spare to be sure.

- In the event of failure, simply change the IP on the warm spare, tell your clients to use the different hostname/IP or update a DNS record if you're running zones in-house, and you're back up and running at the point of your last rsync catch-up.

 

I think a lot of people on these forums do something like this anyway right now for backup purposes if nothing else.

 

After going off the deep end with the consideration of full-on HA at the start: if what you're really suggesting is that unraid could do with a built-in mechanism to relay management commands to another server, then that's probably much more doable. Add a GUI interface to do the rsync and it's probably not too heavyweight to be done, especially given that rsync is already used for the mover script.

 

I don't think any of it would ever be 'noob' friendly though. It's inherently something that is complex and a nice management layer sheen can only go so far to hide the complexities from you (especially if it ever has a problem or stops working..).

 

Link to comment

 

Lots to consider. For a home solution like unraid it would seem overkill and, in general, it's quite a tricky thing to get right full stop for a full enterprise style active / active instant failover. Though it is doable.

 

If you want the more basic warm spare option then that can be done now with ease :

 

After going off the deep end with the consideration of full-on HA at the start: if what you're really suggesting is that unraid could do with a built-in mechanism to relay management commands to another server, then that's probably much more doable. Add a GUI interface to do the rsync and it's probably not too heavyweight to be done, especially given that rsync is already used for the mover script.

 

I don't think any of it would ever be 'noob' friendly though.

 

Perhaps I should just settle for higher availability, not full HA. If we could define a noob-achievable way to transition to the backup server, we could stop right there. I would really rather noobs didn't need access to the management GUI; there are things they could mess up there. Garry Case proposed just using the management GUI to rename the backup from TowerBack to Tower and then reboot. It's workable and relatively simple, but dangerous to give noobs that much access. Hmmm.

Link to comment
