Recently I’ve been doing a lot of work on large deployments that require active/active or active/passive setups, whereby options to fail over to a DR site are either required as part of the design, or presented as future enhancement to the customer. Most of these have been fairly open questions – “How can we achieve this?” for example. It’s a question that is almost completely subjective; it depends entirely on business needs, and what the available budget is.
Subjective elements aside, it is a much debated technical area, so I opened up a question on the MyCUGC forums to ask the community how they were going about this. I also tweeted the question out @jakewalsh90:
I based my question around the concept that is most common (certainly to me at least) – an active/active or active/passive design, with a primary site and a secondary (DR/Backup) site. This is without a doubt the most common environment type that I encounter, predominantly in small and medium enterprises up to around 5000 users.
The main purpose of this post is to summarize the elements (both technical and strategic) that could be considered, and the different options we can lean on to help achieve the desired results. And also, to highlight just how good the response from the Citrix Community was on this question!
By far the most common point that came out of the discussion around this was – “it depends”. There are a great number of factors to consider for any solution like this, including:
- Budget – what is affordable and achievable with our budget?
- Connectivity – are we limited by latency/bandwidth/other traffic etc? Are we using Dark Fiber, MPLS, VPN etc?
- DC Locations – if we are planning for a Secondary/DR site, is it likely this would ever be affected by an issue that took down our primary site? (Hurricanes, Floods, Earthquakes etc.)
- Capacity – is this a full DR/Secondary solution or just a subset of applications and users?
- Hardware – do we have the hardware to achieve this? Is it within our budget?
- Software – can we do this within our current licensing or do we need an uplift?
- Applications – are we replicating everything or just key applications? How will these applications perform in another DC? (Applications may have web/database dependencies based only in a single site).
- User Data – are we replicating user data too? How are profiles going to be handled?
- Failover method – are we utilizing a Citrix solution for this, or perhaps a product like VMware Site Recovery Manager? How is failover undertaken – automatic? manual?
Aside from the many other factors affecting a question like this, our discussion focused on the Citrix technical elements aimed at DR/Failover options available. I’ve highlighted the key points we discussed, and gathered a number of resources that I think are helpful in discovering these further:
GSLB via NetScaler for StoreFront (Access Layer) – this was a common theme throughout the discussions, and there seems to be a general consensus that utilising GSLB on NetScaler is a logical way forward. Creating an access layer that utilizes NetScaler GSLB and StoreFront, whilst spanning the DC’s, will give a solution that is resilient and reliable, and won’t require complex replication/management. Dave Brett has written an excellent article on setting this up.
XenDesktop Site with Zones – Zones in XenDesktop are an awesome way to split geographically (or logically) separate resources, whilst maintaining the ease of management and reduced overhead of only having a single farm. Utilizing Zoning to form an active/active or active/passive solution is simple in configuration terms too. With Zones users can be automatically redirected to a secondary zone VDA during the failure of a their primary zone VDA.
Local Host Cache – as I am sure you are aware, Local Host Cache is now back in XenDesktop, and provides additional tolerance for database outages. LHC allows connection brokering operations to take place when the following issues occur:
The connection between a Delivery Controller and the Site database fails in an on-premises Citrix environment.
The WAN link between the Site and the Citrix control plane fails in a Citrix Cloud environment.
See https://docs.citrix.com/en-us/xenapp-and-xendesktop/7-12/manage-deployment/local-host-cache.html for further details on LHC.
You can check to see if LHC is on by running the following PowerShell: Get-BrokerSite. I’m running 7.15 in my lab so it is enabled by default:
SQL Options – SQL is a key component of the FMA architecture – so any solution (with or without DR/Failover) needs a reliable solution for hosting Site Databases. Usually my go to solution is to mirror any databases using SQL Mirroring. AlwaysON Failover Clustering, and Always On AvailabilityGroups are both possible solutions – particularly given that Database Mirroring is being deprecated.
When DR is considered this opens up additional hardware and software requirements to provide suitable hardware and SQL Server licensing.
See page 101-102 of the Updated Citrix VDI Handbook for further information on SQL redundancy and replication options: http://docs.citrix.com/content/dam/docs/en-us/xenapp-xendesktop/7-15-ltsr/downloads/Citrix%20VDI%20Handbook%207.15%20LTSR.pdf
Using StoreFront to handle the Failover (Site/Delivery Controller Level) – From StoreFront 3.6 it has been possible to Load Balance Resources across controllers, allowing StoreFront to effectively handle failover between XenDesktop Farms. (See https://www.citrix.com/blogs/2016/09/07/storefront-multi-site-settings-part-2/ for more details on this)
This method allows us to have two XenDesktop Farms – and to publish identical resources which are then load balanced by the StoreFront server. Failover would only occur in the event that a Delivery Controller was unavailable in the primary site. This solution would still allow for a GSLB approach with StoreFront and NetScaler too.
The main disadvantage of this approach is the increased management overhead of the additional XenDesktop Farm, but this can be managed by having good practices in place.
This is configured in the Delivery Controller section of a StoreFront site – and requires both farms to publish the resources required for failover. See below – two Farms configured in the Delivery Controller section within a StoreFront site:
We also need to configure the “User Mapping and Multi-Site Aggregation Configuration”. Note that below I have configured all Delivery Controllers to for “Everyone” – but this may need to be adjusted in a production environment:
You will also need to configure resource aggregation as below. For failover, do not tick “Load Balance resources across controllers”. However, “Controllers publish identical resources” will need to be ticked so that identically named published applications or desktops are de-duplicated:
With this set, any resources published in both farms will be launched from the Secondary Site in the event that the Delivery Controllers in the first site fail to respond.
Application Level Failover using Application Group Priorities – it is also possible to use application groups with priorities to control the failover of applications. When you configure an application group in XenDesktop 7.9+ you are able to configure this:
Gareth Carson has a great blog post on this which explains the functionality in more detail.
Hopefully this post has been helpful in highlighting some of the considerations for a DR/Second Site scenario. And also, has helped to highlight some of the Citrix technologies and great community resources out there to help make the process a little easier. It’s been useful for me to ask the question and compile a post like this because I’ve had to look into the various technologies and find out more about them in my own lab before writing this… until next time, cheers!