Tags: Failover

Azure Traffic Manager for NetScaler Gateway Failover

Azure Traffic Manager is designed to provide traffic routing to various locations based on a ruleset that you specify. It can be used for priority (failover), weighted distribution, performance, and geographic traffic distribution.

The Priority (failover) option is similar to GSLB and works in much the same way, so that is what I am going to demonstrate in this post. I’ve started with the following environment already configured:

  • Two Azure Locations (East US and South Central US), with a VPN between the sites to join the VNETs
  • 1 Domain Controller in each location
  • 1 NetScaler (standalone) in each location
  • 1 Citrix Environment spread across the two locations
  • NetScaler Gateways set up in both sites, and NAT’d out using an Azure Load Balancer – so that we have a public IP offering NetScaler Gateway services in both Azure locations. (Have a look at this blog post if you require guidance on setting this up.)

Azure Lab Diagram

Before we set up the Azure Traffic Manager profile, we need to give our Public IP addresses a DNS name label, as this is what Traffic Manager will use to reach the endpoints. To do this, browse to the Public IPs for your Load Balancers and click on “Configuration”. (A PowerShell sketch for this follows the list below.)

Azure Public IP configuration

I have two public IPs so I have created two DNS Name Labels and given them appropriate names:

  • desktop-eus-jwnetworks = East US NetScaler Public IP
  • desktop-scus-jwnetworks = South Central US NetScaler Public IP
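If you prefer to script this step, the label can also be set with the Az PowerShell module. A minimal sketch, assuming hypothetical resource group and Public IP names (swap in your own):

    # Assign a DNS name label to an existing Public IP (names are placeholders)
    $pip = Get-AzPublicIpAddress -ResourceGroupName "rg-eus" -Name "pip-netscaler-eus"
    if (-not $pip.DnsSettings) {
        # Create the DNS settings object if the Public IP has never had a label
        $pip.DnsSettings = New-Object Microsoft.Azure.Commands.Network.Models.PSPublicIpAddressDnsSettings
    }
    $pip.DnsSettings.DomainNameLabel = "desktop-eus-jwnetworks"
    Set-AzPublicIpAddress -PublicIpAddress $pip

Repeat with the second label for the South Central US Public IP.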

Next – it’s time to create the Azure Traffic Manager profile!

Traffic Manager profile creation

After we click create, we just need to populate a few basic details:

Traffic Manager profile creation

As you can see, I have given my Traffic Manager profile a name and selected Priority as the routing method (this gives us failover in a similar manner to active/passive GSLB). Note that there are other routing methods available:

Traffic Manager routing options

See here for an overview of the Traffic Routing Methods. Next, we need to configure some more settings on our Traffic Manager profile, to ensure that the monitoring and traffic routing will work correctly. In the screenshot below I have adjusted the following (a PowerShell equivalent follows further down):

  • DNS TTL – I’ve reduced this to 60 seconds; it defaults to 300 seconds (5 minutes)
  • Protocol – HTTPS, because we are monitoring the NetScaler Gateway over HTTPS
  • Port – 443, as this is the port the NetScaler Gateway is using
  • Path – the path to the file that the monitor will check for; for NetScaler Gateway this is /vpn/index.html – if this page is not available, the endpoint will be marked as unavailable
  • Probing interval – how often endpoint health is checked; the value is either every 10 seconds or every 30 seconds
  • Tolerated number of failures – how many health check failures are tolerated before the endpoint is marked as unhealthy
  • Monitoring timeout – how long the monitor waits for a response before considering the probe failed

For more information on these configuration options – click here.

Traffic Manager configuration
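The same profile can also be created in one step from PowerShell. A minimal sketch using the Az module – the profile name and resource group are placeholders, and the monitor values match the settings above:

    # Create a Priority-routed Traffic Manager profile with the monitor
    # settings described above (name/resource group are placeholders)
    New-AzTrafficManagerProfile -Name "jwdesktop" `
        -ResourceGroupName "rg-trafficmanager" `
        -RelativeDnsName "jwdesktop" `
        -TrafficRoutingMethod "Priority" `
        -Ttl 60 `
        -MonitorProtocol "HTTPS" `
        -MonitorPort 443 `
        -MonitorPath "/vpn/index.html" `
        -MonitorIntervalInSeconds 10 `
        -MonitorTimeoutInSeconds 5 `
        -MonitorToleratedNumberOfFailures 1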

Next – it is time to add our endpoints! To do this, click on Endpoints and then on Add:

Traffic Manager endpoints

We then need to add our Public IP addresses assigned to the Azure Load Balancers (where the NAT rules were created). Note – you will need to do this for BOTH endpoints:

Adding Endpoints to Traffic Manager
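Endpoints can be added from PowerShell as well. A hedged sketch – the endpoint names match the screenshots, while the resource groups and Public IP names are placeholders:

    # Look up the two Public IPs fronting the NetScaler Gateways
    $eusPip  = Get-AzPublicIpAddress -ResourceGroupName "rg-eus"  -Name "pip-netscaler-eus"
    $scusPip = Get-AzPublicIpAddress -ResourceGroupName "rg-scus" -Name "pip-netscaler-scus"

    # Priority 1 = primary (East US), Priority 2 = failover (South Central US)
    New-AzTrafficManagerEndpoint -Name "eus-desktop" -ProfileName "jwdesktop" `
        -ResourceGroupName "rg-trafficmanager" -Type AzureEndpoints `
        -TargetResourceId $eusPip.Id -EndpointStatus Enabled -Priority 1
    New-AzTrafficManagerEndpoint -Name "scus-desktop" -ProfileName "jwdesktop" `
        -ResourceGroupName "rg-trafficmanager" -Type AzureEndpoints `
        -TargetResourceId $scusPip.Id -EndpointStatus Enabled -Priority 2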

Once both are added, you will see the following in the Endpoints screen. Note that both Endpoints are shown as “Online” – this confirms our monitor is detecting the Endpoints as up. Also note that each Endpoint has a priority – this means that under normal operation all traffic will be sent to the “eus-desktop” endpoint (Priority 1), and in the event of a failure of that endpoint, all traffic will be directed to the “scus-desktop” Endpoint.

Endpoints

All that is left to do is test – however, first let’s make things neat for our users with a CNAME DNS record. We are effectively going to CNAME something users can remember to our jwdesktop.trafficmanager.net record. You can find your record on the overview screen:

Traffic Manager overview

Next up I added a CNAME record in my Azure DNS Zone:

Add DNS Record

Add CNAME Record
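For completeness, the same CNAME can be created in an Azure DNS zone from PowerShell. A hedged sketch – the DNS zone’s resource group name is a placeholder:

    # CNAME desktop.jwnetworks.co.uk -> jwdesktop.trafficmanager.net
    New-AzDnsRecordSet -Name "desktop" -RecordType CNAME `
        -ZoneName "jwnetworks.co.uk" -ResourceGroupName "rg-dns" -Ttl 300 `
        -DnsRecords (New-AzDnsRecordConfig -Cname "jwdesktop.trafficmanager.net")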

Once this is created, we can start testing! But first, a diagram! The one below shows what we now have set up and working:

Solution Diagram

Note: in order to easily distinguish between my two Gateways, I set the EUS Gateway to the X1 theme and left the SCUS Gateway on the default NetScaler theme. When accessing https://desktop.jwnetworks.co.uk I am correctly shown the EUS Gateway:

EUS Gateway Test

Bingo – this all looks good to me! Next up, I disabled the Virtual Server for the EUS NetScaler:

Virtual Server Disabled

After around 30 seconds… the Monitor Status shows as Degraded:

[ For those interested in the maths: each probe takes up to 15 seconds (10 second probe interval + 5 second timeout), and with 1 tolerated failure the endpoint is marked Degraded on the second consecutive failed probe – so roughly 15 × 2 = 30 seconds. ]

Endpoint Health

Next I refreshed the page, and we are presented with the SCUS Gateway page:

SCUS Gateway Page

As you can see, during a failure condition (the EUS Gateway vServer being taken down) Traffic Manager directs traffic to our Priority 2 site, without any intervention from us. Any users would be able to refresh the page and then log back in. This can be used not only for NetScaler Gateway but for many internet-facing services – for example OWA, SharePoint etc. There are a great many services that can benefit from this type of failover and the resiliency that it offers.

XenDesktop Site Failover – asking the community…

Recently I’ve been doing a lot of work on large deployments that require active/active or active/passive setups, whereby options to fail over to a DR site are either required as part of the design, or presented as a future enhancement to the customer. Most of these have been fairly open questions – “How can we achieve this?”, for example. It’s a question that is almost completely subjective; it depends entirely on business needs and the available budget.

Subjective elements aside, it is a much debated technical area, so I opened up a question on the MyCUGC forums to ask the community how they were going about this. I also tweeted the question out @jakewalsh90:

I based my question around the concept that is most common (certainly to me at least) – an active/active or active/passive design, with a primary site and a secondary (DR/Backup) site. This is without a doubt the most common environment type that I encounter, predominantly in small and medium enterprises up to around 5000 users.

The main purpose of this post is to summarize the elements (both technical and strategic) that could be considered, and the different options we can lean on to help achieve the desired results – and to highlight just how good the response from the Citrix Community was on this question!

Key Considerations

By far the most common point that came out of the discussion around this was – “it depends”. There are a great number of factors to consider for any solution like this, including:

  • Budget – what is affordable and achievable with our budget?
  • Connectivity – are we limited by latency/bandwidth/other traffic etc? Are we using Dark Fiber, MPLS, VPN etc?
  • DC Locations – if we are planning for a Secondary/DR site, is it likely this would ever be affected by an issue that took down our primary site? (Hurricanes, Floods, Earthquakes etc.)
  • Capacity – is this a full DR/Secondary solution or just a subset of applications and users?
  • Hardware – do we have the hardware to achieve this? Is it within our budget?
  • Software – can we do this within our current licensing or do we need an uplift?
  • Applications – are we replicating everything or just key applications? How will these applications perform in another DC? (Applications may have web/database dependencies based only in a single site).
  • User Data – are we replicating user data too? How are profiles going to be handled?
  • Failover method – are we utilizing a Citrix solution for this, or perhaps a product like VMware Site Recovery Manager? How is failover undertaken – automatic? manual?

Citrix Considerations

Aside from the many other factors affecting a question like this, our discussion focused on the Citrix technical elements aimed at DR/Failover options available. I’ve highlighted the key points we discussed, and gathered a number of resources that I think are helpful in discovering these further:


GSLB via NetScaler for StoreFront (Access Layer) – this was a common theme throughout the discussions, and there seems to be a general consensus that utilising GSLB on NetScaler is a logical way forward. Creating an access layer that utilises NetScaler GSLB and StoreFront, whilst spanning the DCs, will give a solution that is resilient and reliable, and won’t require complex replication/management. Dave Brett has written an excellent article on setting this up.


XenDesktop Site with Zones – Zones in XenDesktop are an awesome way to split geographically (or logically) separate resources, whilst maintaining the ease of management and reduced overhead of having only a single farm. Utilising Zones to form an active/active or active/passive solution is simple in configuration terms too. With Zones, users can be automatically redirected to a secondary zone VDA during the failure of their primary zone VDA.


Local Host Cache – as I am sure you are aware, Local Host Cache is now back in XenDesktop, and provides additional tolerance for database outages. LHC allows connection brokering operations to continue when either of the following issues occurs:

  • The connection between a Delivery Controller and the Site database fails in an on-premises Citrix environment.
  • The WAN link between the Site and the Citrix control plane fails in a Citrix Cloud environment.

See https://docs.citrix.com/en-us/xenapp-and-xendesktop/7-12/manage-deployment/local-host-cache.html for further details on LHC.

You can check whether LHC is enabled by running the following PowerShell: Get-BrokerSite. I’m running 7.15 in my lab, so it is enabled by default.
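For reference, a minimal sketch of that check – run on a Delivery Controller with the Citrix snap-ins installed (I’m assuming the LocalHostCacheEnabled property here):

    # Load the Citrix snap-ins and check the LHC state for the Site
    Add-PSSnapin Citrix*
    Get-BrokerSite | Select-Object Name, LocalHostCacheEnabled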


SQL Options – SQL is a key component of the FMA architecture, so any solution (with or without DR/failover) needs a reliable platform for hosting Site databases. Usually my go-to solution is to mirror any databases using SQL Mirroring. AlwaysOn Failover Cluster Instances and Always On Availability Groups are both possible solutions too – particularly given that Database Mirroring is being deprecated.

When DR is considered, this opens up additional requirements around providing suitable hardware and SQL Server licensing.

See pages 101-102 of the updated Citrix VDI Handbook for further information on SQL redundancy and replication options: http://docs.citrix.com/content/dam/docs/en-us/xenapp-xendesktop/7-15-ltsr/downloads/Citrix%20VDI%20Handbook%207.15%20LTSR.pdf


Using StoreFront to handle the Failover (Site/Delivery Controller Level) – From StoreFront 3.6 it has been possible to Load Balance Resources across controllers, allowing StoreFront to effectively handle failover between XenDesktop Farms. (See https://www.citrix.com/blogs/2016/09/07/storefront-multi-site-settings-part-2/ for more details on this)

This method allows us to have two XenDesktop Farms, publishing identical resources which StoreFront then aggregates. Failover would only occur in the event that the Delivery Controllers in the primary site were unavailable. This solution would still allow for a GSLB approach with StoreFront and NetScaler too.

The main disadvantage of this approach is the increased management overhead of the additional XenDesktop Farm, but this can be managed by having good practices in place.

This is configured in the Delivery Controller section of a StoreFront site – and requires both farms to publish the resources required for failover. See below – two Farms configured in the Delivery Controller section within a StoreFront site:

We also need to configure the “User Mapping and Multi-Site Aggregation Configuration”. Note that below I have configured all Delivery Controllers for “Everyone” – but this may need to be adjusted in a production environment:

You will also need to configure resource aggregation as below. For failover, do not tick “Load Balance resources across controllers”. However, “Controllers publish identical resources” will need to be ticked so that identically named published applications or desktops are de-duplicated:

With this set, any resources published in both farms will be launched from the Secondary Site in the event that the Delivery Controllers in the first site fail to respond.


Application Level Failover using Application Group Priorities – it is also possible to use Application Groups with priorities to control the failover of applications. When you configure an Application Group in XenDesktop 7.9+, you are able to set a priority for each Delivery Group it uses:

Gareth Carson has a great blog post on this which explains the functionality in more detail.

In Conclusion…

Hopefully this post has been helpful in highlighting some of the considerations for a DR/second site scenario, and some of the Citrix technologies and great community resources out there to help make the process a little easier. It’s been useful for me to ask the question and compile a post like this, because I’ve had to look into the various technologies and find out more about them in my own lab before writing it up… until next time, cheers!


XenDesktop 7.11 – Zones

Introduction

Many XenApp administrators will remember the Zone Preference and Failover (ZPF) feature available in previous versions. For those unaware, this allowed resources to be served from hosts within a logical group (a Zone), unless a failure/maintenance condition existed, in which case resources were served from a secondary Zone.

Using Zones we can provide applications from a single Zone, with failover to another Zone in the event of an issue. This is useful for a great many reasons:

  • Automatic failover to a DR site if there are issues with the Primary Zone’s Session Hosts
  • Automatic placement of users into a Zone that is nearest their location – useful for those with geographically disparate environments.
  • Capacity management – we can seamlessly “spill” users across to a secondary zone in the event that the primary zone reaches capacity.

zones00

Zones are configured using the Zones option, under Configuration, within Citrix Studio:

zone03

Lab Environment

For the purposes of this demonstration, I have considered a fictional environment shown below. We have two Geographical sites, linked via a wide area network. We have Active Directory Services across both sites, and a replicated SQL Environment. XenDesktop has been configured as a single site, with two Delivery Controllers, and two Session Hosts.

zone02

Goals

For this demonstration I will show how users can be assigned to resources in a Zone, and then seamlessly failed over in the event of a fault or issue.

zone01

Creation of Zones

We begin with two Delivery Controllers configured for this farm – obviously in a production environment we would be working with N+1, so at least two Controllers per Zone.

zones04

We can then create the Zone Configuration – this is done using the Zones pane in Citrix Studio:

zone03

As you can see, the default configuration places all controllers in the Primary Zone (the only zone within the Farm):

zones05

To begin, I will create a secondary zone called “Zone 2” and also rename “Primary” to “Zone 1” – in a production environment these could be physical locations or different Data Centers. To create a new Zone, click on “Create Zone” within the Actions pane:

zones06

You will then be presented with a new window, where the Zone can be named as required. Note – I have selected my Zone 2 Delivery Controller (Z2XD01) to be added to this Zone:

zones07

When the details have been populated and Controllers selected, click on “Save”. The new Zone is then created:

zones08

After this I updated the Primary zone by right clicking on the Zone and selecting “Edit Zone”:

zones09

My Zone configuration is now as follows:

zones10
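For those who prefer PowerShell, Zones can also be viewed and created using the Citrix snap-ins. A minimal sketch, assuming you are on a Delivery Controller (the Primary Zone rename was done in Studio above):

    # Load the Citrix snap-ins (run on a Delivery Controller)
    Add-PSSnapin Citrix*

    # List existing Zones - a new Site starts with a single 'Primary' Zone
    Get-ConfigZone

    # Create the secondary Zone
    New-ConfigZone -Name "Zone 2"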

Machine Catalogue Setup

Next I will create two new machine catalogues, each with a single session host, and assign these machine catalogues to Zone 1 and 2. The creation of these is done as you would any other machine catalogue – except we can now select a zone to add the machines to:

zones11

I’ve created two catalogues now – one for Zone 1 and one for Zone 2:

zones12
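A Machine Catalogue’s Zone can also be set from PowerShell. A hedged sketch – the catalogue name here is a placeholder for your own:

    # Move an existing catalogue into Zone 2 (catalogue name is a placeholder)
    $zone2 = Get-ConfigZone -Name "Zone 2"
    Set-BrokerCatalog -Name "Zone 2 Catalogue" -ZoneUid $zone2.Uid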

Now we will create a single Delivery Group – in this scenario, the Delivery Group spans across the Zones, and contains machines from both Zones:

zones13

Note: Create the Delivery Group with machines from one Zone first, and then add in the machines from the 2nd Machine Catalogue after.

Zone Configuration

Before we can assign users to a Zone, we need two new AD Groups to define the users’ home Zone:

zones14

Next – we can assign these Groups to Zones within the Console. We do this with the “Add Users to Zone” option:

zones15

Adding a user Group to a Zone:

zones16

Now – we can see the user configuration for each Zone:

zones17

zones18
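The same user-to-Zone mapping can be scripted too. A hedged sketch, assuming the New-BrokerUserZonePreference cmdlet and the AD group names from this lab (the domain name is a placeholder):

    # Associate each AD group with its home Zone (domain name is a placeholder)
    $zone1 = Get-ConfigZone -Name "Zone 1"
    $zone2 = Get-ConfigZone -Name "Zone 2"
    New-BrokerUserZonePreference -Name "JWLAB\Users_Zone_1" -ZoneUid $zone1.Uid
    New-BrokerUserZonePreference -Name "JWLAB\Users_Zone_2" -ZoneUid $zone2.Uid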

Testing

To test this setup I created 5 test user accounts and added them all to the Users_Zone_1 AD Group. This means that when these users launch a session from our Delivery Group, it will be served by Session Hosts in Zone 1 (as we have associated these users with that Zone). I logged all of the users into StoreFront and launched a session – the result was that all users launched desktops from Zone 1 resources:

zones19

Next – I will remove Test User 5 from Zone 1, and place them into a Zone 2 Group. Then I will log them off and back on. The result is shown below – note how a change in the Zone Assignment changes the Session Host used:

zones20

Practical Use – Failover!

Having users log into different Session Hosts based on an AD Group is all well and good – but this is not unique to Zone Configuration. What’s great about the Zoning in XenDesktop is that we can automatically fail over between the zones. In the example below, all 5 Users are assigned, by AD Group, to Zone 1.

But if there’s an issue with the Session Hosts in Zone 1 – perhaps a failure or outage – we don’t want to have to fail this over manually. When setting up the Delivery Group, this option was not selected:

zones21

This means that because both Machine Catalogues (one in each Zone) are members of the same Delivery Group (which spans the Zones), user sessions can automatically launch in the secondary Zone. To test this, I placed all machines in Zone 1 into Maintenance Mode, and then logged on all 5 test users. The result – users are logged onto machines in Zone 2 (a scripted version of this test follows the screenshot):

zones22
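For repeatable testing, the same step can be scripted. A hedged sketch – the catalogue name is a placeholder:

    # Put every machine in the Zone 1 catalogue into maintenance mode,
    # forcing new sessions over to Zone 2 (catalogue name is a placeholder)
    Get-BrokerMachine -CatalogName "Zone 1 Catalogue" |
        Set-BrokerMachineMaintenanceMode -MaintenanceMode $true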

Essentially, we now have a solution that automatically connects users into their primary resource, based on Zone Assignment, but in the event of an issue with that Zone, connects them into resources in a Secondary Zone.

Obviously there are many other elements to consider with a design like this – particularly around the SQL, StoreFront, and NetScaler infrastructure. But for a simple and automatic failover solution, this works really well.