Azure VM Scale Sets and Remote Desktop Services?

When using any environment that provides virtual desktops at scale, it makes sense to have only the required number of resources running at the right time – rather than all of the resources all of the time. The usual approach to this is to use power management – so virtual machines are shut down when they are not in use.

With Azure we have another potential option designed for large workloads – to use Virtual Machine Scale Sets. This allows us to automatically scale up and down the number of Virtual Machines based on various factors and choices. This effectively allows us to ensure the most economical use of resources – as we never pay for more than we need to use, because the machines are de-allocated when not required. Scale Sets also provide a number of features around image management and VM sizing that could be useful for VDI environments.

In this post I am going to explore the validity and feasibility of VM Scale Sets for a Remote Desktop Services Environment. To start this post – I have the following environment configured, minus the scale set:

Note: if you need an RDS environment – this Azure template is awesome: https://azure.microsoft.com/en-gb/resources/templates/rds-deployment/ – I would advise using multiple infrastructure VMs for each role if this is a production service though.

Next – I configured a single server with the RDS Session Host role and all of the applications I require, as this will become our VM image. I then ran sysprep /generalize as per the Microsoft instructions for Image Capture in Azure. (See here). Once this is done we need to stop and de-allocate the VM, and then we need to turn this into an image we can use with a scale set:

Once this is done – we have a VM image saved:

So once we have an image – we can create Virtual Machines from this image, and create a Scale Set that will function as the means to scale the environment up and down. However – we need to do some more work first, because if we just scale up and down with a sysprepped VM, we end up with VMs that are not joined to the domain and won’t be of any use to us!

Usually – I just spin up Lab VMs using a JSON Template that creates the VM and joins it to an existing lab domain, using the JoinDomain extension. This saves me lots of time and gives me VMs deployed with minimal input (just a VM name is all I have to enter):

See https://github.com/Azure/azure-quickstart-templates/tree/master/201-vm-domain-join for more details and to use this template.

Now that we have a template – we are ready to go. I’m using Visual Studio to create the JSON for my deployment – and fortunately there is a built in scale set template we can use and modify for this purpose:

With the template up and running, we just need to add some parameters – and we can run a basic test deployment to confirm everything is working. My parameters for the basic template are shown below:

A quick test deployment confirms we are up and running:

However, there are a few issues with the template we need to correct – namely:

  • The machines are not joined to the Domain – and we need to place them into the correct OU for GPO settings too
  • A new VNET is created – we need to either use peering (set up prior to creation – or domain join operations will fail), or better, use an existing VNET that is already set up
  • The load balancer created is not required – we’ll be using the RDS Broker anyway

For this test – all I am concerned about is the domain join and VNET. The load balancer won’t be used so I can just discard this – however, the VNET and Domain Join issues will need to be resolved!

Issue 1 – using an existing VNET

To fix this, I am not going to reinvent the wheel – we just need some minor adjustment to the JSON file, based on this Azure docs article – https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-mvss-existing-vnet. In short, this will achieve the following:

  1. Add a subnet ID parameter, and include this in the variables section as well as the parameters.json
  2. Remove the Virtual Network resource (because our existing VNET is already in place)
  3. Remove the dependsOn from the Scale Set (because the VNET is already created)
  4. Change the Network Interfaces of the VMs in the scale set to use the defined subnet in the existing VNET
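The four steps above can be sketched as a small Python function that edits a template loaded as a dictionary. This is only an illustration under some assumptions: the parameter name `subnetId` is my own choice, and the function assumes the scale set follows the usual `virtualMachineProfile` / `networkProfile` layout. It is not the exact edit I made in Visual Studio.

```python
import copy


def use_existing_vnet(template: dict) -> dict:
    """Return a copy of an ARM template adjusted to deploy into an existing subnet."""
    result = copy.deepcopy(template)

    # Step 1: add a parameter carrying the resource ID of the existing subnet.
    # The name "subnetId" is an assumption made for this sketch.
    result.setdefault("parameters", {})["subnetId"] = {"type": "string"}

    # Step 2: drop the Virtual Network resource, since the VNET already exists.
    result["resources"] = [
        r for r in result.get("resources", [])
        if r.get("type") != "Microsoft.Network/virtualNetworks"
    ]

    for resource in result["resources"]:
        if resource.get("type") != "Microsoft.Compute/virtualMachineScaleSets":
            continue

        # Step 3: remove dependsOn entries that pointed at the removed VNET.
        remaining = [d for d in resource.get("dependsOn", []) if "virtualNetworks" not in d]
        if remaining:
            resource["dependsOn"] = remaining
        else:
            resource.pop("dependsOn", None)

        # Step 4: point every NIC IP configuration at the existing subnet.
        profile = resource["properties"]["virtualMachineProfile"]["networkProfile"]
        for nic in profile["networkInterfaceConfigurations"]:
            for ip_config in nic["properties"]["ipConfigurations"]:
                ip_config["properties"]["subnet"] = {"id": "[parameters('subnetId')]"}

    return result
```

The matching value for `subnetId` would then go into the parameters file as the full resource ID of the subnet in the existing VNET.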

Issue 2 – joining the Scale Set VMs to an AD Domain

To get the VMs in the scale set joined to an AD Domain we need to make use of JsonADDomainExtension.

With this added to the JSON template for our deployment, we just need to add the variables and parameters (shown below) and then we are good to go:

Note: the first time I used this I had an issue with the Domain Join – it was caused by specifying only the domain admin username. When specified in the form above (domain\\adminusername) it then worked fine.
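As a rough guide, the extension entry looks something like the sketch below, built here as a Python dictionary and printed as JSON. The extension name, domain, OU path and the `domainJoinPassword` parameter name are all placeholders I have made up for illustration. In a scale set, an entry like this sits in the `extensions` list under `virtualMachineProfile.extensionProfile`.

```python
import json


def domain_join_extension(domain, ou_path, join_user):
    """Build a JsonADDomainExtension entry for a scale set's extensionProfile."""
    return {
        "name": "joindomain",  # hypothetical extension name
        "properties": {
            "publisher": "Microsoft.Compute",
            "type": "JsonADDomainExtension",
            "typeHandlerVersion": "1.3",
            "autoUpgradeMinorVersion": True,
            "settings": {
                "Name": domain,
                "OUPath": ou_path,
                "User": join_user,
                "Restart": "true",
                # 3 = join the domain (1) + create the computer account (2)
                "Options": 3,
            },
            "protectedSettings": {
                # hypothetical parameter name; keep the password out of the template
                "Password": "[parameters('domainJoinPassword')]",
            },
        },
    }


extension = domain_join_extension(
    domain="lab.local",                               # placeholder domain
    ou_path="OU=RDS Session Hosts,DC=lab,DC=local",   # placeholder OU
    join_user="lab\\adminusername",                   # domain\user form
)
print(json.dumps(extension, indent=2))
```

Note that the printed JSON shows the user as `lab\\adminusername`: the double backslash is just JSON escaping for a single backslash.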

Now when we run the template, we get the usual Visual Studio output confirming success – but also a scale set, and machines joined to the domain:

Because I have previously configured the image used in the Scale Set with the RDS Role, and the Software required – we just need the servers to use an RDS Broker that will manage inbound connections into the RDS Farm. This is where I encountered the first sticking point – the Session Hosts need to be added manually when the Session Collection is created 🙁

This wasn’t a massive issue for this test – so I went ahead and created a Session Collection and added in my VMs:

Next I tested the solution by launching a Desktop via Remote Desktop Web Access:

Bingo – I was then logged into an RDS Session. Note the RDS Connection Name (showing the Broker) and the Computer Name (showing the Session host). This confirms we are running as expected:

I’ve now demonstrated the RDS Farm up and running, utilizing machines created by a Scale Set, and also accessed via a connection broker. But – we aren’t quite done yet, as we have not looked at how a scale set could enhance this solution. Below are a few ways we can improve the environment using Scale Sets, and a few limitations when used with RDS:

  • We have the option to Manually increase VM instances if we need more Session Hosts:

Note: this will require adding to the RDS Session collection manually (or via PowerShell)

  • We can scale the environment automatically using Auto Scale:

Below you can see a default scale rule (5 VMs in the Scale set) and then a rule that runs between 0600 and 1800 daily, and increases the VM Count up to 10 VMs if average CPU usage goes above 80%.

The rule for this Scale operation is shown below:

Note: this will still require machines adding to the Session Collection manually.
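To make the behaviour of this rule a little more concrete, here is a small Python model of it. It is only a sketch of the logic described above, not how Azure evaluates rules internally. The step size of one VM per evaluation is my assumption, as is treating the 1800 boundary as exclusive.

```python
from datetime import time

DEFAULT_COUNT = 5            # default profile: 5 VMs in the scale set
MAX_COUNT = 10               # the rule scales up to 10 VMs
CPU_THRESHOLD = 80.0         # average CPU percentage that triggers a scale out
WINDOW_START = time(6, 0)    # rule is active from 0600...
WINDOW_END = time(18, 0)     # ...until 1800 (treated as exclusive here)
STEP = 1                     # assumption: add one VM per evaluation


def target_instance_count(current: int, avg_cpu: float, now: time) -> int:
    """Return the VM count the modelled rule would aim for."""
    if not (WINDOW_START <= now < WINDOW_END):
        # Outside the window only the default profile applies.
        return DEFAULT_COUNT
    if avg_cpu > CPU_THRESHOLD:
        return min(current + STEP, MAX_COUNT)
    return current


print(target_instance_count(5, 85.0, time(9, 0)))
print(target_instance_count(8, 95.0, time(20, 0)))
```

Notice that nothing in this logic knows about logged-on users, which is why draining servers before a scale-in matters for RDS.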

  • We can increase the size of the VMs

Once a new size has been selected – the existing VMs show as not up to date:

We would then need to upgrade the VMs in the scale set (which requires a reboot), but this does not require the VMs to be re-added to the Session Collection. With this option, a rolling approach (drain one server, upgrade it, then drain and upgrade the next) would be available. This allows for a sizing upscale – without lots of reconfiguration or management required.

Overall, it would seem that although scale sets aren’t able to fully integrate with Remote Desktop Services collections, they are still very capable and powerful when it comes to managing RDS Workloads. Scale Sets can be used to size and provision machines, as well as to provide simple options to increase environment capacity and power. Purely using a scale set for the ability to spin up new VMs, or to manage sizing across multiple VMs is a logical step. We also have the option to reimage a VM – taking it back to a clean configuration.

Key Observations from my investigation:

  • We can scale an RDS environment very quickly, but RDS Servers can’t be automatically added to a session collection – the GPO settings for this don’t appear to support RDS post 2008R2 (when Session Collections and the new configuration method were introduced). This means servers have to be manually added when the Scale Set is scaled up
  • Scale sets can be used to increase VM size quickly – without reimaging servers (a reboot is all that is required)
  • Scaling can only look at performance metrics – we can’t scale on user count for example
  • Reimaging means we can take servers back to a clean build quickly – if a server has an issue we would just prevent logons and then reimage.
  • Scaling down can’t take logged on users into consideration – so we’d need a way of draining servers down first
  • Scale Sets will also allow us to scale up to very large environments with minimal effort – just increase VM count or size, and add the servers into the RDS Collection. A growing business for example – or one that provides a hosted desktop could scale from 10 servers to a few hundred with minimal effort.

Hope this helps, and congratulations if you have made it to the end of this article! Until next time!


Azure Traffic Manager for NetScaler Gateway Failover

Azure Traffic Manager is designed to provide traffic routing to various locations based on a ruleset that you specify. It can be used for priority (failover), weighted distribution, performance, and geographic traffic distribution.

The failover option works in a similar way to GSLB, so I am going to demonstrate that in this post. I’ve started with the following environment already configured:

  • Two Azure Locations (East US and South Central US), with a VPN between the sites to join the VNETs
  • 1 Domain Controller in each location
  • 1 NetScaler (standalone) in each location
  • 1 Citrix Environment spread across the two locations
  • NetScaler Gateways set up in both sites, and NAT’d out using Azure Load Balancer. (So that we have a public IP offering NetScaler Gateway services in both Azure Locations. Have a look at this Blog post if you require guidance on setting this up.)

Azure Lab Diagram

Before we set up the Azure Traffic Manager profile, we need to give our Public IP Addresses a DNS name label, as this is what Traffic Manager will use to balance the endpoints. To do this, browse to the Public IPs for your Load Balancers, and then click on “Configuration”.

Azure Public IP configuration

I have two public IPs so I have created two DNS Name Labels and given them appropriate names:

  • desktop-eus-jwnetworks = East US NetScaler Public IP
  • desktop-scus-jwnetworks = South Central US NetScaler Public IP

Next – it’s time to create the Azure Traffic Manager profile!

Traffic Manager profile creation

After we click create, we just need to populate a few basic details:

Traffic Manager profile creation

As you can see – I have given my Traffic Manager a name and selected Priority as the routing method (this gives us failover in a similar manner to Active/Passive GSLB). Note: there are other options available:

Traffic Manager routing options

See here for an overview of the Traffic Routing Methods. Next – we need to configure some more settings on our Traffic Manager, to ensure that the Monitoring and Traffic Routing are going to work correctly. In the screenshot below I have adjusted the following:

  • DNS TTL – I’ve adjusted this to 60 seconds, this defaults to 300 seconds (5 minutes)
  • Protocol – HTTPS, this is because we are Monitoring the HTTPS NetScaler Gateway
  • Port – 443 as we are using this port for the NetScaler Gateway
  • Path – this is the path to the files that the monitor will be checking for, so in the case of NetScaler Gateway this is /vpn/index.html – if this page is not available then the service will be marked as unavailable.
  • Probing Interval – this is how often the endpoint health is checked. Values are either every 10 seconds or every 30 seconds
  • Tolerated number of failures – this is how many health check failures are tolerated before the endpoint is marked as unhealthy
  • Monitoring timeout – this is the time the monitor will wait before considering the endpoint as unavailable.
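For anyone who prefers templates to the portal, the settings above map roughly onto the `dnsConfig` and `monitorConfig` sections of a Traffic Manager profile. The sketch below builds that fragment as a Python dictionary. The values are the ones used in this post (10 second interval, 5 second timeout and 1 tolerated failure are what I used in my test), and the check on the interval is my own addition.

```python
import json


def traffic_manager_settings(relative_name: str, interval_seconds: int = 10) -> dict:
    """Build the properties fragment for a Priority-routed Traffic Manager profile."""
    if interval_seconds not in (10, 30):
        # The probing interval is either 10 or 30 seconds.
        raise ValueError("interval_seconds must be 10 or 30")
    return {
        "trafficRoutingMethod": "Priority",
        "dnsConfig": {
            "relativeName": relative_name,  # gives <relative_name>.trafficmanager.net
            "ttl": 60,                      # lowered from the 300 second default
        },
        "monitorConfig": {
            "protocol": "HTTPS",
            "port": 443,
            "path": "/vpn/index.html",      # NetScaler Gateway logon page
            "intervalInSeconds": interval_seconds,
            "timeoutInSeconds": 5,
            "toleratedNumberOfFailures": 1,
        },
    }


settings = traffic_manager_settings("jwdesktop")
print(json.dumps(settings, indent=2))
```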

For more information on these configuration options – click here.

Traffic Manager configuration

Next – it is time to add our endpoints! To do this, click on Endpoints and then on Add:

Traffic Manager endpoints

We then need to add our Public IP addresses assigned to the Azure Load Balancers (where the NAT rules were created). Note – you will need to do this for BOTH endpoints:

Adding Endpoints to Traffic Manager

Once both are added, you will see the below in the Endpoints screen. Note that both Endpoints are shown as “Online” – this confirms our monitor is detecting the Endpoints as up. Also note that the Endpoints have priority – this means that under normal operation, all traffic will be sent to the “eus-desktop” endpoint (Priority 1), and in the event of a failure of the “eus-desktop” endpoint, all traffic will be directed to the “scus-desktop” Endpoint.
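The priority behaviour described above boils down to "pick the online endpoint with the lowest priority number". Here is a minimal Python sketch of that idea. It is a simplification for illustration only: it returns nothing when no endpoint is online, and does not try to model what the real service does in that situation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    priority: int   # lower number = preferred
    online: bool    # result of the health monitor


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the online endpoint with the lowest priority value, if any."""
    online = [e for e in endpoints if e.online]
    if not online:
        return None
    return min(online, key=lambda e: e.priority)


endpoints = [
    Endpoint("eus-desktop", priority=1, online=True),
    Endpoint("scus-desktop", priority=2, online=True),
]
print(pick_endpoint(endpoints).name)   # normal operation

endpoints[0].online = False            # simulate the EUS endpoint failing
print(pick_endpoint(endpoints).name)   # traffic now goes to the second site
```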

Endpoints

All that is left to do is test – however, first let’s make things neat for our users with a CNAME DNS Record. We are effectively going to CNAME our jwdesktop.trafficmanager.net record to something that users would be able to remember. You can find your record from the overview screen:

Traffic Manager overview

Next up I added a CNAME record in my Azure DNS Zone:

Add DNS Record

Add CNAME Record

Once this is created – we can start testing! But first, a diagram! The diagram below shows what we now have set up and working:

Solution Diagram

Note: in order to easily distinguish between my two Gateways, I set the EUS Gateway to the X1 theme, and the SCUS Gateway was left on the default NetScaler theme. When accessing https://desktop.jwnetworks.co.uk I am correctly shown the EUS Gateway:

EUS Gateway Test

Bingo – this all looks good to me! Next up, I disabled the Virtual Server for the EUS NetScaler:

Virtual Server Disabled

After around 30 seconds… the Monitor Status shows as Degraded:

[ For those interested in the maths: (10 second probe interval + 5 second timeout) × 2 attempts (1 tolerated failure, then the failure that marks the endpoint as Degraded) = 15 × 2 = 30 seconds ]
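The same back-of-an-envelope sum can be written as a tiny Python function, which is handy if you want to try other probe settings. This follows my rough arithmetic above rather than any official formula. The second function adds the DNS TTL on top, on the assumption that a client may keep using a cached answer until the TTL expires.

```python
def seconds_until_degraded(probe_interval: int, timeout: int, tolerated_failures: int) -> int:
    """Rough time for the monitor to mark an endpoint as Degraded."""
    attempts = tolerated_failures + 1   # the tolerated failures, plus the one that trips it
    return (probe_interval + timeout) * attempts


def worst_case_client_delay(detection_seconds: int, dns_ttl: int) -> int:
    """Rough upper bound before a client with a cached DNS answer sees the new site."""
    return detection_seconds + dns_ttl


detection = seconds_until_degraded(probe_interval=10, timeout=5, tolerated_failures=1)
print(detection)                                # 30
print(worst_case_client_delay(detection, 60))   # 90
```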

Endpoint Health

Next I refreshed the page and was presented with the SCUS Gateway page:

SCUS Gateway Page

As you can see, during a failure condition (the EUS Gateway vServer being taken down) the Traffic Manager directs traffic to our Priority 2 site, without any intervention from us. Any users would be able to refresh the page and then log back in. This can be used not only for NetScaler Gateway but for many internet facing services – for example OWA, SharePoint etc. There are a great many services that can benefit from this type of failover and the resiliency that it offers.

Load Balancing Citrix StoreFront with Azure Load Balancer

Sometimes there is a requirement to Load Balance StoreFront using a method other than NetScaler. Although rare (in my experience!) this does occasionally happen when NetScaler is perhaps not being used for Remote Access –  in an internal only environment for example.

In this post I will explain how to Load Balance StoreFront using the native Azure Load Balancers. We start with a simple setup:

  • 1x Domain Controller
  • 2x Citrix StoreFront Servers – in an availability set called “EUS-StoreFront”
  • 1x Virtual Network (VNET)

All of the above is in the East US Azure Location.

We start by creating a new Azure Load Balancer. Note a few key settings here:

  • Type: Internal – this is because we are balancing traffic within our VNET (Internal Network only)
  • IP address – static… we don’t want the LB IP to change!

Once this is done – we can add the backend servers. We do this by targeting the Availability Set that the StoreFront Servers are in. For those familiar with NetScaler, this is similar to a Service Group:

Next – we need to configure some Health Probes. This allows us to determine the state of the StoreFront server and to confirm that the services we are load balancing are healthy and available. Note: at the current time Azure Load Balancer HTTP checks support relative paths only (see https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/load-balancer/load-balancer-custom-probe-overview.md), so I have used /Citrix/CitrixWeb/monitor.txt – a simple text file (Static Content) I created to check that the Web Server is serving out content and thus working correctly. I have configured my Health Probe as below:

Next – it’s time to create the Load Balancing Rule that will form the entry point for Load Balanced traffic. Note the Protocol (TCP), Ports (80 Frontend, and 80 Backend), Backend Pool (StoreFront Availability Set), Health Probe (our HTTP 80 monitor.txt check), Session Persistence (Client IP), and Idle Timeout (30 minutes is currently the maximum value):
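The Session Persistence (Client IP) setting is worth a quick illustration, as it is what keeps a user on the same StoreFront server. The Python sketch below shows the general idea of hashing the client and frontend IP addresses to pick a backend. The hash used here (SHA-256) and the server names are purely my own choices for the example, and the real Azure Load Balancer hashing is not something I can see or reproduce.

```python
import hashlib
from typing import Sequence


def pick_backend(client_ip: str, frontend_ip: str, healthy_backends: Sequence[str]) -> str:
    """Map a client/frontend IP pair onto one of the healthy backends."""
    if not healthy_backends:
        raise ValueError("no healthy backends available")
    key = f"{client_ip}|{frontend_ip}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


# Hypothetical server names and addresses, for illustration only.
pool = ["storefront-01", "storefront-02"]
first = pick_backend("10.0.0.25", "10.0.0.100", pool)
second = pick_backend("10.0.0.25", "10.0.0.100", pool)
print(first == second)   # the same client lands on the same server
```

If a server fails its health probe it drops out of the healthy list, and clients that were pinned to it are simply mapped onto whatever remains.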

We can then click OK and our Load Balancing Rule is created! Next I created a DNS A Record for StoreFront and pointed it at the Load Balancer IP. After this, I opened up a browser and typed in my newly created StoreFront DNS record. Bingo – we have a page!

To test that the Load Balancing was working, I shut down IIS on each server in turn, and then tested. Sure enough – even when only 1 out of 2 servers was running, the page stayed up and StoreFront was accessible.

This Load Balancer can be used for a variety of Web Applications, and is a simple way to Load Balance Azure based services as you require. Until next time… cheers!

GSLB for NetScaler Gateway across Azure Locations

In this post I’ll be going through how I have configured GSLB for NetScaler Gateway in Azure, and the various elements required for this type of configuration.

Firstly – I began by setting up the background infrastructure to demonstrate this test: namely, 2x Active Directory DCs across two Azure Locations (East US (EUS), and South Central US (SCUS)). These were on a Virtual Network set up for each location, joined with a Site to Site VPN utilizing Virtual Network Gateways. Next, I created a simple Citrix Environment that spanned the sites, ensuring I had resources in both sites – to properly demonstrate the failover. An overview of the background infrastructure is shown below:

Next, I spun up two NetScaler BYOL versions:

These will form the bulk of the work where configuration will take place – in providing NetScaler Gateway via GSLB. I’m using Platinum Licensing, but Enterprise would also be fine (so that GSLB is available).

My NetScalers will all have multiple IP addresses assigned, for the SNIP, GSLB Site, and Gateway VIP:

Well worth a read at this point is CTP Gareth Carson’s awesome blog post around NetScaler deployment in Azure.

The next step for me was to set up 2x NetScaler Gateway vServers, which will be used for external access, and will be the vServers that will be provided by GSLB:

East US NetScaler:

South Central US NetScaler:

Next – I set up Authoritative DNS Listeners on both NetScalers, making use of the Subnet IP for this service (East US NetScaler shown below):

So – next we can set up the GSLB Sites, to enable the synchronisation of GSLB information via Metric Exchange Protocol. It’s just a case of adding one local and one remote site on each NetScaler, and using the GSLB Site IPs. Once this is completed, the NetScalers will show as follows:

East US:

South Central US:

Once these are set up and showing as Active – we have communication between the NetScalers, carried out using Metric Exchange Protocol (MEP). Next – we need to configure GSLB, but first I am going to create the Public IP addresses for each site, as this makes the GSLB Setup easier! We will need two IP Addresses (one for each site), and TCP 443 (Gateway), and UDP 53 (ADNS) will need to be NAT’d through for this configuration to work. See the diagram below for an overview of this:

This is easy to configure – firstly we set up two Load Balancers, each with a Public IP (Static) assigned:

Ensure that Static is selected for the Public IP – otherwise this may change and then the solution will stop working:

Once this is created for both sites – we will have two Load Balancers ready to use for our NAT requirements:

At this point, I like to update the Network Security Groups on the NetScalers to ensure that the required inbound rules are in place. For both NetScalers we need to allow HTTPS and DNS inbound from anywhere (as these will be internet facing):

Once complete, we can then NAT through the required ports (TCP 443, and UDP 53) by creating an inbound NAT rule. Remember, you’ll need to do this twice on each Load Balancer – so both ports are forwarded to the NetScaler on the Load Balancer site.

Once these are all in place – we can test that the NetScaler Gateway is up and accessible by visiting https:// and then the Load Balancer IP for each site. Before we configure GSLB on the NetScaler – we need to delegate the DNS Zone. This will vary depending on how your External DNS Servers are set up. Essentially – we need lookups for the Gateway URL to be handled by the NetScaler appliances. This means any lookups for the Gateway URL need to be delegated to the NetScaler appliances – so that they can return the IP address of the Active GSLB Site.

I’m using Azure DNS – so I have a zone setup. My URL is desktop.jwnetworks.co.uk – so I will be delegating control of this to the NetScalers. To start – create two A Records, one for each NetScaler, and these need to be pointed at the NAT’d IP. These will be used for the DNS Lookups. We then need to create a new NS Record for our Gateway Domain Name, with a 5 Second TTL, and pointing it at the A Records for the NetScalers we configured above:
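To show the shape of the records involved, here is a short Python sketch that lists them out. The name server labels (`ns-eus`, `ns-scus`) and the IP addresses are made up for this example (the addresses come from the ranges reserved for documentation). Only the Gateway name, the zone and the 5 second NS TTL come from my setup.

```python
def delegation_records(gateway_fqdn, zone, nameservers, ns_ttl=5):
    """Return the A and NS records needed to delegate a name to the NetScalers."""
    records = []
    # One A record per NetScaler, pointing at the NAT'd public IP.
    for label, public_ip in nameservers.items():
        records.append({"name": f"{label}.{zone}", "type": "A", "value": public_ip})
    # One NS record per NetScaler for the Gateway name, with a short TTL.
    for label in nameservers:
        records.append({
            "name": gateway_fqdn,
            "type": "NS",
            "ttl": ns_ttl,
            "value": f"{label}.{zone}",
        })
    return records


records = delegation_records(
    gateway_fqdn="desktop.jwnetworks.co.uk",
    zone="jwnetworks.co.uk",
    nameservers={
        "ns-eus": "203.0.113.10",    # hypothetical label and documentation-range IP
        "ns-scus": "198.51.100.10",  # hypothetical label and documentation-range IP
    },
)
for record in records:
    print(record)
```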

Now that this is in place – it’s time to configure our GSLB Configuration! This can be done from one of the NetScalers and then propagated to the other via Configuration Sync. I’ll therefore carry this out on the EUS NetScaler – as you can see below, we have only the sites setup for GSLB (as we did this earlier):

We start by clicking the “Configure GSLB” button, and then run through the Wizard – I am going to run with an Active/Passive site:

We then click OK, and we are presented with the GSLB Sites pane – but we have already configured this:

We can click continue, and we then need to setup the GSLB Services. So these are the two NetScaler Gateways we are using to provide this service. The key here is that we need to make sure that the Public IP addresses are listed – EUS is shown below:

We then click Create, and then repeat this process for the SCUS Site – making sure that the Public IP address is entered again. Once this is done, we will have a Local and Remote Site Configured:

Next you will be prompted to create a GSLB Backup vServer – but I’m not going to create that as part of this Proof of Concept.

Next – we create the GSLB vServer. This is just the entry point for traffic being Load Balanced by GSLB. Note – ensure that you pick an appropriate load balancing method, and then click continue after filling out your details:

Next – click on Save, and the configuration is done! We can now sync the config across to the other NetScaler. This is done by clicking on “Auto Synchronisation GSLB” from the GSLB Dashboard:

Once this completes successfully – we can test our configuration! To start – we can do an nslookup, and set type=ns. This will tell us that the NameServers are correctly configured:

As you can see – the nslookup is returning all the expected information. Because we configured Active/Passive – the A record returned for a normal (A record) nslookup is that of the EUS NetScaler. Next – we can test that the Gateway Page is working and accessible:

Bingo – all good so far! Next – let’s try shutting down the EUS NetScaler and see if things are still working as expected. At this point the IP address returned should change from that of the EUS Load Balancer IP, to the SCUS Load Balancer IP:

Before EUS NetScaler Shutdown:

After EUS NetScaler Shutdown:

As you can see this works as expected, and after a page refresh, the NetScaler Gateway page is shown again:

This means that during a failure condition affecting the EUS NetScaler, requests for the Gateway URL will be directed (via DNS) to the SCUS Site. This provides Data Centre level failover for Gateway Services, making use of native Azure Load Balancers, and a single NetScaler on each site. This solution is suitable for pretty much any service accessed via a Web Browser – GSLB can be used in this way to fail NetScaler Gateway services over between Azure Sites or to distribute other traffic types as required.


Quick Post! – Using NetScaler responder policies/actions and backup vServers to notify users of Service Downtime

NetScaler, as I am sure you are aware, is a superbly powerful Application Delivery Controller. One of the most useful features is the use of Responder Policies/Actions, and Backup vServers to indicate that a service is down or to provide access to an alternative service for end users.

Let’s say – you have a load balancer configured that balances two Web Servers, which you then present for users to access. This load balancer could be configured with monitoring to check the health of the Web Servers, and also to ensure that the servers are loaded evenly with requests.

But – what happens when both servers are down? Maybe you have scheduled patching, or there is a fault. Leaving users with a blank page or one that times out is never ideal, and letting them know there is a fault is always best. This is especially relevant if the users are accessing pages of a commercial nature – for example, “The Online shop will be back open at 2pm” is a lot better than a blank page.

Both of the options below take less than 10 minutes configuration on a NetScaler – so if you need to get a page up quickly… these are very useful!

To do this – we can go down one of the two routes below:

Option 1

Use the Redirect URL option on the vServer – this redirects client requests to a custom URL when the service is down. So for example this could be a custom page hosted elsewhere, which explains that there is an issue.

This could be used for Scheduled patching for example – for times when we know the site will be down, we just add the URL to the vServer and any users who visit during the maintenance window will see the redirected page.

Option 2

Another option is to use the Backup vServer feature within NetScaler – this is great because it can be used to direct traffic to a backup Data Center if our primary is unavailable. But also – we can use this feature to put up a NetScaler generated page informing the user that there is a problem with the backend service.

To do this – we need to create a new vServer to use for Maintenance. Create this with the following basic settings:

Then create a service for this – any local service bound to 127.0.0.1 will do. Assign a basic monitor so that the service shows as up.

Next – go to AppExpert, and then Responder, and create a Responder Action. Fill out the details as below – note: you will need to create a new HTML Page, this is the 2nd and 3rd screenshots below:

HTML Page:

Click on Done, and then Click Create for the Responder Action.

We can then go back to the vServer and assign this under the Policies option. Select the options as per the screenshot below and press Continue:

Next we are presented with the below screen, click on the Plus sign next to “Select Policy”:

Fill out the details as per the below:

Click on “Create” and then click on “Bind”. Finally, click “Done”. We can now apply this vServer as a backup vServer to others – so that if those vServers are down, this page will be shown to users.
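The logic behind this setup is simple enough to sketch in a few lines of Python. This is just a model of the idea, not NetScaler configuration: a vServer counts as up when at least one of its bound services is up, and requests fall through to the maintenance vServer only when the primary is down.

```python
from typing import Sequence


def vserver_is_up(service_states: Sequence[bool]) -> bool:
    """A vServer is treated as up when at least one bound service is up."""
    return any(service_states)


def handle_request(primary_services: Sequence[bool], maintenance_services: Sequence[bool]) -> str:
    """Decide which vServer answers a request in this simplified model."""
    if vserver_is_up(primary_services):
        return "primary"
    if vserver_is_up(maintenance_services):
        # The responder policy bound here returns the HTML maintenance page.
        return "maintenance page"
    return "no response"


# Both web servers down; the local service on 127.0.0.1 keeps the maintenance vServer up.
print(handle_request(primary_services=[False, False], maintenance_services=[True]))
```

This is also why the dummy service on 127.0.0.1 needs a basic monitor: if it showed as down, the maintenance vServer would be down too and the page would never be served.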

This is configured on a per vServer basis as per the below:

Now when accessing the page, and with the services down – the following page is shown:

Obviously, you can customise the page a little more than I have – but hopefully this will help. It’s quick to set up and gives some extra information to users when there is scheduled work or an unexpected fault.

Upcoming CUGC User Share Webinar!

Improving the Resiliency of Your XenDesktop Environment – StoreFront Multi Site and NetScaler GSLB for StoreFront

Thursday, February 22, 2018 – 1:00 PM – 2:00 PM EST (6:00 PM to 7:00 PM GMT)

This Webinar focuses on improving XenDesktop environment resiliency using StoreFront Multi Site and NetScaler GSLB. This will cover the setup of these methods, the benefits and pitfalls, and show some practical demonstrations. There will be a Q and A afterwards. Dave Brett (CTP) will be joining as moderator for this webinar.

You can register for the Webinar here.

Testing out Project Honolulu

Recently I have been testing out something new – Project Honolulu from Microsoft. I first heard about this on Twitter (thanks to Eric @XenAppBlog), and was interested in what it could offer straight away. Project Honolulu is a new way to manage Windows Server – using a web based method that does not rely on the traditional Server Manager GUI. Functionality is similar, and offers the usual range of configuration options, as well as the ability to manage roles and features as you would normally expect.

You can download Project Honolulu here. Windows Server 2016 is supported natively, but for 2012R2 Support, you will need to install WMF5.0 (KB3134758).

Project Honolulu has a number of ways to deploy – but I went with a simple install on a single server within my lab. Once this is completed (it’s an easy next next next done install), you are presented with the following screen, which opens up in your default browser:

From here – we can add server connections. Note: Standalone, Failover Cluster, and Hyper-Converged systems are supported:

After I’d added a few servers from my lab, the main screen appeared as below. You can also import servers from a text file – so an export from AD is possible too:

From here, we can see the status of the servers I have added and then drill down further into the options by clicking on a server name. The overview screen gives the usual range of information we’d expect to see:

Particularly nice is the metric display, which gives an overview of CPU, Memory, Ethernet, and Disk Activity. This is real-time data, which makes it useful for monitoring key servers/clusters, perhaps on an Ops display board or large screen etc.:

As well as a range of metrics available, we have a range of management tools we can take advantage of. Particularly interesting is the ability to manage elements like Network Adaptors, Services, and Roles/Features, as well as to view Event Log entries and the Registry:

Management of Services is also a very useful feature – allowing services to be stopped and started (I wish it had a restart button though!) from the Web Console. This is particularly useful for Managed Service Providers – when the 2am call comes in that a failure has occurred, instead of a VPN into an RDP Session into another RDP Session, you can fire up a Web Interface and restart the service from there (NAT rules and an SSL cert required of course…) :

You’ll notice here that I’ve highlighted a couple of Citrix Services too – Project Honolulu allows you to manage all services running on a supported machine. So this is great for managing 3rd Party applications and services too. The lightweight nature of the system also means that this can be added to existing systems with ease (a single installer and a list of servers).

I’m really interested to see where this Project will go – in particular, it makes the use of Server Core much more accessible, because a familiar and common interface can be used for management of multiple servers. It also allows simple management of basic server configurations, as well as Service management for Microsoft and Third Party applications. Any environment could probably benefit from a single interface that allows basic configuration and Service restarts… the key question is… where will this Project go next?

I’d really like to see support for more configuration changes, for example, customisable PowerShell options (e.g. this button in the interface runs this remote command) or support for a PowerShell session via the Web Interface. Also it would be great to see support for Third Party software – for example, additional modules that could be included to provide web based management of other software items on the server.


XenDesktop Site Failover – asking the community…

Recently I’ve been doing a lot of work on large deployments that require active/active or active/passive setups, whereby options to fail over to a DR site are either required as part of the design, or presented as a future enhancement to the customer. Most of these have been fairly open questions – “How can we achieve this?” for example. It’s a question that is almost completely subjective; it depends entirely on business needs, and what the available budget is.

Subjective elements aside, it is a much debated technical area, so I opened up a question on the MyCUGC forums to ask the community how they were going about this. I also tweeted the question out @jakewalsh90:

I based my question around the concept that is most common (certainly to me at least) – an active/active or active/passive design, with a primary site and a secondary (DR/Backup) site. This is without a doubt the most common environment type that I encounter, predominantly in small and medium enterprises up to around 5000 users.

The main purpose of this post is to summarize the elements (both technical and strategic) that could be considered, and the different options we can lean on to help achieve the desired results. It is also to highlight just how good the response from the Citrix Community was on this question!

Key Considerations

By far the most common point that came out of the discussion around this was – “it depends”. There are a great number of factors to consider for any solution like this, including:

  • Budget – what is affordable and achievable with our budget?
  • Connectivity – are we limited by latency/bandwidth/other traffic etc? Are we using Dark Fiber, MPLS, VPN etc?
  • DC Locations – if we are planning for a Secondary/DR site, is it likely this would ever be affected by an issue that took down our primary site? (Hurricanes, Floods, Earthquakes etc.)
  • Capacity – is this a full DR/Secondary solution or just a subset of applications and users?
  • Hardware – do we have the hardware to achieve this? Is it within our budget?
  • Software – can we do this within our current licensing or do we need an uplift?
  • Applications – are we replicating everything or just key applications? How will these applications perform in another DC? (Applications may have web/database dependencies based only in a single site).
  • User Data – are we replicating user data too? How are profiles going to be handled?
  • Failover method – are we utilizing a Citrix solution for this, or perhaps a product like VMware Site Recovery Manager? How is failover undertaken – automatic? manual?

Citrix Considerations

Aside from the many other factors affecting a question like this, our discussion focused on the Citrix technical elements aimed at DR/Failover options available. I’ve highlighted the key points we discussed, and gathered a number of resources that I think are helpful in discovering these further:

 

GSLB via NetScaler for StoreFront (Access Layer) – this was a common theme throughout the discussions, and there seems to be a general consensus that utilising GSLB on NetScaler is a logical way forward. Creating an access layer that utilizes NetScaler GSLB and StoreFront, whilst spanning the DCs, will give a solution that is resilient and reliable, and won’t require complex replication/management. Dave Brett has written an excellent article on setting this up.

 

XenDesktop Site with Zones – Zones in XenDesktop are an awesome way to split geographically (or logically) separate resources, whilst maintaining the ease of management and reduced overhead of only having a single farm. Utilizing Zoning to form an active/active or active/passive solution is simple in configuration terms too. With Zones, users can be automatically redirected to a secondary zone VDA during the failure of their primary zone VDA.

 

Local Host Cache – as I am sure you are aware, Local Host Cache is now back in XenDesktop, and provides additional tolerance for database outages. LHC allows connection brokering operations to take place when the following issues occur:

  • The connection between a Delivery Controller and the Site database fails in an on-premises Citrix environment.
  • The WAN link between the Site and the Citrix control plane fails in a Citrix Cloud environment.

See https://docs.citrix.com/en-us/xenapp-and-xendesktop/7-12/manage-deployment/local-host-cache.html for further details on LHC.

You can check to see if LHC is on by running the following PowerShell: Get-BrokerSite, and looking at the LocalHostCacheEnabled value in the output. I’m running 7.15 in my lab so it is enabled by default:

 

SQL Options – SQL is a key component of FMA – so any solution (with or without DR/Failover) needs a reliable solution for hosting Site Databases. Usually my go-to solution is to mirror any databases using SQL Mirroring. Always On Failover Cluster Instances and Always On Availability Groups are both possible solutions – particularly given that Database Mirroring is being deprecated.

When DR is considered, this brings additional requirements: suitable hardware at the second site, and the SQL Server licensing to go with it.

See pages 101-102 of the Updated Citrix VDI Handbook for further information on SQL redundancy and replication options: http://docs.citrix.com/content/dam/docs/en-us/xenapp-xendesktop/7-15-ltsr/downloads/Citrix%20VDI%20Handbook%207.15%20LTSR.pdf

 

Using StoreFront to handle the Failover (Site/Delivery Controller Level) – From StoreFront 3.6 it has been possible to Load Balance Resources across controllers, allowing StoreFront to effectively handle failover between XenDesktop Farms. (See https://www.citrix.com/blogs/2016/09/07/storefront-multi-site-settings-part-2/ for more details on this)

This method allows us to have two XenDesktop Farms – and to publish identical resources which are then load balanced by the StoreFront server. Failover would only occur in the event that a Delivery Controller was unavailable in the primary site. This solution would still allow for a GSLB approach with StoreFront and NetScaler too.

The main disadvantage of this approach is the increased management overhead of the additional XenDesktop Farm, but this can be managed by having good practices in place.

This is configured in the Delivery Controller section of a StoreFront site – and requires both farms to publish the resources required for failover. See below – two Farms configured in the Delivery Controller section within a StoreFront site:

We also need to configure the “User Mapping and Multi-Site Aggregation Configuration”. Note that below I have configured all Delivery Controllers for “Everyone” – but this may need to be adjusted in a production environment:

You will also need to configure resource aggregation as below. For failover, do not tick “Load Balance resources across controllers”. However, “Controllers publish identical resources” will need to be ticked so that identically named published applications or desktops are de-duplicated:

With this set, any resources published in both farms will be launched from the Secondary Site in the event that the Delivery Controllers in the first site fail to respond.
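The failover behaviour described above can be sketched as a small model. This is only an illustration in Python of the ordering and de-duplication logic, not how StoreFront is implemented: the Farm class, the aggregate and pick_farm functions, and the farm and resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Farm:
    """Hypothetical stand-in for a XenDesktop farm as listed in StoreFront."""
    name: str
    resources: set
    responding: bool = True


def aggregate(farms):
    """De-duplicate identically named resources across farms.

    Returns a dict mapping each resource name to the farms that publish it,
    kept in the configured (failover) order.
    """
    catalogue = {}
    for farm in farms:
        for resource in sorted(farm.resources):
            catalogue.setdefault(resource, []).append(farm)
    return catalogue


def pick_farm(resource, farms):
    """Return the name of the first responding farm that publishes the resource."""
    for farm in aggregate(farms).get(resource, []):
        if farm.responding:
            return farm.name
    return None  # nothing available to launch from


primary = Farm("Primary", {"Desktop", "Notepad"})
secondary = Farm("Secondary", {"Desktop", "Notepad"})
farms = [primary, secondary]

print(sorted(aggregate(farms)))     # each resource is listed once
print(pick_farm("Desktop", farms))  # served by the first farm while it responds
primary.responding = False          # simulate the primary controllers failing
print(pick_farm("Desktop", farms))  # now served by the second farm
```

Because the list order is the failover order, nothing is launched from the second farm until the first stops responding, which is the difference between this and the load-balanced option.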

 

Application Level Failover using Application Group Priorities – it is also possible to use application groups with priorities to control the failover of applications. When you configure an application group in XenDesktop 7.9+ you are able to configure this:

Gareth Carson has a great blog post on this which explains the functionality in more detail.

In Conclusion…

Hopefully this post has been helpful in highlighting some of the considerations for a DR/Second Site scenario, and has also helped to highlight some of the Citrix technologies and great community resources out there to help make the process a little easier. It’s been useful for me to ask the question and compile a post like this because I’ve had to look into the various technologies and find out more about them in my own lab before writing this… until next time, cheers!


Citrix Workspace Environment Management – IO Management

I’ve been blogging a lot this year on the merits of Citrix Workspace Environment Management (WEM) and the various features it provides. Another feature is I/O Priority – which enables us to manage the priority of I/O operations for a specified process:

To demonstrate this, I am going to run IOMeter (a storage testing tool that consumes, but also measures, CPU utilisation during testing), and SuperPi (a tool that calculates Pi to a specified number of digits – and consumes large amounts of CPU during calculation).

Before making any WEM configuration changes, on my virtual desktop the results are as follows:

IOMeter (Using the Atlantis Template – available here) –  shows 6.56% CPU Utilisation, and 3581 I/Os per second:

SuperPI calculation to 512K – 7.1 seconds:

Next I added the IOMeter and SuperPi executables into WEM, and set the priority to very low:

As a result of doing this the IOMeter results are significantly reduced, and the calculation time for SuperPi has increased significantly:

IOMeter Result – around 60% reduction in I/O per second, and 2% CPU usage reduction:

SuperPI – time to calculate has increased by nearly 200%:

From this test – it is clear that I/O Management within Workspace Environment Management is an effective way to control the I/O operations of specified processes. Whilst you might think slowing down the performance of an application is unlikely to be a major requirement for many of us – the ability to control particularly resource intensive applications is a definite win for complex environments. If a particular application is causing performance problems (for example degrading the performance for others) then this provides a suitable solution to manage that process.
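To put the percentages into absolute terms, here is a quick back-of-the-envelope calculation in Python. It assumes the rounded figures quoted above ("around 60%" and "nearly 200%") applied to the baseline results, so the outputs are estimates rather than the measured values; the apply_change helper and variable names are just for illustration.

```python
def apply_change(baseline, percent_change):
    """Return the value after a percentage change (negative for a reduction)."""
    return baseline * (1 + percent_change / 100)


baseline_iops = 3581            # IOMeter I/Os per second before the WEM change
baseline_superpi_seconds = 7.1  # SuperPi 512K calculation time before the change

# Assumed rounded changes: about 60% fewer I/Os, about 200% longer calculation.
throttled_iops = apply_change(baseline_iops, -60)
throttled_superpi_seconds = apply_change(baseline_superpi_seconds, 200)

print(f"IOMeter: roughly {throttled_iops:.0f} I/Os per second")
print(f"SuperPi: roughly {throttled_superpi_seconds:.1f} seconds")
```

Note that an increase of 200% means the calculation takes about three times as long, not twice as long.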

Citrix Workspace Environment Management – Process Management

After testing out the excellent CPU and Memory management features in Citrix Workspace Environment Management (WEM), I wanted to blog about how processes can be controlled using the software.

Prior to starting this test, I have a basic Citrix XenDesktop environment configured, a WEM environment configured, and the relevant group policies in place to support this.

To prevent processes from running, we browse to System Optimization, and then Process Management:

From here we can enable process management:

Next – we have two options: we can whitelist or blacklist. If we whitelist – only those executables listed will be allowed to run, whereas a blacklist will block only those listed.

I’m going to test out a blacklist:

We can exclude local administrators, and also choose to exclude specified groups – for example perhaps a trusted subset of users or specific groups of users who need to run some of the applications we wish to block.

For this test I am going to add notepad.exe to the list:

Next I saved the WEM configuration, refreshed the cache, and then logged into a Desktop Session to test the blacklist. Upon firing up notepad I am greeted with the message:

Bingo – a simple and effective way to block processes from running. This would be very effective when combined with a list of known malicious executables for example, or known problematic software items.
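The whitelist/blacklist behaviour and the exclusions described above can be summarised in a few lines of Python. This is a simplified model for illustration only, not how the WEM agent works internally: the is_blocked function, its parameters, and the group names are all assumptions.

```python
def is_blocked(process, user_groups, listed, mode="blacklist",
               excluded_groups=frozenset(), exclude_local_admins=True):
    """Decide whether a process launch is blocked for a user.

    mode="blacklist": only the listed executables are blocked.
    mode="whitelist": only the listed executables are allowed.
    Users in an excluded group (or local administrators, when excluded)
    are never blocked.
    """
    if mode not in ("blacklist", "whitelist"):
        raise ValueError(f"unknown mode: {mode}")
    groups = set(user_groups)
    if exclude_local_admins and "Administrators" in groups:
        return False
    if groups & set(excluded_groups):
        return False
    # Compare executable names case-insensitively, as Windows does.
    is_listed = process.lower() in {name.lower() for name in listed}
    return is_listed if mode == "blacklist" else not is_listed


blacklist = ["notepad.exe"]
print(is_blocked("notepad.exe", ["Domain Users"], blacklist))    # standard user
print(is_blocked("notepad.exe", ["Administrators"], blacklist))  # excluded admin
print(is_blocked("calc.exe", ["Domain Users"], blacklist))       # not on the list
```

With a blacklist, anything not listed runs as normal; switching mode to "whitelist" inverts that, so only the listed executables would run for non-excluded users.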

In a future release I’d love to see more granularity in this feature – for example blacklists, with the ability to whitelist processes for certain groups, rather than as a whole. This would enable control of applications on a much more granular level – for example, blocking “process.exe” for Domain Users, but allowing it for a trusted group of users.