Azure VM Scale Sets and Remote Desktop Services?

When using any environment that provides virtual desktops at scale, it makes sense to have only the required number of resources running at the right time – rather than all of the resources all of the time. The usual approach to this is power management, where virtual machines are shut down when they are not in use.

With Azure we have another potential option designed for large workloads – to use Virtual Machine Scale Sets. This allows us to automatically scale up and down the number of Virtual Machines based on various factors and choices. This effectively allows us to ensure the most economical use of resources – as we never pay for more than we need to use, because the machines are de-allocated when not required. Scale Sets also provide a number of features around image management and VM sizing that could be useful for VDI environments.

In this post I am going to explore the validity and feasibility of VM Scale Sets for a Remote Desktop Services Environment. To start this post – I have the following environment configured, minus the scale set:

Note: if you need an RDS environment – this Azure template is awesome: https://azure.microsoft.com/en-gb/resources/templates/rds-deployment/ – I would advise using multiple infrastructure VMs for each role if this is a production service though.

Next – I configured a single server with the RDS Session Host role and all of the applications I require, as this will become our VM image. I then ran sysprep /generalize as per the Microsoft instructions for Image Capture in Azure. (See here). Once this is done we need to stop and de-allocate the VM, and then we need to turn this into an image we can use with a scale set:

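# Names used throughout: the source VM, its resource group, the region and the image to create
$vmName = "rdsimage01"
$rgName = "eus-rg01"
$location = "EastUS"
$imageName = "rdsworker"
# Stop and de-allocate the sysprepped VM, then mark it as generalized
Stop-AzureRmVM -ResourceGroupName $rgName -Name $vmName -Force
Set-AzureRmVm -ResourceGroupName $rgName -Name $vmName -Generalized
# Build an image configuration from the VM and save it as an image in the resource group
$vm = Get-AzureRmVM -Name $vmName -ResourceGroupName $rgName
$image = New-AzureRmImageConfig -Location $location -SourceVirtualMachineId $vm.ID
New-AzureRmImage -Image $image -ImageName $imageName -ResourceGroupName $rgName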

Once this is done – we have a VM image saved:

Once we have an image, we can create Virtual Machines from it, and create a Scale Set that will function as the means to scale the environment up and down. However, we need to do some more work first: if we just scale up and down with a sysprepped VM, we end up with a VM that is off the domain and won’t be of any use to us!

Usually – I just spin up Lab VMs using a JSON Template that creates the VM and joins it to an existing lab domain, using the JoinDomain extension. This saves me lots of time and gives me VMs deployed with minimal input (just a VM name is all I have to enter):

    {
      "apiVersion": "2015-06-15",
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "name": "[concat(parameters('dnsLabelPrefix'),'/joindomain')]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[concat('Microsoft.Compute/virtualMachines/', parameters('dnsLabelPrefix'))]"
      ],
      "properties": {
        "publisher": "Microsoft.Compute",
        "type": "JsonADDomainExtension",
        "typeHandlerVersion": "1.3",
        "autoUpgradeMinorVersion": true,
        "settings": {
          "Name": "[parameters('domainToJoin')]",
          "OUPath": "[parameters('ouPath')]",
          "User": "[concat(parameters('domainToJoin'), '\\', parameters('domainUsername'))]",
          "Restart": "true",
          "Options": "[parameters('domainJoinOptions')]"
        },
        "protectedSettings": {
          "Password": "[parameters('domainPassword')]"
        }
      }
    }

See https://github.com/Azure/azure-quickstart-templates/tree/master/201-vm-domain-join for more details and to use this template.

Now that we have a template – we are ready to go. I’m using Visual Studio to create the JSON for my deployment – and fortunately there is a built in scale set template we can use and modify for this purpose:

With the template up and running, we just need to add some parameters – and we can run a basic test deployment to confirm everything is working. My parameters for the basic template are shown below:

A quick test deployment confirms we are up and running:

However, there are a few issues with the template we need to correct – namely:

  • The machines are not joined to the Domain – and we need to place them into the correct OU for GPO settings too
  • A new VNET is created – we need to either use peering (prior to creation – or domain join operations will fail), or better, an existing VNET that is already set up
  • The load balancer created is not required – we’ll be using the RDS Broker anyway

For this test – all I am concerned about is the domain join and VNET. The load balancer won’t be used so I can just discard this – however, the VNET and Domain Join issues will need to be resolved!

Issue 1 – using an existing VNET

To fix this, I am not going to reinvent the wheel – we just need some minor adjustment to the JSON file, based on this Azure docs article – https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-mvss-existing-vnet. In short, this will achieve the following:

  1. Add a subnet ID parameter, and include this in the variables section as well as the parameters.json
  2. Remove the Virtual Network resource (because our existing VNET is already in place)
  3. Remove the dependsOn from the Scale Set (because the VNET is already created)
  4. Change the Network Interfaces of the VMs in the scale set to use the defined subnet in the existing VNET
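
For step 1, the subnet ID is the full resource ID of the subnet in the existing VNET. A minimal sketch of looking it up with the same AzureRM module used for the image capture is below. The VNET and subnet names are placeholders for illustration only, and the VNET is assumed to live in the same resource group as the image:

```powershell
# Placeholder names: swap in the VNET, subnet and resource group you already have.
$rgName     = "eus-rg01"      # resource group used earlier in this post
$vnetName   = "eus-vnet01"    # hypothetical existing VNET name
$subnetName = "rds-subnet"    # hypothetical existing subnet name

# Fetch the VNET, then the subnet definition inside it.
$vnet   = Get-AzureRmVirtualNetwork -Name $vnetName -ResourceGroupName $rgName
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name $subnetName -VirtualNetwork $vnet

# This resource ID is the value to supply for the template's subnet ID parameter.
$subnet.Id
```

The output is a long /subscriptions/... string, which can be pasted into the parameters file as the subnet ID value.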

Issue 2 – joining the Scale Set VMs to an AD Domain

To get the VMs in the scale set joined to an AD Domain we need to make use of JsonADDomainExtension.

"extensionProfile": {
    "extensions": [
        {
            "name": "joindomain",
            "properties": {
                "publisher": "Microsoft.Compute",
                "type": "JsonADDomainExtension",
                "typeHandlerVersion": "1.3",
                "settings": {
                    "Name": "[parameters('domainName')]",
                    "OUPath": "[variables('ouPath')]",
                    "User": "[variables('domainAndUsername')]",
                    "Restart": "true",
                    "Options": "[variables('domainJoinOptions')]"
                },
                "protectedsettings": {
                    "Password": "[parameters('domainJoinPassword')]"
                }
            }
        }
    ]
}

With this added to the JSON template for our deployment, we just need to add the variables and parameters (shown below) and then we are good to go:

Note: the first time I used this I had an issue with the Domain Join – it was caused by specifying only the domain admin username. When specified in the form above (domain\\adminusername) it then worked fine.

Now when we run the template, we get the usual Visual Studio output confirming success – but also a scale set, and machines joined to the domain:

Because I have previously configured the image used in the Scale Set with the RDS Role, and the Software required – we just need the servers to use an RDS Broker that will manage inbound connections into the RDS Farm. This is where I encounter the first sticking point – these need to be added manually when the Session Collection is created 🙁

This wasn’t a massive issue for this test – so I went ahead and created a Session Collection and added in my VMs:
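
The same step can also be scripted with the RemoteDesktop PowerShell module on a server in the deployment. The sketch below assumes the Session Collection already exists, and the broker name, collection name and host names are all hypothetical placeholders rather than values from this lab:

```powershell
Import-Module RemoteDesktop

# Hypothetical names for illustration only.
$broker     = "rdsbroker01.lab.local"
$collection = "Desktops"
$newHosts   = @("rdsvm000000.lab.local", "rdsvm000001.lab.local")

foreach ($sessionHost in $newHosts) {
    # Register the server with the RDS deployment in the Session Host role...
    Add-RDServer -Server $sessionHost -Role "RDS-RD-SERVER" -ConnectionBroker $broker

    # ...then add it to the existing Session Collection.
    Add-RDSessionHost -CollectionName $collection -SessionHost $sessionHost -ConnectionBroker $broker
}

# List the collection members to confirm the new hosts are present.
Get-RDSessionHost -CollectionName $collection -ConnectionBroker $broker
```

This still has to be triggered by something after the scale set adds instances, so it reduces the manual effort rather than removing it.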

Next I tested the solution by launching a Desktop via Remote Desktop Web Access:

Bingo – I was then logged into an RDS Session. Note the RDS Connection Name (showing the Broker) and the Computer Name (showing the Session host). This confirms we are running as expected:

I’ve now demonstrated the RDS Farm up and running, utilizing machines created by a Scale Set, and also accessed via a connection broker. But – we aren’t quite done yet, as we have not looked at how a scale set could enhance this solution. Below are a few ways we can improve the environment using Scale Sets, and a few limitations when used with RDS:

  • We have the option to Manually increase VM instances if we need more Session Hosts:

Note: this will require adding to the RDS Session collection manually (or via PowerShell)

  • We can scale the environment automatically using Auto Scale:

Below you can see a default scale rule (5 VMs in the Scale set) and then a rule that runs between 0600 and 1800 daily, and increases the VM Count up to 10 VMs if average CPU usage goes above 80%.

The rule for this Scale operation is shown below:

Note: this will still require machines adding to the Session Collection manually.

  • We can increase the size of the VMs

Once a new size has been selected – the existing VMs show as not up to date:

We would then need to upgrade the VMs in the scale set, which requires a reboot but does not require the VMs to be re-added to the Session Collection. The hosts can be handled one at a time (drain, upgrade, then move on to the next), which allows for a sizing upscale without lots of reconfiguration or management.
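
The manual instance count change and the resize described above can both be done from PowerShell as well as the portal. This is a rough sketch only: the scale set name, the new instance count and the VM size are assumptions for illustration, and it assumes the scale set uses the manual upgrade policy (which is why existing VMs show as not up to date after a change):

```powershell
$rgName   = "eus-rg01"    # resource group used earlier in this post
$vmssName = "rdsvmss"     # hypothetical scale set name

# Read the current scale set model.
$vmss = Get-AzureRmVmss -ResourceGroupName $rgName -VMScaleSetName $vmssName

# Manual scale out: raise the instance count (7 is an example value).
$vmss.Sku.Capacity = 7
Update-AzureRmVmss -ResourceGroupName $rgName -VMScaleSetName $vmssName -VirtualMachineScaleSet $vmss

# Resize: change the VM size in the model (example size, pick one valid in your region).
$vmss.Sku.Name = "Standard_D4s_v3"
Update-AzureRmVmss -ResourceGroupName $rgName -VMScaleSetName $vmssName -VirtualMachineScaleSet $vmss

# Bring instances up to the new model one at a time; each one reboots,
# so drain the matching Session Host before upgrading it.
Get-AzureRmVmssVM -ResourceGroupName $rgName -VMScaleSetName $vmssName | ForEach-Object {
    Update-AzureRmVmssInstance -ResourceGroupName $rgName -VMScaleSetName $vmssName -InstanceId $_.InstanceId
}
```

In practice the final loop would be paused between instances so that users can be drained from each host first.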

Overall, it would seem that although scale sets aren’t able to fully integrate with Remote Desktop Services collections, they are still very capable and powerful when it comes to managing RDS Workloads. Scale Sets can be used to size and provision machines, as well as to provide simple options to increase environment capacity and power. Purely using a scale set for the ability to spin up new VMs, or to manage sizing across multiple VMs is a logical step. We also have the option to reimage a VM – taking it back to a clean configuration.

Key Observations from my investigation:

  • We can scale an RDS environment very quickly, but RDS Servers can’t be automatically added to a session collection – the GPO settings for this don’t appear to support RDS post 2008R2 (when Session Collections and the new configuration method were introduced). This means servers have to be manually added when the Scale Set is scaled up
  • Scale sets can be used to increase VM size quickly – without reimaging servers (a reboot is all that is required)
  • Scaling can only look at performance metrics – we can’t scale on user count for example
  • Reimaging means we can take servers back to a clean build quickly – if a server has an issue we would just prevent logons and then reimage.
  • Scaling down can’t take logged on users into consideration – so we’d need a way of draining servers down first
  • Scale Sets will also allow us to scale up to very large environments with minimal effort – just increase VM count or size, and add the servers into the RDS Collection. A growing business for example – or one that provides a hosted desktop could scale from 10 servers to a few hundred with minimal effort.
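
On the draining point, one possible approach is to stop new logons on a host and check for remaining sessions before it is scaled in or reimaged. The sketch below uses the RemoteDesktop module; the broker, collection and host names are hypothetical, and it only reports the state rather than removing anything:

```powershell
Import-Module RemoteDesktop

# Hypothetical names for illustration only.
$broker     = "rdsbroker01.lab.local"
$collection = "Desktops"
$target     = "rdsvm000004.lab.local"

# Stop the broker sending new connections to this host (existing sessions stay connected).
Set-RDSessionHost -SessionHost $target -NewConnectionAllowed No -ConnectionBroker $broker

# Collect any sessions still running on the host.
$sessions = @(Get-RDUserSession -ConnectionBroker $broker -CollectionName $collection |
    Where-Object { $_.HostServer -eq $target })

if ($sessions.Count -eq 0) {
    "No sessions left on $target - it can be reimaged or removed."
} else {
    "$($sessions.Count) session(s) still on $target - wait before scaling in."
}
```

Note that the scale set itself knows nothing about this check, so automatic scale-in could still remove a host that has users on it.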

Hope this helps, and congratulations if you have made it to the end of this article! Until next time!

Azure Traffic Manager for NetScaler Gateway Failover

Azure Traffic Manager is designed to provide traffic routing to various locations based on a ruleset that you specify. It can be used for priority (failover), weighted distribution, performance, and geographic traffic distribution.

The failover option is similar to GSLB – and works in a similar way, so I am going to demonstrate that in this post. I’ve started with the following environment already configured:

  • Two Azure Locations (East US and South Central US), with a VPN between the sites to join the VNETs
  • 1 Domain Controller in each location
  • 1 NetScaler (standalone) in each location
  • 1 Citrix Environment spread across the two locations
  • NetScaler Gateways set up in both sites, and NAT’d out using Azure Load Balancer. (So that we have a public IP offering NetScaler Gateway services in both Azure Locations. Have a look at this Blog post if you require guidance on setting this up.)

Azure Lab Diagram

Before we set up the Azure Traffic Manager profile, we need to give our Public IP Addresses a DNS name label, as this is what Traffic Manager will be using to balance the endpoints. To do this, browse to the Public IPs for your Load Balancers, and then click on “Configuration”.

Azure Public IP configuration

I have two public IPs so I have created two DNS Name Labels and given them appropriate names:

  • desktop-eus-jwnetworks = East US NetScaler Public IP
  • desktop-scus-jwnetworks = South Central US NetScaler Public IP

Next – it’s time to create the Azure Traffic Manager profile!

Traffic Manager profile creation

After we click create, we just need to populate a few basic details:

Traffic Manager profile creation

As you can see, I have given my Traffic Manager a name and selected Priority as the routing method (this gives us failover in a similar manner to Active/Passive GSLB). Note: there are other options available:

Traffic Manager routing options

See here for an overview of the Traffic Routing Methods. Next – we need to configure some more settings on our Traffic Manager, to ensure that the Monitoring and Traffic Routing are going to work correctly. In the screenshot below I have adjusted the following:

  • DNS TTL – I’ve adjusted this to 60 seconds; the default is 300 seconds (5 minutes)
  • Protocol – HTTPS, this is because we are Monitoring the HTTPS NetScaler Gateway
  • Port – 443 as we are using this port for the NetScaler Gateway
  • Path – this is the path to the files that the monitor will be checking for, so in the case of NetScaler Gateway this is /vpn/index.html – if this page is not available then the service will be marked as unavailable.
  • Probing Interval – this is how often the endpoint health is checked. Values are either every 10 seconds or every 30 seconds
  • Tolerated number of failures – this is how many health check failures are tolerated before the endpoint is marked as unhealthy
  • Monitoring timeout – this is the time the monitor will wait before considering the endpoint as unavailable.

For more information on these configuration options – click here.

Traffic Manager configuration
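
For anyone who prefers scripting to the portal, the settings above map onto a single cmdlet. This is a sketch under assumptions: the resource group and profile name are placeholders, the interval, timeout and tolerated failure values are the ones used in the failover test later in this post, and the three fast-failover parameters are only available in module versions that support them:

```powershell
$rgName = "eus-rg01"    # hypothetical resource group for the profile

# Priority routing with an HTTPS monitor against the NetScaler Gateway logon page.
New-AzureRmTrafficManagerProfile -Name "jwdesktop" `
    -ResourceGroupName $rgName `
    -TrafficRoutingMethod Priority `
    -RelativeDnsName "jwdesktop" `
    -Ttl 60 `
    -MonitorProtocol HTTPS `
    -MonitorPort 443 `
    -MonitorPath "/vpn/index.html" `
    -MonitorIntervalInSeconds 10 `
    -MonitorTimeoutInSeconds 5 `
    -MonitorToleratedNumberOfFailures 1
```

The relative DNS name becomes the prefix of the trafficmanager.net record, so it needs to be globally unique.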

Next – it is time to add our endpoints! To do this, click on Endpoints and then on Add:

Traffic Manager endpoints

We then need to add our Public IP addresses assigned to the Azure Load Balancers (where the NAT rules were created). Note – you will need to do this for BOTH endpoints:

Adding Endpoints to Traffic Manager

Once both are added, you will see the below in the Endpoints screen. Note that both Endpoints are shown as “Online” – this confirms our monitor is detecting the Endpoints as up. Also note that the Endpoints have priority – this means that under normal operation, all traffic will be sent to the “eus-desktop” endpoint (Priority 1), and in the event of a failure of the “eus-desktop” endpoint, all traffic will be directed to the “scus-desktop” Endpoint.

Endpoints
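
The two endpoints can be added from PowerShell in the same way. In this sketch the Public IP resource names, resource groups and profile name are assumptions for illustration; the endpoint names and priorities match the ones described above:

```powershell
# Hypothetical resource names: the Public IPs attached to each Azure Load Balancer.
$eusIp  = Get-AzureRmPublicIpAddress -Name "eus-ns-pip"  -ResourceGroupName "eus-rg01"
$scusIp = Get-AzureRmPublicIpAddress -Name "scus-ns-pip" -ResourceGroupName "scus-rg01"

# Priority 1: East US receives all traffic while it is healthy.
New-AzureRmTrafficManagerEndpoint -Name "eus-desktop" -ProfileName "jwdesktop" `
    -ResourceGroupName "eus-rg01" -Type AzureEndpoints `
    -TargetResourceId $eusIp.Id -EndpointStatus Enabled -Priority 1

# Priority 2: South Central US only receives traffic if East US is unhealthy.
New-AzureRmTrafficManagerEndpoint -Name "scus-desktop" -ProfileName "jwdesktop" `
    -ResourceGroupName "eus-rg01" -Type AzureEndpoints `
    -TargetResourceId $scusIp.Id -EndpointStatus Enabled -Priority 2
```

Both Public IPs need their DNS name label in place first, otherwise they cannot be used as Azure endpoints.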

All that is left to do is test – however, first let’s make things neat for our users with a CNAME DNS Record. We are effectively going to CNAME our jwdesktop.trafficmanager.net record to something that users would be able to remember. You can find your record from the overview screen:

Traffic Manager overview

Next up I added a CNAME record in my Azure DNS Zone:

Add DNS Record

Add CNAME Record
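
If the zone is hosted in Azure DNS, the same CNAME can be created with PowerShell. The sketch below assumes the zone is jwnetworks.co.uk and lives in a resource group called dns-rg01 (the resource group name and the TTL are placeholders):

```powershell
$zoneName = "jwnetworks.co.uk"
$zoneRg   = "dns-rg01"    # hypothetical resource group holding the DNS zone

# desktop.jwnetworks.co.uk -> jwdesktop.trafficmanager.net
New-AzureRmDnsRecordSet -Name "desktop" -RecordType CNAME -ZoneName $zoneName `
    -ResourceGroupName $zoneRg -Ttl 3600 `
    -DnsRecords (New-AzureRmDnsRecordConfig -Cname "jwdesktop.trafficmanager.net")

# Check the record from a Windows client once it has been created.
Resolve-DnsName -Name "desktop.jwnetworks.co.uk" -Type CNAME
```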

Once this is created – we can start testing! But first, a diagram! The diagram below shows what we now have set up and working:

Solution Diagram

Note: in order to easily distinguish between my two Gateways, I set the EUS Gateway to the X1 theme, and the SCUS Gateway was left on the default NetScaler theme. When accessing https://desktop.jwnetworks.co.uk I am correctly shown the EUS Gateway:

EUS Gateway Test

Bingo – this all looks good to me! Next up, I disabled the Virtual Server for the EUS NetScaler:

Virtual Server Disabled

After around 30 seconds… the Monitor Status shows as Degraded:

[ For those interested in the maths: (10-second probe interval + 5-second timeout) × (1 tolerated failure + 1), so effectively 15 × 2 = 30 seconds before the endpoint is marked as degraded ]

Endpoint Health
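
The arithmetic in the note above can be wrapped in a small helper for trying out other monitor settings. This is only a rough estimate of detection time based on the reasoning in this post, not an official formula, and the function name is made up for illustration:

```powershell
function Get-FailoverDetectionSeconds {
    param(
        [int]$ProbeIntervalSeconds,
        [int]$TimeoutSeconds,
        [int]$ToleratedFailures
    )
    # One more failed check than the tolerated number is needed before
    # the endpoint is marked unhealthy, and each check costs interval + timeout.
    ($ProbeIntervalSeconds + $TimeoutSeconds) * ($ToleratedFailures + 1)
}

# Settings used in this post: 10 second interval, 5 second timeout, 1 tolerated failure.
Get-FailoverDetectionSeconds -ProbeIntervalSeconds 10 -TimeoutSeconds 5 -ToleratedFailures 1   # 30
```

Clients may also hold on to the old DNS answer for up to the TTL (60 seconds here), so a user could take a little longer than this to land on the second site.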

Next I refreshed the Page and we are presented with the SCUS Gateway page:

SCUS Gateway Page

As you can see, during a failure condition (the EUS Gateway vServer being taken down) the Traffic Manager directs traffic to our Priority 2 site, without any intervention from us. Any users would be able to refresh the page and then log back in. This can be used not only for NetScaler Gateway but for many internet-facing services, for example OWA or SharePoint. There are a great many services that can benefit from this type of failover and the resiliency that it offers.

Load Balancing Citrix StoreFront with Azure Load Balancer

Sometimes there is a requirement to Load Balance StoreFront using a method other than NetScaler. Although rare (in my experience!), this does occasionally happen when NetScaler is not being used for Remote Access, in an internal-only environment for example.

In this post I will explain how to Load Balance StoreFront using the native Azure Load Balancers. We start with a simple setup:

  • 1x Domain Controller
  • 2x Citrix StoreFront Servers – in an availability set called “EUS-StoreFront”
  • 1x Virtual Network (VNET)

All of the above is in the East US Azure Location.

We start by creating a new Azure Load Balancer. Note a few key settings here:

  • Type: Internal – this is because we are balancing traffic within our VNET (Internal Network only)
  • IP address – static… we don’t want the LB IP to change!

Once this is done – we can add the backend servers. We do this by targeting the Availability Set that the StoreFront Servers are in. For those familiar with NetScaler, this is similar to a Service Group:

Next – we need to configure some Health Probes. These allow us to determine the state of each StoreFront server and to confirm that the services we are load balancing are healthy and available. Note: at the time of writing, Azure Load Balancer HTTP checks support relative paths only, so I have used /Citrix/CitrixWeb/monitor.txt – a simple text file (static content) I created to check that the web server is serving content and thus working correctly. (https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/load-balancer/load-balancer-custom-probe-overview.md) I have configured my Health Probe as below:

Next – it’s time to create the Load Balancing Rule that will form the entry point for Load Balanced traffic. Note the Protocol (TCP), Ports (80 Frontend, and 80 Backend), Backend Pool (StoreFront Availability Set), Health Probe (our HTTP 80 monitor.txt check), Session Persistence (Client IP), and Idle Timeout (30 minutes is currently the maximum value):
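
For reference, the load balancer, probe and rule described above could also be built with the AzureRM cmdlets. This is a sketch only: the resource names, VNET, subnet and private IP address are placeholders, and the probe interval and count are assumed values rather than the ones from my configuration:

```powershell
$rgName   = "eus-rg01"    # hypothetical resource group
$location = "EastUS"

# Existing VNET and subnet that the StoreFront servers sit in (placeholder names).
$vnet   = Get-AzureRmVirtualNetwork -Name "eus-vnet01" -ResourceGroupName $rgName
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "servers" -VirtualNetwork $vnet

# Internal frontend: supplying a private IP address makes the allocation static.
$frontend = New-AzureRmLoadBalancerFrontendIpConfig -Name "sf-frontend" `
    -PrivateIpAddress "10.0.0.50" -SubnetId $subnet.Id

$pool = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name "sf-backend"

# HTTP probe against the static monitor.txt file (interval and count are assumptions).
$probe = New-AzureRmLoadBalancerProbeConfig -Name "sf-http-probe" -Protocol Http -Port 80 `
    -RequestPath "/Citrix/CitrixWeb/monitor.txt" -IntervalInSeconds 5 -ProbeCount 2

# TCP 80 to 80, Client IP persistence (SourceIP) and a 30 minute idle timeout.
$rule = New-AzureRmLoadBalancerRuleConfig -Name "sf-http" `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool -Probe $probe `
    -Protocol Tcp -FrontendPort 80 -BackendPort 80 `
    -LoadDistribution SourceIP -IdleTimeoutInMinutes 30

New-AzureRmLoadBalancer -ResourceGroupName $rgName -Name "eus-storefront-lb" -Location $location `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool `
    -Probe $probe -LoadBalancingRule $rule
```

This creates an empty backend pool; the StoreFront servers' network interfaces still need to be added to it, which the portal does when you target the Availability Set.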

We can then click OK and our Load Balancing Rule is created! Next I created a DNS A Record for StoreFront and pointed it at the Load Balancer IP. After this, I opened up a browser and typed in my newly created StoreFront DNS record. Bingo – we have a page!

To test that the Load Balancing was working, I shut down IIS on each server in turn and then tested. Sure enough – even when only 1 out of 2 servers was running, the page stayed up and StoreFront was accessible.

This Load Balancer can be used for a variety of Web Applications, and is a simple way to Load Balance Azure based services as you require. Until next time… cheers!