Citrix Workspace Environment Management – IO Management

I’ve been blogging a lot this year on the merits of Citrix Workspace Environment Management (WEM) and the various features it provides. Another feature is I/O Priority – which enables us to manage the priority of I/O operations for a specified process:

To demonstrate this, I am going to run IOMeter (a storage testing tool that generates I/O load and measures CPU utilisation during testing), and SuperPi (a tool that calculates Pi to a specified number of digits, consuming large amounts of CPU during the calculation).

Before making any WEM configuration changes, on my virtual desktop the results are as follows:

IOMeter (Using the Atlantis Template – available here) –  shows 6.56% CPU Utilisation, and 3581 I/Os per second:

SuperPI calculation to 512K – 7.1 seconds:

Next I added the IOMeter and SuperPi executables into WEM, and set the priority to very low:

As a result of doing this the IOMeter results are significantly reduced, and the calculation time for SuperPi has increased significantly:

IOMeter result – around a 60% reduction in I/Os per second, and a 2% reduction in CPU usage:

SuperPI – time to calculate has increased by nearly 200%:

From this test it is clear that I/O Management within Workspace Environment Management is an effective way to control the I/O operations of specified processes. Whilst deliberately slowing down an application might seem an unlikely requirement for many of us, the ability to rein in particularly resource-intensive applications is a definite win for complex environments. If a particular application is causing performance problems (for example, degrading performance for other users), this provides a suitable way to manage that process.

Citrix Workspace Environment Management – Process Management

After testing out the excellent CPU and Memory management features in Citrix Workspace Environment Management (WEM), I wanted to blog about how processes can be controlled using the software.

Prior to starting this test, I have a basic Citrix XenDesktop environment configured, a WEM environment configured, and the relevant group policies in place to support this.

To prevent processes from running, we browse to System Optimization, and then Process Management:

From here we can enable process management:

Next we have two options: whitelist or blacklist. With a whitelist, only the listed executables are allowed to run; with a blacklist, only the listed executables are blocked.

I’m going to test out a blacklist:

We can exclude local administrators, and also exclude specified groups – for example, a trusted subset of users who need to run some of the applications we wish to block.
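The decision logic behind these options is worth spelling out. The sketch below is purely illustrative – it is not WEM's actual implementation – but it captures the semantics: blacklist mode blocks only listed executables, whitelist mode blocks everything not listed, and users in an excluded group bypass the check entirely.

```python
def is_blocked(exe, user_groups, mode, process_list, excluded_groups):
    """Return True if the executable should be prevented from running."""
    # Users in an excluded group (e.g. local administrators or a
    # trusted AD group) are never subject to process management.
    if user_groups & excluded_groups:
        return False
    listed = exe.lower() in {p.lower() for p in process_list}
    if mode == "blacklist":
        return listed          # block only what is listed
    if mode == "whitelist":
        return not listed      # block everything that is not listed
    return False               # process management disabled

# Example: notepad.exe blacklisted, but a "Trusted Users" group excluded.
print(is_blocked("notepad.exe", {"Domain Users"}, "blacklist",
                 ["notepad.exe"], {"Trusted Users"}))   # True
print(is_blocked("notepad.exe", {"Trusted Users"}, "blacklist",
                 ["notepad.exe"], {"Trusted Users"}))   # False
```

The group names here are made-up examples; the point is simply that exclusions are evaluated before the list itself.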

For this test I am going to add notepad.exe to the list:

Next I saved the WEM configuration, refreshed the cache, and then logged into a Desktop Session to test the blacklist. Upon firing up notepad I am greeted with the message:

Bingo – a simple and effective way to block processes from running. This would be very effective when combined with a list of known malicious executables for example, or known problematic software items.

In a future release I’d love to see more granularity in this feature – for example, a blacklist with the ability to whitelist processes for certain groups, rather than blocking them for everyone. This would enable application control at a much more granular level – blocking “process.exe” for Domain Users, say, but allowing it for a trusted group of users.

 

Citrix Connection Quality Indicator

Overview

Connection Quality Indicator is a new tool from Citrix designed to inform and alert the user to network conditions that may affect the quality of the session they are using. Information is provided to the end user via a notification window, which can be controlled using Group Policy.

Installation is supported on the platforms listed in https://support.citrix.com/article/CTX220774 – see that article for more details.

Test Environment

My environment consists of a basic Citrix XenDesktop 7.12 installation:

  • 1x Desktop Delivery Controller (Local Database)
  • 1x Citrix StoreFront
  • 2x XenDesktop Session Host (Static VMs)

All VMs are 1 vCPU, 4GB RAM, and Windows Server 2016.

Installation

Connection Quality Indicator needs to be installed on each Session Host, or onto a master template – and follows a simple next, next, finish installation with no configuration required during install:

Post installation we can see the program installed via Control Panel:

Group Policy Configuration

As outlined in CTX220774, there are also Group Policy templates that can be used. I have opted to copy these to the Central Store within my domain. The templates can be extracted from any machine with Connection Quality Indicator installed:

En-US templates are within the configuration folder ready for use:

Note – once placed into the Central Store, Group Policy Administrative Templates will be available as below:

Within Citrix Components we now have access to the Policy Settings for Connection Quality Indicator:

We are then able to modify the following settings:

Enable CQI – this setting allows us to enable the Utility, and also configure the refresh rate for data collection counters:

Notification Display Settings – from this setting we can configure the initial delay before the tool alerts the user to the connection quality rating, and define a minimum interval between notifications:

Connection Threshold Settings – this setting is perhaps the most interesting, because it is here we can tailor the tool to any specific environmental requirements. From this setting, we can control the definitions of High and Low Latency (in milliseconds), High and Low ICA RTT (in milliseconds), and the High and Low bandwidth value (in Mbps):
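To make the threshold idea concrete, here is a rough sketch of how per-metric high/low boundaries like these can be turned into an overall rating. The specific numbers and the grading rule are my own illustrative assumptions, not CQI's actual algorithm:

```python
def rate_connection(latency_ms, icartt_ms, bandwidth_mbps):
    """Grade each metric against its thresholds, return the worst rating."""
    def grade(value, low, high, higher_is_better=False):
        # Flip the sign so the "lower is better" comparison also
        # works for bandwidth, where higher values are better.
        if higher_is_better:
            value, low, high = -value, -high, -low
        if value <= low:
            return "strong"
        if value <= high:
            return "fair"
        return "weak"

    grades = [
        grade(latency_ms, 120, 220),            # latency in ms
        grade(icartt_ms, 150, 300),             # ICA RTT in ms
        grade(bandwidth_mbps, 1.5, 3.0, True),  # bandwidth in Mbps
    ]
    # The session is only as good as its worst metric.
    for rating in ("weak", "fair", "strong"):
        if rating in grades:
            return rating

print(rate_connection(30, 60, 50))     # a LAN-quality session
print(rate_connection(400, 500, 0.5))  # e.g. a poor 3G connection
```

This is also why tailoring the thresholds per environment matters: moving the boundaries changes which metric drags the overall rating down.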

For the purposes of this demonstration – I’ve used default settings all round.

After configuring the group policies – I logged into a Desktop Session with the tool installed. 60 seconds after login the Window appeared with the session quality result:

If the cog symbol is clicked the user has the option to modify the location of the display window, snooze the tool, and also to see the test results:

Unfortunately, I have no method in my lab for artificially degrading network performance or increasing latency – but to prove that the metrics were functional, I adjusted the Group Policy settings to use some fairly unobtainable figures for all settings, so that the tool would grade the connection quality differently:

This highlights how the tool can be used to identify connection quality through tailoring the GPO for a specific environment. After changing these settings, rebooting the Session Host (the lazy way of updating Group Policy!), and logging back in, the tool reported the following:

This is a very useful option within the tool – as we can specifically modify the settings to suit a range of environments. In some environments having low bandwidth might not be an issue, but high latency might be for example.

Conclusion

Overall this tool is very useful for giving the end user an insight into the quality of the network environment, and provides real time feedback on this quality. This is great for keeping end users informed, and managing expectations of performance too. What I also like is that end users will be able to see differences based on where they work – for example, a user with a “Strong Connection” inside the office, but a “Weak Connection” over 3G or at home, would know what sort of experience to expect, and would have real time data to support any troubleshooting moving forward.

 

Testing out the Atlantis USX Community Edition

Recently I’ve been using the Atlantis USX Community Edition, a free edition of the Atlantis USX software, specifically for the purposes of testing and learning how USX can improve the performance of a virtual desktop. Atlantis provide a number of videos on the USX Community Edition landing page, along with a testing guide which outlines how to benchmark the software.

For this post I wanted to demonstrate the results I’m getting in my lab – as they give an idea of the benefits a solution like this can bring. As part of this process I’ve been reading up on various testing methods and options, and I eventually settled on using the same configuration detailed in Jim Moyle‘s excellent article – available here. I should also note that the USX Community Edition provides a pre-made IOMeter configuration file (in the Citrix Testing Guide), but I have opted to follow the baseline in Jim’s article.

My test configuration is as follows:

  • 1x HPE ML110 Gen9
  • 40GB RAM
  • 2x 700GB SSD in RAID 0
  • 1x 480GB SSD
  • 1x 1TB HDD

All storage is via an HPE Dynamic Smart Array B140i RAID Controller.

Base VM for testing using IOMeter is a Windows 2012R2 Standard VM with no tuning or modification applied:

My configuration for the USX Appliance is as follows:

Due to the RAM available on my host I went for the small appliance:

All other configuration was standard, and all infrastructure VMs were stored on storage not participating in this testing (so as not to affect the result). The USX CE also includes an excellent management interface, which allows you to monitor the health of the environment, and displays useful statistics:

After setting up the configuration, I decided to test 3 storage configurations, against the IOMeter baseline, and then post the results to give an idea of performance:

Test 1 – VM on 1x HPE 1TB HDD

Test 2 – VM on 2x 700GB SSD RAID 0

Test 3 – VM on Atlantis USX CE Storage

As you can see, the USX CE wins in every storage metric displayed – there is no contest here:

  • In terms of storage throughput, the SSD array provides around 13x the speed of the HDD, but the USX provides around 60x the performance of the HDD, and 5x the performance of the SSDs in RAID 0.
  • The average read and write response times are also significantly different across the board – with the USX read being around 30x faster than the HDD, and the write being around 60x faster than the HDD. The USX also demonstrates performance around 4-5x faster than the SSDs in RAID 0 for average read and write response time.
  • Total IOPS is also a useful metric – again one that the USX appliance claims the prize for; IOPS are around 65x higher than the HDD, and around 5x higher than the SSDs in RAID 0.

Overall – the USX demonstrates around 60x the performance of the HDD, and around 5x the performance of the RAID 0 SSD array in my lab. If you haven’t already tried out the USX Community Edition, I would definitely recommend it – not only as a demonstrator of how this technology can improve VDI (and other) workloads, but also because, if (like me) your lab time is precious, anything that speeds up deployment and testing is a real bonus.

 

Citrix PVS – NTFS vs ReFS 2012R2 vs ReFS 2016

I’ve been doing some work recently around Citrix Provisioning Services, and this has prompted me to investigate what new features are available in version 7.11. One that stood out to me was the support for Microsoft’s Resilient File System (ReFS) on Windows Server 2016. This file system is interesting for those with virtualized environments due to the extra speed enhancements around the use of VHD and VHDX files.

What’s also interesting is that Citrix state the type of performance enhancement that can be expected when using this version:

See: https://docs.citrix.com/en-us/xenapp-and-xendesktop/7-11/whats-new.html

So… I decided to test this out!

I started with a basic PVS setup of 1 server and 1 client – but times three to give me three farms. The first farm being a 2012R2 Server with the PVS vDisk storage on NTFS, the second being 2012R2 using ReFS, and the third being 2016 using ReFS. All of the client machines captured for the session hosts were 2012R2. All base specifications were identical with 4GB RAM and 1vCPU.

Capture Times:

  • Capture time for NTFS on 2012R2: 8:44
  • Capture time for ReFS on 2012R2: 5:50
  • Capture time for ReFS on 2016: 5:15

Boot Times:

PVS Server Streaming Directory – NTFS on 2012R2:

PVS Server Streaming Directory – ReFS on 2012R2:

PVS Server Streaming Directory – ReFS on 2016:

Testing vDisk Merging:

To test out vDisk merging I created a new maintenance revision and booted the client machines from this:

I then installed a number of applications using Ninite:

Installing these applications and the associated changes created a differencing disk of around 4.5GB.

I then merged the changes to create a new base:

  • NTFS on 2012R2: 12:44
  • ReFS on 2012R2: 4:04
  • ReFS on 2016: 0:14 (yes… less than 15 seconds!)
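The headline differences fall straight out of a quick calculation on those mm:ss times:

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' time string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

ntfs_2012 = to_seconds("12:44")   # 764 s
refs_2012 = to_seconds("4:04")    # 244 s
refs_2016 = to_seconds("0:14")    # 14 s

print(ntfs_2012 - refs_2016)          # 750 s saved, i.e. 12:30 faster
print(round(ntfs_2012 / refs_2016))   # ReFS on 2016 merges ~55x faster
```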

Conclusion

As you can see – there is a noticeable speed increase when using ReFS on 2016; in all tests the performance was significantly faster. Capture was around 40% faster. Boot time differences within my lab environment were almost negligible, they were that fast – but ReFS on 2016 had a 3 second lead over ReFS on 2012R2, and a 4 second lead over NTFS on 2012R2. Perhaps the most impressive improvement was the merge operation though – 12:30 faster on ReFS on 2016 than NTFS on 2012R2!

All in all it’s pretty clear what I will be using when implementing PVS from now on….

VMware OS Optimization Tool and Windows 10

Overview

I’m a big fan of the VMware OS Optimization Tool and its capabilities. Not only does it help optimize VDI environments through a range of settings and templates, it also handles settings that would otherwise be complicated to control without a scripted or policy-based method. I must confess I have been using this tool for some time, but mostly without quantifying the effect (particularly in lab environments, where every spare bit of resource is cherished).

I wanted to give an idea in this blog post about the power of the tool, by doing some side by side comparison of Windows 10 Operating systems, against templates available in the tool.

I’ll aim to cover the following in an Optimized and Non-Optimized capacity:

  • Booting
  • Resource usage, idle 5 minutes after login
  • Roaming User profile size after first login
  • Logon time with a roaming user profile – first login, profile removed from the local machine, and then a second login

2 identical VMs were configured for this test, with the following specification:

vm1

Both VMs are identical and the resource limits fall well within the capacity of the host, so I can be sure of no bottlenecks etc.

On one of the VMs I ran the VMware OS Optimization Tool:

vm2

I used the LoginVSI Template for my Optimizations:

vm3

This template contains lots of areas and settings – created by the good folks over at LoginVSI:

vm9

Optimization is simple – just pick a template and click “Analyze” and then “Optimize”:

vm4

After this you are presented with a results Window:

vm5

Testing:

Test 1 – Boot time (time to logon screen):

Optimized: 44 seconds
Non-optimized: 45 seconds

Little difference here – to be honest I wasn’t expecting a huge change. Both machines are on SSD storage, with two fast processor cores available, and plenty of RAM – so no real bottleneck.

Test 2 – Resource usage after 5 minutes idle (whilst logged in):

Optimized Non-optimized
 vm6  vm7

Again – not a huge difference here, but the RAM saving of 0.2GB is worth noting. Multiply that 0.2GB across a 1,000-desktop deployment and that’s 200GB of additional RAM – and when each GB of RAM comes at a price, this is a worthwhile saving.

Test 3 – Roaming profile size after first login

Optimized – Local: 110MB, Roaming: 692KB
Non-optimized – Local: 124MB, Roaming: 984KB

Not really any huge difference here either – the smaller profile size is likely due to some of the features disabled by the optimization tool that would normally write data back into the profile during first login. I’d usually recommend avoiding Windows roaming profiles with any VDI solution anyway, and look to a solution like Citrix Profile Management or AppSense Personalisation Manager instead.

Test 4 – Login with a roaming profile, profile clear out, and subsequent login time (time to start screen):

Optimized – First Login (Profile Creation): 23 seconds, Second Login (Loading Roaming Profile): 9 seconds
Non-optimized – First Login (Profile Creation): 43 seconds, Second Login (Loading Roaming Profile): 17 seconds

Quite a noticeable difference here. Initial logon time was 20 seconds less (47% faster). Subsequent logins were also noticeably faster – 9 seconds against 17 seconds (also 47% faster). Most of this appears to be due to tasks running during logon – for the initial profile creation, things like default application associations and Windows Store apps. If we look further into the LoginVSI template for the tool, we can see a specific section just for login time reduction:

vm8

Overall, this tool has a clear impact on Windows 10 (and other operating systems) for VDI use. Not only does it lead to a reduction in Login Times, we also see a reduction in RAM usage from the VMs too. I’d recommend anyone currently running non-optimized environments to give this tool a go on a test machine and do some comparison themselves. Many of the features within a Desktop OS are unnecessary for VDI machines and a tool that provides a baseline like this is a great starting point.


Citrix Workspace Environment Management – Memory Management

After testing out the excellent CPU management features in Citrix Workspace Environment Management (WEM), I wanted to test out how well it handled applications that were particularly greedy with RAM consumption.

To start – I have a single Windows Server 2012R2 Session Host, with 4GB of RAM, and a single vCPU, running on vSphere:

wem1

Limitations for this VM have been individually configured as follows:

RAM:

wem2

CPU:

wem3

I wanted to use limitations to give a performance baseline. Although this is much lower than most Session Hosts would likely be – it will prove the concept for this test.

Next I configured the Session Host with the WEM Agent and imported the default baselines as per Citrix documentation. Within the Console, we can then see the Memory Management options:

wem4

According to the Administration Guide, this enables the following:

wem5

For the purposes of this test, I am going to set the idle limit time to 5 minutes. I will be using TestLimit, a command line tool to simulate high memory usage, available here:

https://blogs.msdn.microsoft.com/vijaysk/2012/10/26/tools-to-simulate-cpu-memory-disk-load/

I’ve configured a batch file that will start TestLimit64.exe, and consume 3.5GB of RAM (from a total of 4GB assigned to the session host).
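The batch file itself is just a wrapper around TestLimit; for illustration, the same kind of pressure can be reproduced (at a much smaller scale) by any process that allocates memory and touches every page so it actually lands in the working set. A throwaway Python equivalent, with the size deliberately scaled down:

```python
# Scaled-down illustration of what the TestLimit batch file does:
# allocate a chunk of memory and touch one byte per 4KB page so the
# pages are genuinely committed, then hold the allocation until exit.
# (The post uses ~3.5GB; 64MB here keeps the demo harmless.)

CHUNK_MB = 64

def consume_memory(megabytes):
    buf = bytearray(megabytes * 1024 * 1024)
    for offset in range(0, len(buf), 4096):
        buf[offset] = 1
    return buf

hog = consume_memory(CHUNK_MB)
print(len(hog) // (1024 * 1024))  # 64 – held until 'hog' is released
```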

Prior to any WEM configuration being applied, running this batch file causes Memory Usage to rise as expected:

wem6

This remains until the process is closed manually.

Next, I ran the same process but for a user logged on with active WEM Settings – including Memory Management. Initially we saw the same rise in memory:

wem7

I then waited 5 minutes (the time limit we set earlier), with the application running in the background, and then checked the stats again:

wem8

As you can see the excess memory consumed by this application has been released – and is now available to other processes running on the Session Host. I tested this multiple times on different machines and session hosts, and saw the same result each time.

This is potentially very useful for situations where a single user may be running a program that executes periodically, but sits with high RAM consumption in the background. Releasing under-utilized RAM will improve the session experience when RAM capacity is being reached.


Book Review – “Inside Citrix – The FlexCast Management Architecture” by Bas van Kaam

Recently I have been reading the excellent “Inside Citrix – The FlexCast Management Architecture” by Bas van Kaam. I wanted to write a quick post up about this book – as it’s well worth a read for anyone working with Citrix Desktop Virtualization products.

https://images-na.ssl-images-amazon.com/images/I/31PRVGSE6jL._SX322_BO1,204,203,200_.jpg

You can purchase the book here.

What I really like about this book is how thorough the sections are – no area is left untouched. Each element of the FlexCast infrastructure is covered, including the history behind FMA, and an overview of how FMA is different to IMA. As well as thorough details, there is also an excellent troubleshooting section, which goes through various tools and troubleshooting methods, and various cloud services available to assist.

Also, each section has a “Key Takeaways” area at the end, which provides an overview – highlighting the key elements and considerations covered. This is really useful if you are wanting to improve your knowledge in a particular area. Just by reading this book I’ve already uncovered, and filled, gaps in my own knowledge – this for me is the main reason for reading any technical publication.

Overall, for anyone working with Citrix products this book is an excellent read in my opinion – not only useful for improving your knowledge, but also serving as a reference guide when there are decisions to be made.

 

Citrix Workspace Environment Management – CPU Management

One of the great features in Citrix Workspace Environment Management (WEM) is the ability to intelligently manage CPU usage. This is especially important in a shared desktop scenario – where the actions of one user could ruin the experience for another.

To test this I am going to demonstrate a user running SuperPI (a CPU stress testing tool) and how this can be managed with WEM.

We start with a Session Host virtual machine of the following specification:

wem1

In vSphere this host is limited to 2000MHz of processing power, to provide a limit that won’t be affected by other VMs running on my host, and to provide a CPU benchmark:

wem2

Next I logged on a user before any WEM config was applied, and ran SuperPI. Note that the user is able to use all of the processing power available:

wem3

This is also confirmed by the CPU utilisation within vSphere – you can see that at the time SuperPI was started the CPU utilisation rocketed to nearly 100%:

wem4

I then tested this with a WEM Configuration applied. I started by importing the recommended default settings provided by Citrix with the Software:

wem5

With the default settings in place, CPU usage protection applies when the CPU usage goes over 25%:

wem6

It’s worth noting that when I ran SuperPI without any CPU protection, the rest of the sessions felt sluggish – opening windows or launching Notepad took significantly longer than normal, because the CPU was saturated with requests from SuperPI.

Next I ran SuperPI as a user logged on with WEM Configuration applied:

wem7

What’s very noticeable now is that the CPU usage of SuperPI varies greatly (before it was a constant 99%) – when launching other applications or items, the utilisation of the CPU by SuperPi drops significantly.

I noticed ranges from 65% down to 30%. This was more noticeable when launching other applications in other sessions – these were not sluggish or slow to respond. Each application launch was accompanied with a noticeable drop in usage by SuperPI to accommodate the new process.

This is the CPU management feature of WEM controlling this application – and making the experience better for all users on the Session Host.
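WEM’s exact spike-protection algorithm isn’t published, but the behaviour observed above – a process that sustains CPU above a threshold gets demoted so other work can interleave, and recovers once usage drops – can be sketched as a simple control loop. The sample counts and threshold here are illustrative assumptions:

```python
THRESHOLD_PCT = 25      # the default trigger used in this test
SUSTAIN_SAMPLES = 3     # consecutive hot samples before demotion

def spike_protect(samples, threshold=THRESHOLD_PCT, sustain=SUSTAIN_SAMPLES):
    """Return the priority ('normal' or 'low') after each CPU sample."""
    priorities, streak = [], 0
    for pct in samples:
        streak = streak + 1 if pct > threshold else 0
        priorities.append("low" if streak >= sustain else "normal")
    return priorities

# SuperPI pegs the CPU: after three hot samples it is demoted, and it
# recovers as soon as its usage falls back below the threshold.
print(spike_protect([99, 99, 99, 99, 10, 99]))
# ['normal', 'normal', 'low', 'low', 'normal', 'normal']
```

The varying 30–65% usage seen in Task Manager is consistent with this kind of demote-and-recover cycle, with the scheduler favouring newly launched processes while SuperPI sits at a lower priority.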


Quick Tip – Change SQL Server Collation

I recently needed to change the SQL Server collation to allow for a System Center Configuration Manager install, but I forgot to set it when I built the SQL Server. To change it after installing SQL, run the following command:

Note: if you need to see the command output, remove the /q. Also, use with caution – I ran this on an empty SQL Server with no databases on it.
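For reference, the documented syntax for rebuilding the system databases with a new collation looks broadly like this – run from the SQL Server installation media in an elevated prompt; the instance name, sysadmin account, and collation below are placeholders to adjust for your environment (SQL_Latin1_General_CP1_CI_AS being the collation SCCM expects):

```
Setup.exe /q /ACTION=REBUILDDATABASE /INSTANCENAME=MSSQLSERVER ^
  /SQLSYSADMINACCOUNTS=DOMAIN\AdminGroup ^
  /SQLCOLLATION=SQL_Latin1_General_CP1_CI_AS
```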

For further information see: https://msdn.microsoft.com/en-gb/library/ms179254.aspx