In this post I wanted to summarise some of my recent activities deploying, securing, and monitoring Azure OpenAI with Terraform. This is an area I regularly discuss with clients – exploring possibilities, delivering demos, and looking at different options to integrate the range of Azure AI and Azure OpenAI Services into business applications and processes. I’ve created my own basic sample environment to enable this – and you can download a copy of that code below.
Note – the below sections are not exhaustive; they aim to highlight some of the areas I’ve seen, plus a few helpful tips to get started! The starting point for this post is a demo/sample environment I have created – available here: https://github.com/jakewalsh90/Terraform-Azure/tree/main/Azure-OpenAI-Demo-1
I’ve also included a number of links in the Resources section of this post, pointing to other useful sites and Technical Community posts/pages.
Deploying Azure OpenAI resources with Terraform
Deploying Azure OpenAI resources is a straightforward process using the AzureRM Provider. An Azure OpenAI resource is a Cognitive Services account of kind "OpenAI" – in the example below, I am randomising the naming using random_id (from the Random Provider):
```hcl
# Random IDs for OAI Resources
resource "random_id" "cognitive" {
  byte_length = 6
}

# Resource Group
resource "azurerm_resource_group" "rg1" {
  name     = "rg-${var.region}-aoai"
  location = var.region
}

# Cognitive Services
resource "azurerm_cognitive_account" "cognitive1" {
  name                = "oai-${random_id.cognitive.hex}"
  location            = azurerm_resource_group.rg1.location
  resource_group_name = azurerm_resource_group.rg1.name
  sku_name            = "S0"
  kind                = "OpenAI"
}
```
Why so Random?! I’ve been asked before – why use naming randomisation? I like to build this into naming where possible and appropriate – it means that lab and/or demo resources can be deployed repeatedly without worries about conflicting names, even if multiple people deploy the code, or deploy into the same Subscription. Some resources in Azure also require unique resource names – so I like to include this as standard in most of the environments I create, for ease of learning/use.
When deployed, this creates an Azure OpenAI Resource like the one shown below:
Tip – if you are looking for a tool to help with resource naming, check out the Azure Naming Tool.
We also need to consider our models – which are deployed using the azurerm_cognitive_deployment resource type. See the example deployment of a gpt-35-turbo model below:
```hcl
# Cognitive Deployments
resource "azurerm_cognitive_deployment" "gpt-35-turbo" {
  name                 = "gpt-35-turbo"
  cognitive_account_id = azurerm_cognitive_account.cognitive1.id

  model {
    format = "OpenAI"
    name   = "gpt-35-turbo"
  }

  scale {
    type = "Standard"
  }
}
```
We’ve now covered deployment of a Service and Model at a high level with Terraform – and can move on to security, and monitoring.
Providing a foundation for, and securing Azure OpenAI resources
I recently read two great articles on Microsoft Tech Community that covered a range of key considerations around securing OpenAI resources within Azure:
- Security Best Practices for GenAI Applications (OpenAI) in Azure – https://techcommunity.microsoft.com/t5/azure-architecture-blog/security-best-practices-for-genai-applications-openai-in-azure/ba-p/4027885
- Azure OpenAI Landing Zone reference architecture – https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102
Both of these articles are a great read – and highlight a range of areas that require consideration when securing, and providing a foundation for, Azure OpenAI resources. The Azure OpenAI Landing Zone reference architecture helps to cement key recommendations – and provides a baseline to grow with.
In the below sections, I wanted to call out a few specific examples within the above articles, and demonstrate the ease with which these can be actioned when using Azure OpenAI with Terraform. Note – these are not exhaustive, and just a few of the ones that I think are likely to be key!
Private networking, and network security
Probably the most important aspect of securing access to PaaS (Platform as a Service) resources is private network connectivity – a security layer that prevents unwanted ingress, and controls egress from PaaS resources like Azure OpenAI. With private networking configured, you can then govern network access to Azure OpenAI resources with additional controls, like Azure Firewall, for example.
The starting point for Private networking for Azure OpenAI Services is most likely to be via Private Endpoints or Private Link. The below overview is from Microsoft Learn:
- Azure Private Endpoint: Azure Private Endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. You can use Private Endpoints to connect to an Azure PaaS service that supports Private Link or to your own Private Link Service.
- Azure Private Link Service: Azure Private Link service is a service created by a service provider. Currently, a Private Link service can be attached to the frontend IP configuration of a Standard Load Balancer.
Source: Azure Private Link frequently asked questions (FAQ) | Microsoft Learn
Within the azurerm_cognitive_account resource, we can configure the network_acls block (including its virtual_network_rules) to assist in securing the Azure OpenAI Resource.
For more information on this – see here: azurerm_cognitive_account | Resources | hashicorp/azurerm | Terraform | Terraform Registry
In my lab example, if you adjust the privatenetworking variable to true, the network_acls block is used to secure the OpenAI Resource:
```hcl
# Cognitive Services - with Private Networking
resource "azurerm_cognitive_account" "pn-cognitive1" {
  count                 = var.privatenetworking ? 1 : 0
  name                  = "oai-${random_id.pn-cognitive.hex}"
  location              = azurerm_resource_group.rg1.location
  resource_group_name   = azurerm_resource_group.rg1.name
  sku_name              = "S0"
  kind                  = "OpenAI"
  custom_subdomain_name = "oai-${random_id.pn-cognitive.hex}"

  network_acls {
    default_action = "Deny"

    virtual_network_rules {
      subnet_id = azurerm_subnet.subnet1-aoai[0].id
    }
  }
}
```
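To complement the network ACLs, a Private Endpoint can also be attached to the account. The below is a sketch only – the endpoint and connection names are illustrative, and the subnet/account references assume the private networking resources used in my lab:

```hcl
# Private Endpoint for the OpenAI account - sketch only
resource "azurerm_private_endpoint" "pe-oai" {
  count               = var.privatenetworking ? 1 : 0
  name                = "pe-oai-demo"
  location            = azurerm_resource_group.rg1.location
  resource_group_name = azurerm_resource_group.rg1.name
  subnet_id           = azurerm_subnet.subnet1-aoai[0].id

  private_service_connection {
    name                           = "psc-oai-demo"
    private_connection_resource_id = azurerm_cognitive_account.pn-cognitive1[0].id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }
}
```

In a real deployment you’d typically also link a Private DNS Zone (privatelink.openai.azure.com) to the VNet, so that the account’s custom subdomain resolves to the Private Endpoint.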
API Management & Application Gateway
API Management and Application Gateway provide an enhancement layer that sits between requests and your Azure OpenAI resources. By implementing these types of solutions you can control how your resources are interacted with. API Management allows you to manage requests to your resources, including those from upstream components (e.g. Web Applications), by implementing controls like quotas or rate limiting.
You can read more about configuring API Management protection features here: https://learn.microsoft.com/en-gb/azure/api-management/transform-api
Configuration of API Management and its gateway features is done via the azurerm_api_management resource within Terraform – and requires customisation based on your use case, application, and end user requirements.
For more information on this – see here: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/api_management
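As a minimal sketch, an API Management instance could look like the below – the name, publisher details, and SKU are all illustrative and would need adjusting for your environment:

```hcl
# Minimal API Management instance - naming, publisher, and SKU are illustrative
resource "azurerm_api_management" "apim1" {
  name                = "apim-oai-demo"
  location            = azurerm_resource_group.rg1.location
  resource_group_name = azurerm_resource_group.rg1.name
  publisher_name      = "Example Org"
  publisher_email     = "admin@example.com"
  sku_name            = "Developer_1"
}
```

Protection features such as rate limiting are then applied as policy XML – for example via the azurerm_api_management_api_policy resource – on the APIs you front with this instance.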
Managed identities
Managed Identities provide identities for Azure resources – allowing resources to access other resources using an identity, with the usual RBAC and permissions within Azure, but without the need to create and manage resource or service accounts. Managed Identities are managed by Entra ID, which handles elements like credential management and rotation. They are therefore a great option when you need this type of access – removing static credentials and centralising their management.
In most cases, for Managed Identities that are used with Azure resources, System Assigned Managed identities will be preferable, as these are tied to the lifecycle of the Azure Resource in question – so they are deleted alongside a Resource that is also deleted, for example. You can read more about Managed identities here: https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview.
When securing components of an Azure OpenAI application or service (e.g. the various components interacting with an Azure OpenAI Resource), Managed Identities are a key way to ensure that components can communicate with and access other services, without the need to create and manage a range of accounts. These identities are automatically managed for you by Entra ID – an approach that is both simpler and more secure. Depending on the architecture of the specific application or service, API keys may play a bigger role in securing access; however, Managed Identities also have a role to play wherever Azure resources are concerned.
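Enabling a System Assigned identity in Terraform is a one-block change. The sketch below shows the cognitive1 account from earlier in this post with the identity block added (the identity block is the only addition):

```hcl
# Cognitive Services account with a System Assigned Managed Identity
resource "azurerm_cognitive_account" "cognitive1" {
  name                = "oai-${random_id.cognitive.hex}"
  location            = azurerm_resource_group.rg1.location
  resource_group_name = azurerm_resource_group.rg1.name
  sku_name            = "S0"
  kind                = "OpenAI"

  # System Assigned - the identity's lifecycle follows this resource
  identity {
    type = "SystemAssigned"
  }
}
```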
Working with Managed Identities in Terraform is a simple way to build these into the lifecycle of resources – and allows creation/update/destruction to be managed within existing Terraform pipelines or processes. A role assignment for a Managed Identity can be created using the below example block:
```hcl
data "azurerm_subscription" "primary" {}

resource "azurerm_role_assignment" "example_managed_identity" {
  scope                = data.azurerm_subscription.primary.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_resource_need_this_assigned_to.resource_name.identity.0.principal_id
}
```
Note – the above azurerm_resource type and name will need updating for your specific use case. In many cases, granting Contributor on a whole Subscription would also be unwise – you’ll likely want to adjust the permission level and scope to individual resources or Resource Groups.
You can read more about using Managed Identities within Terraform here: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/role_assignment.
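As a hedged example of that narrower scoping, the below grants a Web App’s System Assigned identity the built-in "Cognitive Services OpenAI User" role on a single OpenAI account – the azurerm_linux_web_app resource name (app1) is hypothetical:

```hcl
# Narrowly scoped role assignment - scope is a single OpenAI account,
# and the role grants inference access only, not management rights.
# "app1" is a hypothetical Web App with a System Assigned identity.
resource "azurerm_role_assignment" "app_to_oai" {
  scope                = azurerm_cognitive_account.cognitive1.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_linux_web_app.app1.identity[0].principal_id
}
```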
Monitoring Azure OpenAI Resources
It goes without saying that monitoring is one of the most critical factors when deploying any type of IT service or infrastructure. Without monitoring data and analysis you simply can’t understand the performance, reliability, health, or security (to name just a few!) aspects of the service. Azure OpenAI Services are no different – and in many cases we’d also be monitoring additional components that sit alongside them, taking a holistic approach that monitors end to end.
Given that Azure OpenAI Services are unlikely to be deployed in isolation, we need to treat monitoring holistically – covering both specific elements and our services as a whole. Although we can monitor individual elements, this alone won’t track the overall health of our services, nor will it allow us to plan for capacity changes across other elements of the infrastructure.
For Azure OpenAI Services – there are specific metrics that we can monitor, see here: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/monitoring#azure-openai-metrics. For other Azure services – refer to the specific service documentation about what can be monitored. For a general overview of Azure Monitor Metrics – see here: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics#types-of-metrics
What about the Terraform?
Within our monitoring environment there are a number of core components we need in place to ensure suitable monitoring for an Azure OpenAI environment. We have a number of options, and most are built around one of the following four destinations (each of which can be used with Azure OpenAI):
- Log Analytics Workspace – https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/log_analytics_workspace
- Storage Account – https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_account
- Event Hub – https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/eventhub
- Partner Solution – (This will depend on the solution in question)
In many cases – we would need to follow the high level process/steps below to deploy our resources and monitoring solution via Terraform. It’s likely all of the below would be deployed in a single apply phase (or as part of an implementation phase), however this could be split out if desired during development and testing:
- Deploy infrastructure – for example; Azure OpenAI resources, networking, private connectivity, Azure App Service, etc.
- Deploy Monitoring infrastructure – Log Analytics Workspaces, Storage Accounts, Event Hubs etc.
- Configure infrastructure and monitoring – ensuring that resources are configured to pass metrics to the relevant monitoring resource.
- Deploy dashboards and alerting – ensuring we can view, analyse, and act upon our monitoring data.
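The second and third steps above can be sketched as a Log Analytics Workspace plus a diagnostic setting that sends Azure OpenAI logs and metrics to it. Names are illustrative, and the log categories shown are assumptions – check the categories your resource actually exposes:

```hcl
# Log Analytics Workspace - name, SKU, and retention are illustrative
resource "azurerm_log_analytics_workspace" "law1" {
  name                = "law-oai-demo"
  location            = azurerm_resource_group.rg1.location
  resource_group_name = azurerm_resource_group.rg1.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Diagnostic setting wiring the OpenAI account to the workspace
resource "azurerm_monitor_diagnostic_setting" "oai" {
  name                       = "diag-oai-demo"
  target_resource_id         = azurerm_cognitive_account.cognitive1.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law1.id

  # Category names are assumptions - verify against your resource
  enabled_log {
    category = "RequestResponse"
  }

  enabled_log {
    category = "Audit"
  }

  metric {
    category = "AllMetrics"
  }
}
```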
Whilst the above steps are a generic high level plan, the reality is that monitoring is a sliding scale – it varies between deployments, applications, services, and organisations. With this in mind it’s challenging to define a baseline for monitoring that fits every use case. However, there are resources out there to help:
- Azure Monitor Documentation – https://learn.microsoft.com/en-us/azure/azure-monitor/
- Azure OpenAI Dashboards – https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/monitoring#dashboards
- Azure OpenAI Insights: Monitoring AI with Confidence – https://techcommunity.microsoft.com/t5/fasttrack-for-azure/azure-openai-insights-monitoring-ai-with-confidence/ba-p/4026850
I hope this post has been helpful – if you have questions or comments, please do feel free to reach out or use my contact page. Until next time!

Resources
- Azure OpenAI Landing Zone reference architecture – https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102
- HashiCorp Terraform Registry – AzureRM Provider, Cognitive Account: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/cognitive_account
- Security Best Practices for GenAI Applications (OpenAI) in Azure – https://techcommunity.microsoft.com/t5/azure-architecture-blog/security-best-practices-for-genai-applications-openai-in-azure/ba-p/4027885
- What is Azure Private Link service? – https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview
- What is a Private Endpoint? – https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview