Overview
With the rapid development of AI Services in recent times, it’s clear that adoption and growth of these services is only going to continue – and see wider adoption across a range of organisations. Continuing the Azure AI Infrastructure as Code Learning Journey, is therefore likely to be a key activity for many within 2024 and beyond. With Azure AI Services, there is a clear focus on providing AI Services that can be integrated into business applications and workloads via standardised SDKs and APIs, and Studios that make development and testing easier.
If you’ve yet to try out the Azure AI Studio – you can access it here: https://ai.azure.com/
Whilst tools like the Azure AI Studio help development and testing, there is still a need for Infrastructure as Code (IAC) in my view – with IAC providing an overlay that allows the development of AI Services and supporting Infrastructure in a programmatic and repeatable way. Repeatability, replicability, and a code-based approach to deployment of these types of services and solutions leads to a more streamlined experience when deploying into production or developing and testing new features too.
Azure AI and Terraform was the subject of a Presentation I recently gave alongside fellow MVP Nicholas Chang at the Global Azure event – you can download a copy of those slides here.
How can IAC help Organisations adopt AI?
I’ve blogged and spoken about the benefits of IAC numerous times, and in particular focusing on Terraform, which is an IAC tool I regularly use. However, the benefits are similar across a range of tools when considering Azure and OpenAI Services – for example, Azure Bicep also provides the ability to deploy Azure Resources, including those within the Azure AI and OpenAI Space. It’s clear that Infrastructure as Code and AI are likely to be seen together in many organisations moving forward.
The comparison of tooling – for example Terraform vs. Bicep is not a subject for this post, but is an area that usually sparks debate and discussion, but in my view largely comes down to the specific use cases, goals, and wider requirements.
Please note – the below is an overview, and is not exhaustive, this post could be a huge novel or very long discussion due to the nature of IAC and varying opinion and use case!
How IAC Helps Technically…
What is much less a subject for debate, and more clear cut in my opinion, is the huge range of benefits that IAC tooling can provide for AI Service Development – that is deploying the actual technical components of the solution. When I use the phrase “AI Service Development”, what I really mean is activities like these:
-
Deployment of AI Models and Services – including for temporary usage, testing, etc.
-
Creation of Keys, Endpoints, Private Networking.
-
Securing AI Resources – e.g. Gateways, Private Networking, Firewalling.
-
Deployment of supporting resources or additional capability – for example a Storage Account.
Consider the below, which is a very simple deployment overview diagram – showing a potential IAC deployment of AI Services:
Within the above example, we have a few things taking place:
- Terraform is used for our deployment:
-
This could be using locally stored code, or it could be in a remote repo using tooling like GitHub or Azure DevOps.
-
The deployment could be done locally, or via a Pipeline based approach, again using GitHub Actions or Azure DevOps Pipelines for example.
-
- Terraform is used to then deploy a number of items:
-
An Azure AI or Azure OpenAI Model and associated configuration – e.g. Private Networking.
-
Supporting Azure Infrastructure – e.g. Application Gateways, Firewalling, Networks.
-
Application Infrastructure – e.g. App Service Plans and Database Services
-
- Our applications, AI resources, and supporting resources have been deployed by Terraform – and importantly, as a centralised deployment, bringing the AI resources into the same deployment pipeline or process as the rest of the Azure resources.
I’ve used the above method quite a bit with a simple Azure OpenAI demo lab I use – allowing me to deploy/destroy as required:
# Cognitive Services - without Private Networking resource "azurerm_cognitive_account" "cognitive1" { count = var.privatenetworking ? 0 : 1 name = "oai-${random_id.cognitive.hex}" location = azurerm_resource_group.rg1.location resource_group_name = azurerm_resource_group.rg1.name sku_name = "S0" kind = "OpenAI" } # Cognitive Deployments - without Private Networking resource "azurerm_cognitive_deployment" "gpt-35-turbo" { count = var.privatenetworking ? 0 : 1 name = "gpt-35-turbo" cognitive_account_id = azurerm_cognitive_account.cognitive1[0].id model { format = "OpenAI" name = "gpt-35-turbo" } scale { type = "Standard" } } resource "azurerm_cognitive_deployment" "text-embedding-ada-002" { count = var.privatenetworking ? 0 : 1 name = "text-embedding-ada-002" cognitive_account_id = azurerm_cognitive_account.cognitive1[0].id model { format = "OpenAI" name = "text-embedding-ada-002" } scale { type = "Standard" } }
You can see more of this lab environment here: https://github.com/jakewalsh90/Terraform-Azure/tree/main/Azure-OpenAI-Demo-1
But what about helping Organisationally?
From a organisational perspective, the benefits of IAC are also significant – from developing in repeatable, version controllable, templated ways, to the speed, cost, and risk reductions that IAC provides. I’ve shared the image below many times – to illustrate that many of the benefits that come with IAC are cyclical in nature. By this I mean that as you develop in IAC practices, and usually DevOps methodologies and principals alongside, you find the speed of adoption increases rapidly.
Aside from the obvious benefits shown above in an example of “adopting IAC”, the other helpful aspect of IAC, and Terraform in particular, is the ability to utilise modules to enable standardised development and deployment. For example, there is an Azure OpenAI Terraform Module that enables deployment of OpenAI Resources: https://registry.terraform.io/modules/Azure/openai/azurerm/latest.
Developing internal modules, and utilising publicly available ones also helps organisations adopt with speed – using internal modules where customisation is required, or external modules where a more standard, and externally maintained, approach is viable.
Resources and Further Reading
- What is Responsible AI? https://learn.microsoft.com/en-us/azure/machine-learning/concept-responsible-ai?view=azureml-api-2
- Terraform Azure OpenAI Deployment – https://github.com/jakewalsh90/Terraform-Azure/tree/main/Azure-OpenAI-Demo-1
- Deploying, Securing, and Monitoring Azure OpenAI with Terraform – https://jakewalsh.co.uk/deploying-securing-and-monitoring-azure-openai-with-terraform/
- Azure OpenAI Landing Zone Reference Architecture – https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102
- Request access to Azure OpenAI – https://aka.ms/oai/access