Exploring Backups, DR, and Continuity in an Azure Terraform World

Backups, Disaster Recovery, and ensuring the continuity of business data and operations are arguably among the most important concerns for anyone working in the IT industry. Our connected, data-driven world revolves around our ability to keep data safe, secure, and protected from a wide range of threats. In the Azure Infrastructure as Code (IAC) and Terraform world, this is no different: backups are still critical, and backing up data, pipelines, code templates and modules, and state files (when using Terraform, for example) is key.

The same approach should also be considered where Terraform or IAC is used to deploy the supporting resources for an environment such as Citrix Cloud. Suppose you needed to recover to another region, but Terraform had been used to build out the supporting infrastructure. In that case, you would need backups, along with alternative pipelines capable of deploying to that other region.
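On the backup side of a cross-region recovery, one option Azure Backup offers is a geo-redundant Recovery Services Vault with Cross Region Restore enabled, so that backup items such as VMs can also be restored in the paired secondary region. The sketch below is a minimal illustration only: the resource group and vault names are hypothetical placeholders, var.location is assumed to be your primary region (as in the examples later in this post), and the settings should be checked against the Terraform Registry for your provider version.

```hcl
# Sketch only: resource names are hypothetical placeholders.
# var.location is assumed to be the primary region.
resource "azurerm_resource_group" "rg_dr" {
  name     = "rg-backup-dr-01"
  location = var.location
}

# Recovery Services Vault on geo-redundant storage, with Cross Region
# Restore enabled so restores can also be run in the paired region
resource "azurerm_recovery_services_vault" "rsv_dr" {
  name                         = "rsv-dr-example-01"
  location                     = azurerm_resource_group.rg_dr.location
  resource_group_name          = azurerm_resource_group.rg_dr.name
  sku                          = "Standard"
  storage_mode_type            = "GeoRedundant" # Cross Region Restore needs GeoRedundant
  cross_region_restore_enabled = true
  soft_delete_enabled          = true
}
```

Keeping this setting in code also means a rebuilt vault comes back with the same redundancy choice.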

There have also been some great enhancements to Azure Backup recently, so this is a good chance to review our options and the core aspects. Azure Backup now supports migrating backups from the standard to the enhanced backup policy. The enhanced policy allows an improved RPO (as low as 4 hours), retention of restore points in the snapshot tier for up to 30 days, and multi-disk crash consistency for VMs. This is currently in Public Preview, with documentation available here: https://learn.microsoft.com/en-us/azure/backup/backup-azure-vm-migrate-enhanced-policy.

In this post I wanted to give a holistic view of Backup in the IAC world (covering mainly Terraform), providing a few examples of areas to consider, resources to help, and some thoughts on approaches that can be taken to ensure adequate coverage.

Just to note…

In this post I will refer to a number of elements covered in a post I wrote last year, as some of the diagrams, processes, and code here are based on that sample environment. You can read the article here: https://jakewalsh.co.uk/azure-deployment-using-terraform-cloud-overview-prerequisites-sample-code-and-resources/


Protecting all aspects of our deployment

Let’s consider a basic IAC deployment that uses Terraform to deploy and manage an Azure environment. Even a simple deployment like this has many elements where backups need to be considered:

Key Elements that need Backup / DR / Continuity consideration in a simple IAC Deployment

Even in this fairly simple deployment, there are several key elements we need to include in our backup planning:

  • Our Git Repo – whether this be in GitHub (like my example), Azure DevOps, or another platform, it is important we consider protecting code templates and modules.
  • Our Terraform State File – this is important because it contains Terraform’s knowledge of our environment and deployment. Without an accurate, well-maintained state file, our deployments will fail or encounter issues.
  • Our Deployment Pipeline – whilst this may seem like a simple thing to back up, it’s still important that it is protected. For simple pipelines this could be an export of a YAML file (for example in Azure DevOps); on other platforms, an export of the configuration may be required.
  • Our Data within Azure – this is probably (no, actually, it definitely IS) the most critical aspect: our data in Azure, be it in Storage Accounts, Azure VM Disks, etc. Without this data we have no function or services, and losing it would have the greatest impact of any element here.

Backing up what matters most…

Our data is, without question, the most important aspect. Without it we have nothing: no product, no service, no value. In real terms, this most likely means protecting data within our cloud environment, which in my case is Microsoft Azure. That means data contained in Storage Accounts, Disks (attached to Virtual Machines), and other areas within Azure.

Azure Backup is the most likely candidate – and the starting point for that service is Microsoft Learn, which provides documentation and guidance – https://learn.microsoft.com/en-us/azure/backup/

But what about configuring and managing Azure Backup using Terraform? 

Fortunately, this is straightforward, and we can configure the required resources using the AzureRM provider. Note that I will only summarise below; please consult the Terraform Registry for full details and configuration options. If you are familiar with Azure Backup, you’ll know that we need three core components: a Recovery Services Vault, a Backup Policy, and an assignment of that Policy to an appropriate resource. There are some variations to this (when backing up Storage Accounts, for example), so please read the specific documentation for the resources you are working with.

Creating a Recovery Services Vault and a basic VM Backup Policy:

In this example, we are using the azurerm_backup_policy_vm resource for the Backup Policy: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/backup_policy_vm.

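# input variable - the only one this configuration requires
variable "location" {
  type        = string
  description = "Azure region for the backup resources"
}
# random string
resource "random_id" "vaultid" {
  byte_length = 5
  prefix      = "rsv"
}
# resource group
resource "azurerm_resource_group" "rg1" {
  name     = "rg-backup-01"
  location = var.location
}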
# Recovery Services Vault
resource "azurerm_recovery_services_vault" "rsv1" {
  name                = random_id.vaultid.hex
  location            = var.location
  resource_group_name = azurerm_resource_group.rg1.name
  sku                 = "Standard"

  soft_delete_enabled = true
}
# Backup Policy
resource "azurerm_backup_policy_vm" "backup-pol1" {
  name                = "terraform-example-backup-01"
  resource_group_name = azurerm_resource_group.rg1.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv1.name
  policy_type         = "V2" # V2 = Enhanced policy; V1 = Standard policy

  backup {
    frequency = "Daily"
    time      = "23:00"
  }
  retention_daily {
    count = 10
  }
}

Note also the use of the Random provider above, which ensures resource names are unique for anyone using my sample code. The only variable this configuration requires is the location.
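The policy above already uses policy_type = "V2", the enhanced policy type mentioned at the start of this post. Enhanced policies can also take several backups a day, which is what enables the shorter RPO, and can keep snapshots in the snapshot (instant restore) tier for longer. Below is a sketch of an hourly variant that reuses the resource group and vault from the example above; the resource name, schedule, and retention values are illustrative assumptions, and valid combinations should be checked on the azurerm_backup_policy_vm Registry page.

```hcl
# Enhanced (V2) policy sketch: a backup every 4 hours within a 12-hour
# window starting at 08:00 UTC, with snapshots kept in the snapshot
# (instant restore) tier for 30 days. Values are illustrative.
resource "azurerm_backup_policy_vm" "backup-pol1-hourly" {
  name                           = "terraform-example-backup-hourly-01"
  resource_group_name            = azurerm_resource_group.rg1.name
  recovery_vault_name            = azurerm_recovery_services_vault.rsv1.name
  policy_type                    = "V2"
  timezone                       = "UTC"
  instant_restore_retention_days = 30

  backup {
    frequency     = "Hourly"
    time          = "08:00" # start of the backup window
    hour_interval = 4       # hours between backups
    hour_duration = 12      # window length; a multiple of hour_interval
  }

  # daily restore points, kept longer than the snapshot tier retention
  retention_daily {
    count = 60
  }
}
```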

Policy Assignment:

Assigning the Policy to a VM is also simple, using the azurerm_backup_protected_vm Resource:

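# existing VM to protect - name and resource group are placeholders; the VM must be in the vault's region
data "azurerm_virtual_machine" "vm1" {
  name                = "vm-example-01"
  resource_group_name = "rg-example-vms"
}
# protect the VM with the Backup Policy created above
resource "azurerm_backup_protected_vm" "vm1" {
  resource_group_name = azurerm_resource_group.rg1.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv1.name
  source_vm_id        = data.azurerm_virtual_machine.vm1.id
  backup_policy_id    = azurerm_backup_policy_vm.backup-pol1.id
}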
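Where several VMs should share the same policy, for_each can keep the assignments in one place rather than repeating the block for each VM. In this sketch the variable name, the VM names, and the "rg-example-vms" resource group are hypothetical placeholders; the vault and policy are the ones from the earlier example.

```hcl
# Hypothetical list of existing VMs to protect (placeholders, replace with your own).
# The VMs must be in the same region as the Recovery Services Vault.
variable "protected_vm_names" {
  type    = list(string)
  default = ["vm-app-01", "vm-app-02"]
}

# Look up each VM by name; the result is a map keyed by VM name
data "azurerm_virtual_machine" "protected" {
  for_each            = toset(var.protected_vm_names)
  name                = each.value
  resource_group_name = "rg-example-vms"
}

# One protected item per VM, all using the same Backup Policy
resource "azurerm_backup_protected_vm" "protected" {
  for_each            = data.azurerm_virtual_machine.protected
  resource_group_name = azurerm_resource_group.rg1.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv1.name
  source_vm_id        = each.value.id
  backup_policy_id    = azurerm_backup_policy_vm.backup-pol1.id
}
```

With this pattern, adding a VM name to the list brings that VM into the backup on the next apply.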

But what about File Shares?

File Shares can also be backed up, using the azurerm_backup_policy_file_share resource type (https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/backup_policy_file_share). Note that for this to work you will also need to register the Storage Account with the Recovery Services Vault (using azurerm_backup_container_storage_account) and then protect the share itself:

https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/backup_protected_file_share

# random string
resource "random_id" "vaultid2" {
  byte_length = 5
  prefix      = "rsv"
}
resource "random_id" "storage2" {
  byte_length = 5
  prefix      = "str"
}
# resource group
resource "azurerm_resource_group" "rg2" {
  name     = "rg-backup-02"
  location = var.location
}
# Recovery Services Vault
resource "azurerm_recovery_services_vault" "rsv2" {
  name                = random_id.vaultid2.hex
  location            = var.location
  resource_group_name = azurerm_resource_group.rg2.name
  sku                 = "Standard"

  soft_delete_enabled = true
}
# Storage Account and Share
resource "azurerm_storage_account" "storage2" {
  name                     = random_id.storage2.hex
  location                 = var.location
  resource_group_name      = azurerm_resource_group.rg2.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
resource "azurerm_storage_share" "share2" {
  name                 = "tf-demo2"
  storage_account_name = azurerm_storage_account.storage2.name
  quota                = 1
}
# Azure Backup Configuration
resource "azurerm_backup_container_storage_account" "storage2" {
  resource_group_name = azurerm_resource_group.rg2.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv2.name
  storage_account_id  = azurerm_storage_account.storage2.id
}
# Backup Policy
resource "azurerm_backup_policy_file_share" "backup-pol2" {
  name                = "terraform-example-backup-02"
  resource_group_name = azurerm_resource_group.rg2.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv2.name

  backup {
    frequency = "Daily"
    time      = "23:00"
  }

  retention_daily {
    count = 10
  }
}

Once the azurerm_backup_container_storage_account resource in the above code block has been applied, you will see that the Storage Account shows as registered with the Recovery Services Vault.

Finally, we’d need to add this block to protect the File Share using our Backup Policy:

# protect the file share with the Backup Policy above
resource "azurerm_backup_protected_file_share" "storage2" {
  resource_group_name       = azurerm_resource_group.rg2.name
  recovery_vault_name       = azurerm_recovery_services_vault.rsv2.name
  source_storage_account_id = azurerm_backup_container_storage_account.storage2.storage_account_id
  source_file_share_name    = azurerm_storage_share.share2.name
  backup_policy_id          = azurerm_backup_policy_file_share.backup-pol2.id
}
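The file share policy above only keeps daily restore points. If longer-term points are needed, azurerm_backup_policy_file_share also accepts weekly and monthly retention blocks. The sketch below reuses the vault from the example above; the resource name, counts, and days chosen are illustrative assumptions rather than recommendations.

```hcl
# File share policy sketch with daily, weekly, and monthly retention
# (illustrative values; check the Registry for the allowed ranges)
resource "azurerm_backup_policy_file_share" "backup-pol2-extended" {
  name                = "terraform-example-backup-02-extended"
  resource_group_name = azurerm_resource_group.rg2.name
  recovery_vault_name = azurerm_recovery_services_vault.rsv2.name
  timezone            = "UTC"

  backup {
    frequency = "Daily"
    time      = "23:00"
  }

  # keep the last 10 daily restore points
  retention_daily {
    count = 10
  }

  # keep the Sunday restore point for 8 weeks
  retention_weekly {
    count    = 8
    weekdays = ["Sunday"]
  }

  # keep the first Sunday of each month for 12 months
  retention_monthly {
    count    = 12
    weekdays = ["Sunday"]
    weeks    = ["First"]
  }
}
```

Pointing backup_policy_id in the protected file share block at this policy would apply it in place of the daily-only one.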

There are, of course, other resource types that can be protected using this method – I’d recommend visiting the HashiCorp registry to see more information on these.

Code / Modules / Templates, and the all-important Pipelines!

This section really goes without saying: protecting the code we’ve used to create our environments is key. Not only is this essential for recovering in a disaster scenario, it also supports future needs, for example if we needed to recreate an environment, or spin up a development or test environment.

For most IAC deployments, a platform that provides Version Control will be used. In its most basic form, this provides management of the code repositories and control over versions. Many different platforms, systems, and providers are available when using Terraform; I’ve mainly worked with GitHub and Azure DevOps when I’ve needed a system to store IAC templates and code safely and securely.

But what about Backups?

I won’t get into the debate about whether a Version Control system is a form of backup here – however, what’s clear is we need to maintain copies of our code, modules, and templates moving forward. A Version Control system provides a method to manage this, but we should also consider having a backup elsewhere if possible – so that we can restore should there be an outage or data loss with our Version Control platform. This also allows potential migration between these platforms, should the need arise.

A few additional links for reading around GitHub and Azure DevOps are included below:

Protecting the Pipelines

Another important consideration, and one that perhaps requires a more individualised approach (different platforms, ways of working, team structures, etc.), is how we protect our deployment Pipelines. This could be done via duplication, or by exporting GitHub Actions workflows or Azure DevOps YAML files, for example. An important point here is that whilst our code may be universal, in that it could be deployed using a range of platforms or tools, the YAML templates for our deployment pipelines are unlikely to be.

In this case, I would advise considering how these could be rebuilt, either on your current platform (perhaps taking a template export approach) or on another platform, ideally with supporting documentation should a rebuild be required.
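If Azure DevOps is the platform, one way to make the pipeline definition itself rebuildable is to manage it with Terraform too, using the Azure DevOps provider's azuredevops_build_definition resource, so the definition sits in version control alongside the YAML it points at. This is a sketch under several assumptions: the project name, repository, branch, YAML path, and service connection variable are hypothetical placeholders, and the YAML pipeline file is assumed to live in a GitHub repository.

```hcl
terraform {
  required_providers {
    azuredevops = {
      source = "microsoft/azuredevops"
    }
  }
}

# Provider settings (organisation URL and token) are assumed to come from the
# AZDO_ORG_SERVICE_URL and AZDO_PERSONAL_ACCESS_TOKEN environment variables.

# Hypothetical ID of a GitHub service connection already created in the project
variable "github_service_connection_id" {
  type = string
}

# Existing Azure DevOps project (name is a placeholder)
data "azuredevops_project" "iac" {
  name = "iac-project"
}

# Pipeline definition that runs the YAML file stored in the repository
resource "azuredevops_build_definition" "terraform_deploy" {
  project_id = data.azuredevops_project.iac.id
  name       = "terraform-deploy"

  ci_trigger {
    use_yaml = true # use the triggers defined in the YAML file
  }

  repository {
    repo_type             = "GitHub"
    repo_id               = "example-org/example-terraform-repo"
    branch_name           = "main"
    yml_path              = "azure-pipelines.yml"
    service_connection_id = var.github_service_connection_id
  }
}
```

If the organisation or project were lost, re-applying this alongside the repository would recreate the pipeline rather than relying on manual rebuild steps.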

Last, but by no means least – the importance of the State File!

It is not possible to discuss DR, Backups, or Continuity when using Terraform without referring to the State File. HashiCorp summarises its importance as follows:

“Terraform must store state about your managed infrastructure and configuration. This state is used by Terraform to map real world resources to your configuration, keep track of metadata, and to improve performance for large infrastructures. This state is stored by default in a local file named ‘terraform.tfstate’, but we recommend storing it in HCP Terraform to version, encrypt, and securely share it with your team.” (https://developer.hashicorp.com/terraform/language/state)

Whilst HCP Terraform is the recommendation, you may instead wish to use something like Azure Storage as a remote backend for the State File. If so, consider using Azure Backup to ensure you have at least a basic level of protection in place for the State File, so that should you encounter an issue, you have a method of restoration.
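As a sketch of what that could look like, the configuration below creates a storage account and container for state, then protects the account with Azure Backup's operational backup for blobs, which uses a Backup vault rather than a Recovery Services Vault. Resource names are hypothetical, var.location is assumed as in the earlier examples, the 30-day retention is illustrative, and attribute names have changed between AzureRM provider versions, so check the Registry for the version you use. The state storage would typically live in its own small configuration, separate from the one whose state it holds.

```hcl
# Hypothetical names throughout; var.location as in the earlier examples.
resource "azurerm_resource_group" "state" {
  name     = "rg-tfstate-01"
  location = var.location
}

# Storage account and private container that will hold the state file
resource "azurerm_storage_account" "state" {
  name                     = "sttfstateexample01" # must be globally unique
  resource_group_name      = azurerm_resource_group.state.name
  location                 = azurerm_resource_group.state.location
  account_tier             = "Standard"
  account_replication_type = "GRS"
}

resource "azurerm_storage_container" "state" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.state.name
  container_access_type = "private"
}

# Backup vault (blob backup uses a Backup vault, not a Recovery Services Vault)
resource "azurerm_data_protection_backup_vault" "state" {
  name                = "bv-tfstate-01"
  resource_group_name = azurerm_resource_group.state.name
  location            = azurerm_resource_group.state.location
  datastore_type      = "VaultStore"
  redundancy          = "LocallyRedundant"

  identity {
    type = "SystemAssigned"
  }
}

# The vault's managed identity needs this role on the storage account
resource "azurerm_role_assignment" "state_backup" {
  scope                = azurerm_storage_account.state.id
  role_definition_name = "Storage Account Backup Contributor"
  principal_id         = azurerm_data_protection_backup_vault.state.identity[0].principal_id
}

# Operational backup policy with a 30-day retention (illustrative)
resource "azurerm_data_protection_backup_policy_blob_storage" "state" {
  name                                   = "bkpol-tfstate-01"
  vault_id                               = azurerm_data_protection_backup_vault.state.id
  operational_default_retention_duration = "P30D" # ISO 8601 duration
}

# Protect the storage account with the policy
resource "azurerm_data_protection_backup_instance_blob_storage" "state" {
  name               = "bkinst-tfstate-01"
  vault_id           = azurerm_data_protection_backup_vault.state.id
  location           = azurerm_resource_group.state.location
  storage_account_id = azurerm_storage_account.state.id
  backup_policy_id   = azurerm_data_protection_backup_policy_blob_storage.state.id

  depends_on = [azurerm_role_assignment.state_backup]
}

# The configuration whose state this is would then use a backend block such as:
# terraform {
#   backend "azurerm" {
#     resource_group_name  = "rg-tfstate-01"
#     storage_account_name = "sttfstateexample01"
#     container_name       = "tfstate"
#     key                  = "prod.terraform.tfstate"
#   }
# }
```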


I hope this post has been useful – until next time!
