Terraform State Separation

Draft

Problem Statement

Having a single Terraform directory and state to manage both development (dev) and production (prod) environments is generally considered bad practice for several reasons:

  • A change intended only for the development environment can accidentally modify the production environment, which increases the risk of outages or undesired behavior in prod.
  • Keeping environments separate promotes better organization and clarity. It makes it easier to manage changes specific to each environment without introducing complexities.
  • With a single state file, changes intended for one environment might introduce conflicts or dependencies that affect the other environment. Separate state files prevent this issue and allow for more granular control over infrastructure changes.
  • If both environments share a state file, work on one environment must halt while the state is locked by operations on the other, leading to potential delays and coordination issues among team members.
  • In case of an incident, having separate states allows you to roll back changes or recover environments independently. This isolation can be critical in preventing issues from propagating between environments.
  • Using separate state files makes it possible to apply environment-specific policies and access controls, so each environment can meet its own requirements.
  • Separate configurations make it easier to introduce environment-specific features without complicating the overall setup.

Many CI/CD practices depend on deploying to environments independently. Separate configurations and states fit naturally with these patterns, allowing seamless integration into your deployment workflows.

Separation

Splitting environments into separate states is a common practice as Terraform setups mature. It can be done safely if you take a careful, step-by-step approach. The key is to avoid Terraform accidentally destroying or recreating resources during the migration.

The safest approach:

  • Copy the current directory into separate dev and prod directories.
  • Configure two separate remote backends (one for dev, one for prod).
  • Use terraform state mv to move resources between state files without recreating them.
  • Verify with terraform plan that no unintended changes happen.

Remote backend configuration:

terraform {
  backend "azurerm" {
    resource_group_name   = "tfstate-rg"
    storage_account_name  = "tfstateaccount"
    container_name        = "dev-tfstate"
    key                   = "terraform.tfstate"
  }
}

Inspect

Pull the current state into a local file:

terraform state pull > current.tfstate

To see what Terraform is currently tracking in the state, you have a few built-in commands.

terraform state list
terraform state show <resource>

You can also inspect the entire state file, for example by listing the type of every tracked resource:

jq '.resources[].type' current.tfstate

You can visualize dependencies with:

sudo apt install graphviz
terraform graph | dot -Tpng > graph.png

Backup & Restore Instructions

Ensure you back up your current state and understand the restoration process from this backup file before proceeding.

State backup:

mkdir -p ./backup
terraform state pull > ./backup/before-split.tfstate

Azure example: Check your backend "azurerm" configuration in main.tf (or wherever you configured it).

terraform {
  backend "azurerm" {
    resource_group_name   = "my-rg"
    storage_account_name  = "mystorageacct"
    container_name        = "tfstate"
    key                   = "prod.terraform.tfstate"
  }
}

To restore, upload the backup state file into the Azure storage container:

az storage blob upload \
  --account-name mystorageacct \
  --container-name tfstate \
  --name prod.terraform.tfstate \
  --file ./backup/before-split.tfstate \
  --overwrite

After restoring, re-run terraform init to reconfigure and confirm Terraform is pointing to the remote state.

terraform init -reconfigure
terraform state list

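Alternatively, if anything catastrophic happens, you can restore the backup with terraform state push (add -force if Terraform rejects the push because the remote state has a higher serial or a different lineage):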

terraform state push ./backup/before-split.tfstate

Preparations

Create a new directory for the dev environment:

cd terraform
mkdir -p envs/dev
cd ./envs/dev/

Configure a new backend in a backend.tf file:

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "tfstatestorage"
    container_name       = "tfstate-dev"
    key                  = "terraform.tfstate"
  }
}

Initialize it:

terraform init

Safe Workflow for Incremental Migration

When you run:

terraform state push terraform-migrated.tfstate

It is not an append; it is a full overwrite!

Terraform will replace the entire remote state in the backend with the contents of the local file you push (here terraform-migrated.tfstate).

If you want to move resources one by one and push after each move, you must make sure you always start from the latest remote state before adding the next resource.

Pull the latest remote dev state from the envs/dev directory:

terraform state pull > ./terraform-migrated.tfstate

Go back to the shared infrastructure and move the next resource from the shared state:

cd ../../
terraform state mv \
  -state-out=./envs/dev/terraform-migrated.tfstate \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic

Push the updated file back to the dev backend from the envs/dev directory:

cd ./envs/dev
terraform state push terraform-migrated.tfstate
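
The pull, move, push cycle above can be wrapped in a small helper so that no step is skipped between resources. This is only a minimal sketch: it assumes it is run from the shared root directory, that the target directory (envs/dev by default) is already initialized, and that the resource keeps the same address in both states. The function name migrate_one and its arguments are hypothetical and not part of Terraform.

```shell
# Hypothetical helper: migrate a single resource address into an environment state.
# Usage: migrate_one <resource_address> [env_dir]
migrate_one() {
  mo_addr="$1"
  mo_env_dir="${2:-envs/dev}"
  mo_state_file="terraform-migrated.tfstate"

  # 1. Always start from the latest remote state of the target environment.
  (cd "$mo_env_dir" && terraform state pull > "$mo_state_file") || return 1

  # 2. Move one resource from the shared state into the local copy.
  terraform state mv -state-out="$mo_env_dir/$mo_state_file" "$mo_addr" "$mo_addr" || return 1

  # 3. Push the local copy back; this fully overwrites the remote state.
  (cd "$mo_env_dir" && terraform state push "$mo_state_file") || return 1
}
```

For example, migrate_one azurerm_network_interface.tfstate_migration_experimental_dev_nic repeats the three steps for that one resource and stops at the first step that fails.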

Moving Resource

From the ./envs/dev directory, pull the current state:

terraform state pull > ./terraform-migrated.tfstate

From the root shared state directory:

terraform state mv \
  -state-out=./envs/dev/terraform-migrated.tfstate \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic

The output should look like this:

Acquiring state lock. This may take a few moments...
Move "azurerm_network_interface.tfstate_migration_experimental_dev_nic" to "azurerm_network_interface.tfstate_migration_experimental_dev_nic"
Successfully moved 1 object(s).

Or if you need to move the resource between local files:

terraform state mv \
  -state=./temporary.tfstate \
  -state-out=./final.tfstate \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic \
  azurerm_network_interface.tfstate_migration_experimental_dev_nic

From the ./envs/dev directory, push the updated state to the remote backend:

terraform state push terraform-migrated.tfstate

WARNING: Remember to move the resource declaration code when relocating the resource state.

Data Blocks

Data blocks in Terraform are data sources, not managed infrastructure. They do not create or own anything. They are just "lookups" that query Azure for existing resources. The state only records the last known results of that lookup (IDs, names, attributes), and Terraform reads them again on the next plan or apply.

Do not move data.* entries; you can safely leave them behind. Move only resource entries.
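
To see which addresses are candidates for moving, the data source entries can be filtered out of the terraform state list output. A minimal sketch, assuming data sources appear either at the root (data.…) or inside modules (module.<name>.data.…); the function name list_managed_resources is hypothetical.

```shell
# Hypothetical helper: print only managed resource addresses from the current state.
# Data source addresses (data.* at the root or inside any module) are dropped.
list_managed_resources() {
  terraform state list | grep -Ev '^(module\.[^.]+\.)*data\.'
}
```

Run it from the shared root directory and use the result as the list of addresses to pass to terraform state mv one by one.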

Outputs

After moving resources from the shared state to a new environment state, Terraform will show Changes to Outputs with a lot of new entries. Since outputs are part of the state, Terraform sees them as new (+) even if the underlying infrastructure already exists.

After you run terraform apply, Terraform populates the outputs in the new state. Outputs are purely metadata in the state file, derived from the same resources as before, so the values will match what you had in the shared state as long as your code and infrastructure have not changed.

Random Strings

A random_* resource like random_string or random_password generates a value once and stores it in the Terraform state. On subsequent runs, Terraform simply reuses the stored value. If you lose that state entry or recreate the resource, Terraform will generate a different random value.

When the value would change:

  • If you forget to move it, Terraform treats it as missing and creates a new random_string, so you get a different suffix.
  • If you try terraform import instead of state mv: random_* resources are not real Azure resources, so there is nothing in Azure to import them from.
  • If you accidentally remove it from the state with terraform state rm, the next apply creates a new one.

If moved with terraform state mv, the random string stays exactly the same.

Moving Resource Back

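Let’s say you moved azurerm_resource_group.dev from the old shared state to the new dev state but something went wrong: the plan shows unexpected changes, the configuration does not match, or you realize you picked the wrong resource.

You can reverse the operation from the ./envs/dev directory. The command below writes to the local file ../../terraform.tfstate, so if the shared state lives in a remote backend, pull it into that file first and push it back afterwards, as in the forward move: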

terraform state mv \
  -state-out=../../terraform.tfstate \
  azurerm_resource_group.dev \
  azurerm_resource_group.dev

Push Updated State

After moving resources, push the local state file to the remote backend:

terraform state push terraform-migrated.tfstate

The output normally looks like this:

Acquiring state lock. This may take a few moments...
Releasing state lock. This may take a few moments...

Make a new backup on each iteration.
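
One way to keep a backup per iteration is to write each pull to a timestamped file. A minimal sketch, assuming second-level UTC timestamps are unique enough for manual work (two backups taken within the same second would overwrite each other); the function name backup_state and the file naming are hypothetical.

```shell
# Hypothetical helper: pull the current state into a timestamped backup file.
# Usage: backup_state [backup_dir]   (prints the path of the file it wrote)
backup_state() {
  bs_dir="${1:-./backup}"
  mkdir -p "$bs_dir" || return 1
  bs_file="$bs_dir/state-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  terraform state pull > "$bs_file" && printf '%s\n' "$bs_file"
}
```

Call it in the directory whose state you are about to change, before every terraform state mv or terraform state push.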

Changing Remote State Backend Configuration

Follow these instructions if you need to change the storage account or storage container.

Assuming your infrastructure code is located in the current directory, back up the current state:

terraform state pull > backup.tfstate

Create a separate environment directory, for example prod:

mkdir -p ./envs/prod/
cd ./envs/prod/

Create a backend.tf file:

terraform {
  backend "azurerm" {
    resource_group_name   = "tfstate-rg"
    storage_account_name  = "tfstateaccount"
    container_name        = "prod-tfstate"
    key                   = "terraform.tfstate"
  }
}

Initialize the prod directory:

terraform init -reconfigure

Push the state backup into the prod backend:

terraform state push ../../backup.tfstate

Copy infrastructure code to the new directory, except backend.tf file.

Update the locked dependency selections to match a changed configuration:

terraform init -upgrade

Verify:

terraform plan

Expected output:

No changes. Your infrastructure matches the configuration.
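
For scripted runs, the same check can be made without reading the plan text. A minimal sketch relying on terraform plan -detailed-exitcode, where exit code 0 means no changes, 2 means changes are pending, and 1 means the plan failed; the function name verify_no_changes and its messages are hypothetical.

```shell
# Hypothetical helper: report whether the plan is empty, based on the exit code.
verify_no_changes() {
  vn_rc=0
  terraform plan -detailed-exitcode -no-color > /dev/null || vn_rc=$?
  case "$vn_rc" in
    0) echo "no changes" ;;
    2) echo "changes pending"; return 2 ;;
    *) echo "plan failed"; return 1 ;;
  esac
}
```

Anything other than "no changes" is a reason to stop and inspect the full terraform plan output before going further.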

Clean Up

If all resources have been moved with terraform state mv, then your dev state is already fully populated with the real infra. Running terraform plan in envs/dev should show no changes except possibly new outputs being added. In this case, there is no need to apply before cleaning shared code, because you’re not missing anything in state.

If you see any + resources (not outputs) in the envs/dev plan, that means you didn’t migrate some resources properly: Terraform thinks it has to create them. You should not clean shared code until that’s resolved. Otherwise, Terraform might actually try to recreate something.

Output changes are safe to ignore; they just repopulate values from existing infrastructure, and no infrastructure changes occur.
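
The check for + resources can be sketched as a filter over the plan text. This is a rough illustration only: it assumes the human-readable plan marks each new resource with a line of the form "# <address> will be created", which is not a stable interface (for automation, the JSON form from terraform show -json on a saved plan is more robust). The function name pending_creates is hypothetical.

```shell
# Hypothetical helper: print the addresses Terraform plans to create.
# Output changes are ignored because they do not produce "will be created" lines.
pending_creates() {
  terraform plan -no-color | sed -n 's/^ *# \(.*\) will be created$/\1/p'
}
```

If this prints anything when run in envs/dev, those resources were not migrated and the shared code should not be cleaned up yet.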

Now remove all HCL code related to the moved resources from the source directory.

September 29, 2025