We recently ran into Terraform state file corruption because multiple devops engineers were running applies in the same environment. Long story short: I had to manually edit the tfstate file to resolve the issue.
This could have been prevented if we had set up state locking, which Terraform supports as of version 0.9. I ended up following the steps from here, with changes to match our infrastructure.
Environments
We split each environment/region into its own directory. Since global is where we store all resources that are not environment/region specific, I will put the DynamoDB table there.
environments/
├── ctl-us-east-1/
├── dev-us-east-1/
├── global/
├── prd-us-east-1/
├── stg-us-east-1/
└── uat-us-east-1/
S3
First things first, store the tfstate files in an S3 bucket. Since the bucket we use already exists (it predates Terraform), we will just let it be. You can always use a Terraform resource to set it up.
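If you do need to create the bucket with Terraform, a minimal sketch could look like the following (the resource label, ACL, and encryption settings are assumptions, not our actual configuration); enabling versioning is worth it so older state revisions can be recovered:

# s3 bucket for storing tfstate files (sketch only -- our bucket predates terraform)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "devops"
  acl    = "private"

  # keep previous versions of the state in case we ever need to roll back
  versioning {
    enabled = true
  }

  # encrypt state files at rest
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }

  tags {
    Name      = "terraform-state-bucket"
    Terraform = "true"
  }
}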
In our global environment, we will enable S3 storage in the backend.tf file:
terraform {
  backend "s3" {
    bucket  = "devops"
    key     = "tfstate/global"
    region  = "us-east-1"
    encrypt = "true"
  }
}
This will give us the tfstate file under s3://devops/tfstate/global for our global environment.
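After adding the backend block, run terraform init from the environment directory so Terraform can configure the S3 backend; if a local state file already exists, Terraform should prompt to copy it into the bucket. A minimal sketch of that flow:

$ cd environments/global
$ terraform init    # configures the "s3" backend and offers to migrate any existing local state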
DynamoDB
Next, we need to set up DynamoDB via a Terraform resource by adding the following to the backend.tf file under our global environment.
Notice! The name = "terraform-state-lock" will be used in the backend.tf file for the rest of the environments.
# dynamodb table for state file locking
resource "aws_dynamodb_table" "terraform_state_lock" {
  name           = "terraform-state-lock"
  hash_key       = "LockID" # the s3 backend requires a string hash key named LockID
  read_capacity  = 20
  write_capacity = 20

  server_side_encryption {
    enabled = true
  }

  attribute {
    name = "LockID"
    type = "S"
  }

  tags {
    Name        = "dynamodb-terraform-state-lock-table"
    Terraform   = "true"
    Environment = "${var.environment}"
  }
}
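The backend block itself cannot create resources, so the table has to exist before any environment references it in its backend config. A quick way to create and sanity-check it, assuming the AWS CLI is configured with credentials for the same account:

$ cd environments/global
$ terraform apply
$ aws dynamodb describe-table \
    --table-name terraform-state-lock \
    --query 'Table.TableStatus'    # should return "ACTIVE" once the table is ready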
Setup
Now that our DynamoDB table has been created and we're already using S3 to store the tfstate file, we can enable state locking by adding the dynamodb_table = "terraform-state-lock" line to the backend.tf file and re-running terraform init:
terraform {
  backend "s3" {
    bucket         = "devops"
    dynamodb_table = "terraform-state-lock"
    key            = "tfstate/global"
    region         = "us-east-1"
    encrypt        = "true"
  }
}
# dynamodb table for state file locking
resource "aws_dynamodb_table" "terraform_state_lock" {
  name           = "terraform-state-lock"
  hash_key       = "LockID"
  read_capacity  = 20
  write_capacity = 20

  server_side_encryption {
    enabled = true
  }

  attribute {
    name = "LockID"
    type = "S"
  }

  tags {
    Name        = "dynamodb-terraform-state-lock-table"
    Terraform   = "true"
    Environment = "${var.environment}"
  }
}
For the rest of the environments, we just need to update the backend.tf file to include dynamodb_table = "terraform-state-lock", re-run terraform init, and we're all set!
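For example, the backend.tf in dev-us-east-1 would look something like this (the key follows the same tfstate/<environment> naming we use for the other state files):

terraform {
  backend "s3" {
    bucket         = "devops"
    dynamodb_table = "terraform-state-lock"
    key            = "tfstate/dev-us-east-1"
    region         = "us-east-1"
    encrypt        = "true"
  }
}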
Once you have initialized the environment/directory, you will see that the local .terraform/terraform.tfstate file points to the correct bucket and dynamodb_table.
$ cat .terraform/terraform.tfstate
{
    "version": 3,
    "serial": 1,
    "lineage": "bb7-faa3-efbb-d45508d9c7bf",
    "backend": {
        "type": "s3",
        "config": {
            "bucket": "devops",
            "dynamodb_table": "terraform-state-lock",
            "encrypt": "true",
            "key": "tfstate/global",
            "region": "us-east-1"
        },
        "hash": 0315822878
    },
    "modules": [
        {
            "path": [
                "root"
            ],
            "outputs": {},
            "resources": {},
            "depends_on": []
        }
    ]
}
Monitoring DynamoDB
Once we have everything set up, we can verify by monitoring the DynamoDB table:
$ aws dynamodb scan --table-name terraform-state-lock
{
    "Items": [
        {
            "Digest": {
                "S": "e4043a8e2d4dd4ca436317"
            },
            "LockID": {
                "S": "devops/tfstate/dev-us-east-1-md5"
            }
        },
        ...
        {
            "LockID": {
                "S": "devops/tfstate/prod-us-east-1"
            },
            "Info": {
                "S": "{\"ID\":\"eb0-087e-e4e1-80e4cb52cd0f\",\"Operation\":\"OperationTypePlan\",\"Info\":\"\",\"Who\":\"cwong@CWONG-MACBOOKPRO\",\"Version\":\"0.11.10\",\"Created\":\"2018-11-02T17:44:53.644603Z\",\"Path\":\"devops/tfstate/prod-us-east-1\"}"
            }
        },
        ...
        {
            "Digest": {
                "S": "1922eeca1cc2996662893d"
            },
            "LockID": {
                "S": "devops/tfstate/prod-us-east-1-md5"
            }
        }
    ],
    "Count": 6,
    "ScannedCount": 6,
    "ConsumedCapacity": null
}
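You can also check whether a specific state is currently locked by fetching its item directly; a sketch for our global key (adjust the LockID to match your own bucket/key):

$ aws dynamodb get-item \
    --table-name terraform-state-lock \
    --key '{"LockID": {"S": "devops/tfstate/global"}}'

The plain path entries only exist while a lock is held and carry the Info blob shown above, while the *-md5 digest entries persist between runs.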
ToDo
- Manage the S3 bucket in Terraform (we created the bucket long before switching to Terraform)
- Set up a policy (we only allow devops to run Terraform, and we have loads of permissions by default! :P)
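For the second item, a reasonable starting point might be an IAM policy that only grants what the S3 backend and state locking actually need. The sketch below reuses our bucket and table names, but the policy name and statement list are assumptions rather than a vetted policy:

# minimal permissions for the s3 backend + dynamodb state locking (sketch only)
resource "aws_iam_policy" "terraform_state_access" {
  name        = "terraform-state-access"
  description = "Read/write tfstate in S3 and lock via DynamoDB"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::devops"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::devops/tfstate/*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"],
      "Resource": "arn:aws:dynamodb:*:*:table/terraform-state-lock"
    }
  ]
}
EOF
}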