Policy as Code for FinOps: Improving AWS Cost Governance and Resource Tagging
Your developers just deployed a production database without proper cost tags. Finance can’t attribute the $5K/month bill. Is it dev? Testing? A forgotten project?
This happens in 30–40% of deployments. But it doesn’t have to.
What if you could prevent expensive infrastructure from being deployed in the first place?
Using Terraform + OPA/Rego, you can write financial policies as code that validate every infrastructure change before it launches. Two-second feedback loops. Automated cost guardrails. No surprises.
The Problem
30–40% of AWS deployments lack proper cost-allocation tags
Untagged resources make chargeback impossible; finance blindly allocates spend
Developers deploy expensive resources (e.g., m5.xlarge in dev, no budget limits) without governance
Post-deployment cost surprises and remediation consume engineering time and budget
Traditional monitoring (CloudWatch, billing alerts) reacts too late—damage is already done
The Solution
Implement Policy as Code for FinOps using Terraform + Open Policy Agent (OPA) + Rego.
Define financial guardrails in code, validate every Terraform plan against cost policies before resources are created, and automatically enforce mandatory cost-allocation tags.
The Impact
95%+ tagging compliance on first deployment
15–25% improvement in cost visibility ($100K–$500K+ recovered)
Prevents unplanned cost spikes by blocking expensive resources in dev/staging
Enables accurate chargeback to departments, projects, and cost centers
No manual post-deployment fixes—governance baked into CI/CD
The Stack
Terraform – Infrastructure as Code
OPA (Open Policy Agent) – Declarative policy engine
Rego – Policy language (JSON-native, logic-based)
Conftest – CLI to run Rego policies against Terraform plans
Infracost – Cost estimation (optional, powerful integration)
Why Terraform + OPA/Rego for FinOps?
Traditional Approaches Fail at Scale
Manual code reviews
“Please tag all resources” → developers forget
Tagging reviews happen post-deployment → expensive to fix
CloudWatch alerts & dashboards
React after the fact (“You spent $5K this month”)
Awareness without prevention
AWS Organizations Tag Policies
API-level enforcement (good for compliance)
Can’t prevent expensive resource types
Can’t enforce cost limits per environment or team
Why OPA/Rego is Different
Shift-left governance
Validate infrastructure before deployment in CI/CD with fast feedback loops.
Cost-aware policies
Write rules like:
“Deny any EC2 instance costing > $100/month in dev”
“All resources must have Environment, CostCenter, and Owner tags”
“RDS must use gp3 (not io2) in non-production”
“Total monthly cost increase must not exceed $50 per PR”
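As an illustration, the third rule in that list could be sketched as follows. This is a minimal sketch, not a definitive policy: it assumes the Terraform plan JSON is supplied under `input.tfplan`, that environments are identified by an `Environment` tag with `prod` as the production value, and that `expensive_storage` (a hypothetical name) lists the storage types you want to keep out of non-production. It uses the older `deny[msg] { ... }` rule form; on OPA 1.0 and later the same rule is written with the `contains` and `if` keywords.

```rego
package main

# Hypothetical list of storage types considered too expensive
# outside production. Adjust to your own standards.
expensive_storage := {"io1", "io2"}

deny[msg] {
    resource := input.tfplan.resource_changes[_]
    resource.type == "aws_db_instance"
    after := resource.change.after

    # Assumption: "prod" is the production tag value. Resources with no
    # Environment tag are skipped here and left to a required-tags rule.
    after.tags.Environment != "prod"
    expensive_storage[after.storage_type]

    msg := sprintf(
        "%v uses %v storage in %v; use gp3 outside production",
        [resource.address, after.storage_type, after.tags.Environment]
    )
}
```

Keeping the storage types in a set means the rule body stays unchanged when the list grows.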
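Declarative, not imperative
You define what the policy is. OPA figures out how to enforce it.
# Assumes each resource change has been enriched with a monthly_cost field
# (for example, from Infracost); Terraform's plan JSON does not include one.
deny[msg] {
    resource := input.tfplan.resource_changes[_]
    resource.type == "aws_instance"
    resource.change.after.tags.Environment == "dev"
    monthly_cost := resource.change.after.monthly_cost
    monthly_cost > 100
    msg := sprintf(
        "Instance %v costs $%v/month; limit is $100 for dev",
        [resource.address, monthly_cost]
    )
}
Version-controlled and auditable
Policies live in Git with full history: who changed what, when, and why.
Testable
Policies are tested like application code.
# Fixture: a dev instance priced above the $100/month limit.
expensive_instance := {
    "address": "aws_instance.big",
    "type": "aws_instance",
    "change": {"after": {"tags": {"Environment": "dev"}, "monthly_cost": 140}}
}

test_deny_expensive_instance {
    result := deny with input as {
        "tfplan": {
            "resource_changes": [expensive_instance]
        }
    }
    count(result) > 0
}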
Why Not Just AWS SCPs?
SCPs Are a Safety Net — Not a Governance Layer
Every CTO asks:
“AWS already has Service Control Policies. Why not just use those?”
The short answer: SCPs and Rego solve different problems.
SCPs → last-line safety net at the AWS API layer (requests are denied only at apply time)
Rego → proactive governance in your CI/CD pipeline
SCPs can’t enforce true FinOps policies because they:
Have no awareness of pricing data
Can’t perform arithmetic or complex logic
Require hard-coded instance types (breaks with regional pricing, discounts, or custom rates)
Rego, by contrast, can reference real cost data and apply conditional logic (environment + instance type + monthly cost), making it suitable for cost-aware policy enforcement.
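To make that combination concrete, here is a minimal sketch of a rule that joins environment, instance type, and an estimated monthly cost. Everything in the lookup tables is an assumption for illustration: `hourly_price`, `monthly_limit`, and `hours_per_month` are hypothetical names, and the prices are placeholders rather than real AWS rates. It also assumes the plan JSON is supplied under `input.tfplan`, and it uses the older `deny[msg] { ... }` rule form (OPA 1.0 and later write it with `contains` and `if`).

```rego
package main

# Placeholder hourly prices in USD. These are NOT real AWS rates;
# load actual pricing (including discounts) from your own source.
hourly_price := {
    "t3.small": 0.02,
    "m5.xlarge": 0.19
}

# Hypothetical monthly spending limits per environment, in USD.
monthly_limit := {"dev": 100, "staging": 300}

hours_per_month := 730

deny[msg] {
    resource := input.tfplan.resource_changes[_]
    resource.type == "aws_instance"
    after := resource.change.after
    env := after.tags.Environment

    # Environments without a limit (e.g. prod) and instance types
    # missing from the price table are not evaluated by this rule.
    limit := monthly_limit[env]
    monthly_cost := hourly_price[after.instance_type] * hours_per_month
    monthly_cost > limit

    msg := sprintf(
        "%v (%v) is estimated at $%.2f/month; limit for %v is $%v",
        [resource.address, after.instance_type, monthly_cost, env, limit]
    )
}
```

Because the limit is compared against a computed cost rather than a list of instance types, changing a price or a limit does not require touching the rule itself.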
The Solution: Terraform + OPA/Rego
Three-layer approach:
Layer 1️⃣ : Enforce Required Tags (Week 1)
# Tags every instance must carry for cost allocation.
required_tags := {"Environment", "CostCenter", "Owner"}

deny[msg] {
    resource := input.tfplan.resource_changes[_]
    resource.type == "aws_instance"   # extend to other taggable types as needed
    resource.change.after != null     # skip resources being deleted
    tag := required_tags[_]
    not resource.change.after.tags[tag]
    msg := sprintf(
        "%v is missing required tag %q",
        [resource.address, tag]
    )
}
Result: Every resource deployed has proper cost-allocation tags. Finance can charge costs back accurately.
Layer 2️⃣ : Cost Limits per Environment (Week 2)
deny[msg] {
    resource := input.tfplan.resource_changes[_]
    resource.type == "aws_instance"
    env := resource.change.after.tags.Environment
    env == "dev"
    instance_type := resource.change.after.instance_type
    instance_type == "m5.xlarge"
    msg := "Instance type m5.xlarge is not allowed in dev. Use t3.small (~$20/month)."
}
Result: Developers can't deploy expensive resources in non-production. Prevents waste.
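One caveat: the rule above matches only when the tag is exactly "dev", so a resource tagged "Dev" or "development" would slip past it. A companion rule that restricts the tag's values closes that gap. The sketch below assumes the plan JSON is supplied under `input.tfplan` and that `dev`, `staging`, and `prod` are the only environment names in use; `allowed_environments` is a hypothetical name. It uses the older `deny[msg] { ... }` rule form (OPA 1.0 and later write it with `contains` and `if`).

```rego
package main

# Assumption: these are the only valid Environment tag values.
allowed_environments := {"dev", "staging", "prod"}

deny[msg] {
    resource := input.tfplan.resource_changes[_]

    # Resources without an Environment tag are not matched here;
    # a required-tags rule is expected to catch those.
    env := resource.change.after.tags.Environment
    not allowed_environments[env]

    msg := sprintf(
        "%v has Environment=%q; allowed values are: %v",
        [resource.address, env, concat(", ", sort(allowed_environments))]
    )
}
```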
Layer 3️⃣ : Cost Cap per PR (Week 3)
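# Assumes the pipeline merges a cost estimate (for example, from Infracost)
# into the input as a number at cost_estimate.monthly_cost_diff.
deny[msg] {
    cost_increase := input.cost_estimate.monthly_cost_diff
    cost_increase > 50
    msg := sprintf(
        "Plan increases monthly cost by $%.2f. Maximum allowed is $50.",
        [cost_increase]
    )
}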
Result: No surprise cost spikes. Developers know budget constraints upfront.
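If you would rather evaluate Infracost's output directly than merge it into a custom input document, a variant of the same cap might look like the sketch below. It assumes the input is Infracost's JSON output and that its top-level `diffTotalMonthlyCost` field is a string (for example "72.50"); check this against the Infracost version you run. `max_monthly_increase` is a hypothetical name, and the rule uses the older `deny[msg] { ... }` form (OPA 1.0 and later write it with `contains` and `if`).

```rego
package main

# Hypothetical cap on the monthly cost increase per PR, in USD.
max_monthly_increase := 50

deny[msg] {
    # Assumption: diffTotalMonthlyCost is a numeric string in the
    # Infracost JSON, so it is converted before comparing.
    cost_increase := to_number(input.diffTotalMonthlyCost)
    cost_increase > max_monthly_increase

    msg := sprintf(
        "Plan increases monthly cost by $%v. Maximum allowed is $%v.",
        [cost_increase, max_monthly_increase]
    )
}
```

Since this rule reads a different input document than the tag and instance-type rules, it would run as a separate policy check against the cost estimate rather than the Terraform plan.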
The Aha Moment
Most teams focus on monitoring costs (CloudWatch, dashboards, alerts).
But FinOps as Code prevents costs in the first place.
It’s the difference between:
🚑 Reactive: “We spent $100K this month—optimize now.”
🛑 Proactive: “Your PR adds $30/month; the limit is $50. Approved.”
The second transforms cost management. The first is just accounting.