Starting with Terraform is easy; using it safely and effectively is hard. From working in the trenches, these are my tips to make life easier and avoid disaster. OK, a little dramatic, but you get the point. 😂
Opinionated modules are good, but don't overdo it
The most immediate benefit of Terraform is the productivity gained through automation. After this, other business benefits begin to emerge, for example, enforcing desired architectural patterns on developers through the use of modules.
Terraform allows you to create modules of any size and complexity. They can be 'unopinionated' - not much more than a wrapper - or, at the opposite end of the spectrum, opinionated modules which provide a very rigid model of what an architecture should be. There is no hard and fast rule on this, but when designing opinionated modules, careful thought needs to go into their design to ensure they don't restrict future use cases, and provide as simple an interface as possible. In my experience, the lower down the stack (e.g. VPC / network layer), the more care is required - it makes sense, this is the foundation of your architecture, and a misstep here will ripple up your whole solution.
Opinionated modules are good for enforcing architectural patterns and repeatability, but avoid designing modules that are so inflexible they end up being bypassed or create a developer-hostile environment.
Subnet magic - be wary of modules that take a single CIDR and allocate ranges automatically. Subnet sizes will vary depending on your solution, and different subnets will potentially need quite specialised routes - you will be overriding / duplicating so much that any benefit will likely be negated. If enforcement of NACLs or Route Tables is a must have, consider a check as part of your pipeline, for example using OPA, or Sentinel if you're a Terraform Enterprise customer.
Input variable bloat - more than a few dozen input variables is a code smell. Are you providing the same information over and over? If so, why isn't the module looking this up itself, using deterministic information, e.g. tag values?
I'm a big fan of the AWS Terraform Registry modules, in particular vpc - it strikes a great balance, encapsulating a lot without overstepping the mark.
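As a rough illustration of avoiding 'subnet magic', here is a minimal sketch of calling the registry vpc module with every subnet range spelled out explicitly. The name, region, CIDR ranges, tags and version pin are all assumptions made up for this example, not recommendations.

```hcl
# Sketch only: name, region, CIDRs, tags and the version pin are illustrative assumptions.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed pin; use whichever version you have tested

  name = "example-dev"
  cidr = "10.0.0.0/16"

  azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

  # Subnet ranges are declared by the caller rather than carved up automatically,
  # so sizes can differ per tier without fighting the module.
  private_subnets = ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20"]
  public_subnets  = ["10.0.128.0/24", "10.0.129.0/24", "10.0.130.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  tags = { Environment = "dev", Layer = "network" }
}
```

The interface stays small (a handful of inputs), while the sizing decisions remain visible in the calling project.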
Architect your Terraform projects for maintainability
Especially when working in a multi-team environment, the structure of your terraform projects and how they link with each other is critical.
When first learning terraform it is tempting to build a vertically scoped project. This might include VPCs, IAM roles, all the way up to EC2 and Lambdas. Creating this style of project is easy, but it makes team collaboration markedly more difficult, and in the future might lead to code duplication and inflexibility. Instead, consider breaking your terraform 'landscape' into layers.
The lower layers are less tolerant of change (one change could ripple up the whole stack), but if they are separated they are likely to see very little change compared to layers above. Breaking into these layers will create a safer landscape for your environment and reduce inter-dependencies within a terraform project. In theory, it should mean that;
- It's easier to restrict different operational concerns - for example, application developers should have no need to change or deploy repositories relating to layers below them,
- Keeping fundamental, low churn items like VPCs and Transit Gateways separate will protect them from accidental breaking changes made further up the stack,
- High assurance environments will require separation of duties, creating layers like this creates natural boundaries to satisfy compliance concerns.
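One way an upper layer can consume a lower one without sharing a project is to look resources up by tag. The sketch below assumes the network layer tags its VPC with Name = "core-network" and its private subnets with Tier = "private"; both tag values, and the resource names, are hypothetical.

```hcl
# Sketch only: tag keys/values and resource names are assumptions for illustration.

# Find the VPC owned by the network layer via a deterministic tag.
data "aws_vpc" "core" {
  tags = {
    Name = "core-network"
  }
}

# Find the private subnets inside that VPC, again by tag.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.core.id]
  }

  tags = {
    Tier = "private"
  }
}

# The application layer only references the lower layer; it never manages it.
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = data.aws_vpc.core.id
}

output "private_subnet_ids" {
  value = data.aws_subnets.private.ids
}
```

Because the application project holds only data sources for the network, a mistake here cannot destroy or alter the VPC itself.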
Keep it DRY
Don't Repeat Yourself - you will hear it a lot in any development circle (reminiscing time - I first heard it when learning Ruby on Rails many years ago 😅). In the world of terraform, there is no 'rails' to define a good project structure (hence this article), so how you avoid repetition is very much down to you. A few anti-patterns I would watch for though...
Duplicate Frontends across a project
At first, this looks like a well structured project. However, looking a bit further, whatever the contents of database.tf are, this could lead to a lot of repetition. At best, these are frontend modules calling common modules; at worst, they're not. If you need to change a resource in one environment, each environment will need updating in turn - slow and error-prone. Instead consider;
Creating a common terraform directory and separate tfvars cuts down on duplication a lot. Bleeding edge separation can be achieved via deploying from branches into your Dev / QA / Prod environments as needed. Need to disable a development service in Production? Create feature flags within modules to control their behaviour.
You can't use count on a module, I hear you cry (true before Terraform 0.13, which added count and for_each on module blocks) - look at the code for the create_vpc feature toggle in the Terraform VPC Registry module, it works and it feels cleaner than count as well.
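The feature toggle idea can be sketched roughly as follows. This is a simplified illustration written for this article, not the registry module's actual code, and the variable and resource names are assumptions.

```hcl
# Sketch only: a hypothetical module with a boolean feature flag.
variable "create_vpc" {
  description = "Set to false to skip creating the VPC in this environment."
  type        = bool
  default     = true
}

variable "cidr" {
  description = "CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16"
}

# The flag is converted to a count of 1 or 0 inside the module,
# so callers only ever deal with a simple true / false input.
resource "aws_vpc" "this" {
  count = var.create_vpc ? 1 : 0

  cidr_block = var.cidr
}

# try() returns null when the VPC was not created, instead of failing on index 0.
output "vpc_id" {
  description = "ID of the VPC, or null when create_vpc is false."
  value       = try(aws_vpc.this[0].id, null)
}
```

An environment's tfvars file could then set create_vpc = false to switch the resource off without touching the shared code.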
A few tips to help keep your custom modules DRY;
- Reduce mental load on your fellow developers, keep the required variables clear, minimal and relevant,
- Use defaults for variables within your module where possible to reduce the 'interface' size of your module,
- Use typing and useful descriptions for variables to reduce input error,
- Reduce duplication across tfvars, values consistent across environments should be declared in an auto.tfvars file, with environment specific tfvars being referenced as part of the
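To make the tips above concrete, here is a minimal sketch of a module's variables using types, descriptions, defaults and a validation rule. The variable names and allowed values are hypothetical, and validation blocks assume Terraform 0.13 or later.

```hcl
# Sketch only: variable names, defaults and allowed values are assumptions.

# Required: no default, so the caller must think about it.
variable "environment" {
  description = "Deployment environment name."
  type        = string

  validation {
    condition     = contains(["dev", "qa", "prod"], var.environment)
    error_message = "The environment must be one of: dev, qa, prod."
  }
}

# Optional: a sensible default keeps the module's required interface small.
variable "instance_type" {
  description = "EC2 instance type for the application hosts."
  type        = string
  default     = "t3.micro"
}

# Optional: typed as a map of strings so a malformed value fails at plan time.
variable "common_tags" {
  description = "Tags applied to every resource created by this module."
  type        = map(string)
  default     = {}
}
```

Values shared by every environment could live in a file such as common.auto.tfvars (a made-up name; Terraform loads any file ending in .auto.tfvars automatically), while an environment specific file can be supplied explicitly, for instance with the -var-file flag.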
Terraform Poka-Yokes - a great way of thinking about writing 'safe' terraform, allowing different layers of consumption. This allows enforcement of architectural patterns while providing flexibility.
Terraform Environment+Application Pattern - A great deep dive that explores the layered approach to a terraform adoption. However, my personal preference would disagree with some of the implementation choices (e.g. Consul for sharing state information across layers). I would also consider how DRY the approach of using environment specific front-end modules is, compared to feature flagging via tfvars or using a tool like terragrunt.
HashiCorp has opinions on how their tools should be used, follow them
Read the issue trackers for Terraform on GitHub and a lot of language 'feature requests' are met with a firm but well thought out won't fix. They have a clear view of what their language should allow you to do, and what it shouldn't.
I won't win many fans here, but I believe this is a good thing™. Most of the concerns are around creating modules with more dynamic on / off functionality - if you're hitting language barriers, consider whether it is an issue with your project / module structure. I know, shocking!
If your code seems to be going round in loops, consider breaking it apart. One monster sized project to deploy everything will cause you more pain than any technical street cred is worth.
Module Composition is a great resource that should be read and digested.
Reading the commentary in terraform's GitHub issue repository, for example, can also be a goldmine.
Fail to design, plan to fail (no pun intended)
Starting Terraform can be a little too easy. You can get going straight away and for quite a while everything will be 🌈 and 🦄.
However, failing to think through what you will be using terraform for and who is likely to be using it will most likely put you into a sticky situation in the future. Try to understand some of the fundamentals of your infrastructure and development process before breaking ground. Think about;
- In the case of AWS, will you have a multi account setup, or flat structure? (this will influence how you configure your state buckets and provider config),
- Do you have separate DevOps teams who look after different parts of your infrastructure? (even if you don't, layering is still a very sensible idea)
- How will you compartmentalise your application and different environments? (VPCs, Subnets, Accounts)
- How will you create a reliable and secure set of roles for those who need to make changes via terraform? (Maybe - read only console, deploy rights only provided to the Terraform runner, e.g. Atlantis, Jenkins, Terraform Enterprise)
- How will you connect your VPCs? (Make allowance for Transit Gateway attachment and VPC endpoints - a dedicated subnet makes this clean)
- Do you have a common tagging policy? (you might want to push this to a common module)
- How will you execute your terraform projects? (From within a CI system or a shared host)
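Several of these questions end up encoded in the backend and provider configuration. The following is a minimal sketch for a multi account AWS setup, assuming a central state bucket and a deploy role assumed by the runner; the bucket name, state key, account ID, role name and tags are all placeholders.

```hcl
# Sketch only: bucket, key, region, account ID, role name and tags are placeholders.
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state"
    key     = "network/dev/terraform.tfstate" # one key per layer and environment
    region  = "eu-west-1"
    encrypt = true
  }
}

provider "aws" {
  region = "eu-west-1"

  # The runner (CI system or shared host) assumes a deploy role in the target account.
  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/terraform-deploy"
  }

  # A common tagging policy applied to every taggable resource from this provider.
  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = "dev"
    }
  }
}
```

Keeping one state key per layer and environment lines up with the layered structure discussed earlier, and pushing tags into the provider is one alternative to a common tagging module.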
Keep it consistent
Once you have a good overview of how and who will be using terraform, develop a terraform style and usage guide, as the basis of keeping things consistent. You may want to cover;
- How projects will be decomposed,
- Naming conventions,
- vars.tf ordering (locals, variables, variables with defaults for example),
- Which Terraform Registry modules to use, which common modules to create in-house,
- Branching and Deployment strategy,
- State bucket location & configuration.
The theme here is to get a common understanding of how terraform should be used early on, and avoid technical debt. Refactoring terraform is possible, especially with the use of terraform import, but can get dicey, mighty quick!
What's your terraform war story? Share below for all to learn from - for now, that's it from me!