One month with Terraform

Disclaimer: this is my cosplay of another post, based only on my personal experience.

When I got started with AWS administration, I immediately formed an opinion about the AWS console. It has a beautiful design, two-click solutions and declarative prompts. But the longer you’re on Amazon, the more you realize that it’s not enough for automated infrastructure management. The AWS CLI is great and feature-rich, but slow. The programming language SDKs are much faster, but they make it complicated to manage the overall infrastructure state. There should also be a clear solution for backing up AWS infrastructure. Amazon provides CloudFormation. In an ideal world with unicorns, you can bring up your resources from a simple JSON schema. But in practice CloudFormation creates a lot of unrecognizable stuff. It is also not meant for backing up existing infrastructure.

Who will solve all the issues described above? Terraform will! One more thanks to Mitchell Hashimoto and his team for this product. It’s still fresh, but it potentially looks like a Swiss army knife for AWS administration. Moreover, it plans to deal with other cloud and non-cloud providers (hello, VMware). But that is another story.

After a month of introducing Terraform in my project, I have plenty of mixed thoughts. Here I want to describe them.

Converting AWS resources to Terraform code

With a distributed architecture, even small projects have hundreds of resources. Even if you use almost nothing but EC2, you still have to deal with IAM, networking and DNS.

It’s too painful to describe every item in use manually. This is where I started to work with Terraforming. It’s a Ruby utility that helps convert your existing resources into tf code. It also automatically generates a tfstate file for each service collection.

My first plan was: run Terraforming on everything I could, manually add everything it couldn’t convert, and test it all together. I spent about 2 weeks (not full-time) figuring out what I had and what I still had to add to the infrastructure code. With all the resources written, I tried to check the complete picture. Oh, how my head spun when I got 165 errors!

165, my friend! No, I’m lying a little bit. One time it was 157, another time it was 178, then 143 and so on. I thought they had put a simple random number generator into the Golang code just to kick your ass! But then I looked back at one root wisdom in IT:

“Technologies are not a problem – people are”

Realizing I was done with that approach, I made a plan B:

  1. Run Terraforming on the service collections one by one. For example, convert the network components first, then EC2 instances, then ELB and so on.
  2. Compare what you have in the .tf code with what you have in the .tfstate file. If something’s wrong, fix it.
  3. Verify that the .tf and .tfstate files describe an identical configuration (terraform plan should report no changes).
  4. Add some parametrization to your items. Put variables in the configuration instead of bare item IDs, pointing at whatever you already have in Terraform. For example, if you have configured security groups and are now working on EC2, use variables in the instances’ SG configuration.
  5. Verify you didn’t break anything after parametrization, then go on to Terraforming the next collection. Loop through steps 1-5.
  6. After Terraforming, figure out what you still don’t have. Add it manually to the .tf file.
  7. Also think over how to add the manually written resources to .tfstate so that it reflects what you have in AWS. A good approach: create a simple resource with Terraform, grab its .tfstate entry and change the item ID. Don’t change all the settings; Terraform will update them automatically on the next refresh.
  8. Sounds terrible, so let’s take an example. You want to add a CloudFront distribution to TF but don’t know what it looks like in the .tfstate configuration. Just create a new distribution, copy its entry from the refreshed .tfstate and paste it into the tfstate you work on. After that, change the item ID to tell Terraform which distribution you mean.
  9. Verify that your code and state are equal, then go on adding other resources. This is another loop, through steps 6-8.
  10. Once all resources are done, you will have the full infrastructure in your code.
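
Step 4 is easier to see in code. Here is a minimal sketch, assuming a security group that is already in the Terraform code and an EC2 instance that used to carry its bare ID. Every resource name, ID and AMI below is a made-up placeholder.

```hcl
# Hypothetical security group, already present in the code
# after converting the network collection.
resource "aws_security_group" "application" {
  name        = "application"
  description = "Application servers"
  vpc_id      = "vpc-00000000" # placeholder ID
}

resource "aws_instance" "app" {
  ami           = "ami-00000000" # placeholder AMI
  instance_type = "t2.micro"
  subnet_id     = "subnet-00000000" # placeholder ID

  # Before parametrization the generated code carried a bare ID:
  #   vpc_security_group_ids = ["sg-00000000"]
  # After: reference the resource Terraform already manages.
  vpc_security_group_ids = ["${aws_security_group.application.id}"]
}
```

If the reference resolves to the same ID that was hard-coded before, terraform plan should show no changes for the instance, which is the check from step 5.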

Time showed that this was a much more reasonable way. It works, dude! If you’re stuck with conversion, please take this recipe.

Split everything into modules

OK, now you have the state of all your AWS items in one .tf file. But only one! How can you easily manage the configuration when it runs to tens of thousands of lines of code? What about distributing your resources across separate, dedicated files?

This is where the concept of Terraform modules comes in. Create your own ideal planet with parts like EC2 and VPC and countries like AS groups and AWS subnets. Keep your resources in different files; it’s going to be much easier to maintain.

The Amazon blog has a really clear post about Terraform modules. After reading it, you can easily split your state into a few modules. Additionally, I just want to share a piece of code from my main file.

# The network module comes first: the other modules consume its outputs.
module "vpc_network" {
  source = "./vpc_network"
}

# EC2 instances receive subnets and security groups from the network module.
module "ec2_instances" {
  source            = "./ec2_instances"
  production_subnet = "${module.vpc_network.production_subnet}"
  staging_subnet    = "${module.vpc_network.staging_subnet}"
  qa_subnet         = "${module.vpc_network.qa_subnet}"
  application_sg    = "${module.vpc_network.application_sg}"
  admin_sg          = "${module.vpc_network.admin_sg}"
  infrastructure_sg = "${module.vpc_network.infrastructure_sg}"
  payment_sg        = "${module.vpc_network.payment_sg}"
}

module "iam_policy" {
  source = "./iam_policy"
}

# Databases live in the same subnets and get their own security group.
module "rds_databases" {
  source            = "./rds_databases"
  production_subnet = "${module.vpc_network.production_subnet}"
  staging_subnet    = "${module.vpc_network.staging_subnet}"
  qa_subnet         = "${module.vpc_network.qa_subnet}"
  pgsql_sg          = "${module.vpc_network.pgsql_sg}"
}
Tfstate per environment

Usually a service has a few environments besides production. In my case, these are QA and Staging. Keeping an isolated tfstate for each environment will save your life from Terraform bugs. You can change the infrastructure of one environment without worrying about the consequences for the others. I won’t go deeper; my cosplay subject has already described this approach.

QA & Staging on demand

For small projects with a long release cycle, you don’t need to keep testing environments running around the clock. Indeed, why would you want staging EC2 instances and balancers while you’re sleeping at 3am? And if you really do want them, why not just launch them on demand?

This is where Terraform shines. With an isolated .tfstate per environment, you can easily create everything you use with 2 commands: terraform plan and terraform apply. It’s just as easy to remove all the resources after work with terraform destroy. Calculate how much money you will save by running your environments only when you need them.
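
One possible way to get that isolation, as a rough sketch under assumed names: give every environment its own directory with a small root file that calls the shared modules. With local state, Terraform keeps terraform.tfstate in the directory you run it from, so each environment ends up with its own state file. The paths, the region, and the environment and subnet variables below are hypothetical; the modules would have to declare them.

```hcl
# environments/qa/main.tf (hypothetical path)
# Run terraform plan, apply and destroy from this directory, so that
# only the QA state file next to it is read and written.

provider "aws" {
  region = "us-east-1" # assumed region
}

module "vpc_network" {
  source      = "../../modules/vpc_network"
  environment = "qa" # hypothetical variable of the module
}

module "ec2_instances" {
  source         = "../../modules/ec2_instances"
  environment    = "qa" # hypothetical variable of the module
  subnet         = "${module.vpc_network.subnet}" # hypothetical output
  application_sg = "${module.vpc_network.application_sg}"
}
```

A staging directory would look the same with a different environment value, and destroying QA from its own directory leaves the staging and production states untouched.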

In case of emergency this solution looks great. When someone wants to test something at the weekend, you won’t have to refuse them or toil over it. You just run Terraform and the test stand comes up!

What Terraform still can’t do

Note: before reading this part, look at the publication date of the post. I wrote it in August 2016. So I’ll be happy if some of these problems have been solved since.

  1. Not all AWS services are included. I know, they don’t have to be. But to become enterprise-grade, the HashiCorp developers have to support stuff like Elastic Transcoder. Getting started with WAF, I was really disappointed to learn that Terraform doesn’t support the state of WAF rules and ACLs. So I ended up with a manual configuration, which was easy but not repeatable by automation.
    For now it’s a little bit tricky to configure one part of the infrastructure in Terraform and another in the Management Console.
  2. Built-in conversion. Terraform is great if you start your project from scratch. But what if you join a team that didn’t care about infrastructure as code at all? There could be at least a simple engine that looks at the existing resources and writes them into a .tf file. The Terraforming utility partly does this, but there is still a lot of the drudgery described above.

So, play with Terraform and love Golang!
