Continuing my "operations revelations", I'd like to describe my cloud experience. The project I'm working on now runs all of its services and hardware in the Amazon cloud. It would be strange if it didn't: hosting applications on AWS is definitely a big trend.
I had never seen, or even thought about, how Amazon works, so there was plenty of fun ahead of me.
Why is it so important?
This matters in every datacenter format. If you don't know what's going on with your hardware, you are on the wrong path, and one with almost no possibility of turning back.
Working in the cloud, you have to clarify what you pay for, what you get for that price, and how you can scale what you have. There are hundreds of tools, both built-in and third-party. Understanding what each one is for, and whether it makes sense to adopt it, is a big job.
- AWS itself. Amazon provides a lot of guidance, both in its enormous documentation and in the services themselves. Resources can also be tagged, which helps split your infrastructure into a few neighborhoods like staging, QA, etc.
- Terraform. At first I wondered how to back up the state of my AWS resources. The first advice I got was CloudFormation. It looks beautiful to roll your items up from templates, but it's useless for backing up existing infrastructure: CloudFormation creates strange instances with strange names when all you want is to save a state template for the future. And here comes Terraform! It is still painful to import existing resources, but in the end you get what you want: a clean state that you can easily change or simply inspect.
- Ansible. I generally use Ansible for orchestrating the state of EC2 instances. Beyond EC2, for the other services, I use only Terraform.
- Netflix Ice. A great cost-tracking solution from the Netflix development team. It can be used as an AWS billing dashboard; the difference is in the extra graphs and detailed analysis. I use Ice to track my daily costs, compare them with past ones, and split costs by neighborhood. The latter is definitely a great thing: with "application groups" you can see how much you were charged for each subfield. You can do the same in AWS cost tracking with tags, but in practice you cannot tag every resource; I ran into cases where tagging CloudFront and Elastic Transcoder was impossible. So, Ice it is.
- Python. There are just a lot of things you can automate with code. Talking to the AWS API through Boto3, I copy RDS snapshots to another region daily for backup. Amazon capacity monitoring is also reachable from code.
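The daily RDS snapshot copy can be sketched roughly like this. This is a minimal illustration, not my production script: the instance identifier, regions, and naming convention are hypothetical, and it assumes Boto3 is installed with AWS credentials configured.

```python
# Sketch: copy the latest available RDS snapshot to a backup region.
# Instance ID, regions, and the "-backup" suffix are illustrative assumptions.

def latest_available_snapshot(snapshots):
    """Pick the most recent completed snapshot from describe_db_snapshots output."""
    done = [s for s in snapshots if s.get("Status") == "available"]
    return max(done, key=lambda s: s["SnapshotCreateTime"], default=None)

def copy_latest_to_backup_region(db_instance_id="app-db",
                                 source_region="us-west-2",
                                 backup_region="eu-west-1"):
    import boto3  # lazy import: only needed when actually talking to AWS
    src = boto3.client("rds", region_name=source_region)
    dst = boto3.client("rds", region_name=backup_region)
    snaps = src.describe_db_snapshots(
        DBInstanceIdentifier=db_instance_id)["DBSnapshots"]
    latest = latest_available_snapshot(snaps)
    if latest is None:
        return None  # nothing finished yet; try again later
    # Automated snapshot names contain ":", which is not allowed in a
    # target identifier, so sanitize it.
    target = latest["DBSnapshotIdentifier"].replace(":", "-") + "-backup"
    return dst.copy_db_snapshot(
        SourceDBSnapshotIdentifier=latest["DBSnapshotArn"],
        TargetDBSnapshotIdentifier=target,
        SourceRegion=source_region,
    )
```

Schedule something like this from cron or a Lambda and the backup runs itself.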
1. Sort out the resource disorder. When I dove into the job, a lot of stuff seemed unclear. First I cleaned up all the useless tools and tagged the remaining ones. For an administrator this operation is trivial, but it is definitely important: skip tagging and you may end up confused once your infrastructure grows too large to remember each component.
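A tagging audit like this is easy to automate. Below is a minimal sketch that lists EC2 instances missing an `environment` tag; the tag key and region are my example conventions, and the live call assumes Boto3 with configured credentials.

```python
# Sketch: find EC2 instances that lack a required tag.
# The "environment" tag key and the region are hypothetical conventions.

def untagged(instances, required_key="environment"):
    """Return instance IDs whose tag list lacks the required key."""
    missing = []
    for inst in instances:
        keys = {t["Key"] for t in inst.get("Tags", [])}
        if required_key not in keys:
            missing.append(inst["InstanceId"])
    return missing

def audit_region(region="us-west-2"):
    import boto3  # only needed for the live API call
    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate()
    instances = [i for p in pages
                 for r in p["Reservations"]
                 for i in r["Instances"]]
    return untagged(instances)
```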
2. Itemize the charges. The next thing was to clarify what we pay for. Tags plus Netflix Ice helped a lot here: we started to understand which capabilities are worth the money and which are not.
3. Track charges for each environment. Your infrastructure becomes much clearer once you separate it. By environments I mean production, staging, QA, local development, and the infrastructure itself. Netflix Ice, fed by tags, helped here too. Now I can look at the picture and figure out where to cut costs and where it is possible to add more.
It's not only about separating the bill, either. I've split the Terraform state across all of my environments. This kind of maintenance saves your memory by sorting your items into different baskets.
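One way to sketch that state split is a separate backend configuration per environment, each writing its own state file. The bucket, key, and region below are placeholders, not my actual setup:

```hcl
# environments/staging/main.tf — one state file per environment.
# Bucket and key names are illustrative assumptions.
terraform {
  backend "s3" {
    bucket = "myproject-terraform-state"
    key    = "staging/terraform.tfstate"
    region = "us-west-2"
  }
}
```

A sibling `environments/qa/main.tf` with `key = "qa/terraform.tfstate"` keeps QA's basket entirely separate.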
4. Optimize the total bill. With monitoring and capacity planning I realized we don't need nearly as many resources as we had.
I simply removed stuff like CloudWatch dashboards and downgraded almost all EC2 instances. For small purposes a t2.nano machine is generally enough: a basic Linux image uses too little memory to worry about a consumption problem.
So, give your instances the class appropriate to their memory consumption.
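The right-sizing rule can be expressed as a toy function: pick the smallest t2 class whose memory covers the observed peak. The memory figures match AWS's published t2 sizes; the 20% headroom factor is my own assumption.

```python
# Toy right-sizing: cheapest t2 class that fits peak memory plus headroom.
T2_MEMORY_GIB = {
    "t2.nano": 0.5, "t2.micro": 1, "t2.small": 2,
    "t2.medium": 4, "t2.large": 8,
}

def right_size(peak_memory_gib, headroom=1.2):
    """Return the smallest t2 class covering peak usage plus headroom."""
    needed = peak_memory_gib * headroom  # 20% headroom is an assumption
    for name, mem in sorted(T2_MEMORY_GIB.items(), key=lambda kv: kv[1]):
        if mem >= needed:
            return name
    return None  # workload doesn't fit the t2 family

right_size(0.3)   # a tiny service lands on t2.nano
right_size(3.5)   # needs 4.2 GiB with headroom, so t2.large
```

Feed it peak memory from your monitoring and you get a defensible downgrade list instead of a guess.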
5. Separate and minimize console user privileges. One of the most important things: no team member should have more privileges than necessary. The same applies to service-to-service access. This is the golden rule of security, although it takes some time to understand the IAM concepts and figure out what each group on the team actually needs.
Honestly, I burned my hands on this task a little. I didn't realize that when you change IAM roles, some existing EC2 resources don't pick up the change. So I lost CodeDeploy's access to EC2 instances by removing an EC2 role. During the troubleshooting I drew out the schema and saw just how densely AWS resources are linked. So, watch out: by screwing up one resource, you might screw up some more.
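To illustrate least privilege (this is an example policy I made up; the bucket name is hypothetical), a group that only needs to read deployment artifacts from S3 would get exactly that and nothing more:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::myproject-deploy-artifacts",
        "arn:aws:s3:::myproject-deploy-artifacts/*"
      ]
    }
  ]
}
```

No `s3:*`, no wildcard resources: if these credentials leak, the blast radius is one read-only bucket.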
6. Change the environment namespace. Historically we had production, development, and testing. But what is "testing", and what is "development", really? Watching how your application behaves in a developer's local environment is testing too. "Development" is a huge term; it's not just a pre-look before pushing a release to production.
Because of that, we renamed our environments to "staging" and "QA". This step required some changes in the source code, plus retagging existing resources.
7. Move staging and QA to an 'on-demand' state. Thank you, Terraform! Now I can roll up and destroy testing stands with one command.
Our application is not so large that we test something every minute. So I came to the question: why do we want staging and QA instances at, say, five in the morning? Yet we were paying to host those environments at that time! Now, when we need a neighborhood, we launch it: a new EC2 instance is provisioned from an AMI, dependent pieces like the DNS record and the load balancer come up too, and finally the tested revision is deployed automatically. The whole scheme feels like an oasis the first time you try it.
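A rough sketch of such a disposable stand in Terraform might look like this. The AMI ID, zone ID, domain, and ELB name are all placeholders, not the real configuration:

```hcl
# A throwaway staging stand: instance from a pre-baked AMI, a DNS record,
# and a load balancer attachment. `terraform apply` brings it all up;
# `terraform destroy` removes it when nobody is testing.
resource "aws_instance" "staging" {
  ami           = "ami-0123456789abcdef0"  # pre-baked application image
  instance_type = "t2.small"
  tags = {
    environment = "staging"
  }
}

resource "aws_route53_record" "staging" {
  zone_id = "Z0000000000000"
  name    = "staging.example.com"
  type    = "A"
  ttl     = 300
  records = [aws_instance.staging.public_ip]
}

resource "aws_elb_attachment" "staging" {
  elb      = "staging-elb"
  instance = aws_instance.staging.id
}
```

One `apply` in the morning, one `destroy` in the evening, and the overnight hours cost nothing.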
8. Start using reserved EC2 instances. This step is not actually implemented yet, but it's necessary for optimizing the overall bill. If you are going to host your application on a t2.medium for at least a year, why not buy the instance in advance? This saves a significant amount of money. Also take a look at the AWS Marketplace: there may be a cheaper cloud computing option.
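The savings are easy to estimate with back-of-the-envelope arithmetic. The hourly rates below are made-up placeholders, not current AWS prices; check the pricing page before deciding.

```python
# Back-of-the-envelope: on-demand vs. one-year reserved pricing.
# Both hourly rates are illustrative assumptions, not real AWS prices.
HOURS_PER_YEAR = 24 * 365  # 8760

def yearly_cost(hourly_rate, hours=HOURS_PER_YEAR):
    return hourly_rate * hours

def reserved_savings(on_demand_hourly, reserved_hourly):
    """Absolute and relative savings of running reserved for a full year."""
    od = yearly_cost(on_demand_hourly)
    rs = yearly_cost(reserved_hourly)
    return od - rs, (od - rs) / od

# Example: $0.05/h on-demand vs. an effective $0.03/h reserved
saved, fraction = reserved_savings(on_demand_hourly=0.05,
                                   reserved_hourly=0.03)
```

With those example rates the instance runs 40% cheaper over the year, which is why a known one-year workload almost always justifies the upfront commitment.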
1. Leaning on Amazon's ready-to-deploy services seems like an easy and reasonable choice. But keep in mind: the more of them you need along your product roadmap, the much more money you'll pay Amazon. Try to choose the cheapest option; some of Amazon's managed services can be deployed manually onto EC2 instances instead.
For example, why would you want ElastiCache at $70 per cluster per month if you barely use it? In that case you can easily set up Redis or Memcached on a low-capacity EC2 instance.
2. If you are getting started with AWS, please use Terraform from your first steps. The infrastructure-as-code model lets you scale up and move your infrastructure without worry. Say, moving from Oregon to Ireland will take an hour or even less. You also get a clear configuration, and you save the time spent clicking thousands of console buttons.
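Those first steps are small. A minimal sketch, with a placeholder AMI and resource names of my own choosing, shows how little it takes to put the region itself under version control:

```hcl
# Minimal first steps: the whole description lives in code, so moving
# from Oregon to Ireland is mostly a matter of changing one variable
# and re-applying. All names here are placeholders.
variable "region" {
  default = "us-west-2"  # Oregon; switch to "eu-west-1" for Ireland
}

provider "aws" {
  region = var.region
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"  # AMI IDs are region-specific: pick one valid in the target region
  instance_type = "t2.micro"
}
```

`terraform init`, `plan`, and `apply` replace all that console clicking.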
1. Cloud administration is much easier than working in a local datacenter. You don't have to care about hardware questions like what to buy and how to maintain it, and you can throw out networking worries like "why is this crappy router stuck again?" The cloud feels a bit removed from reality, but it works far more painlessly anyway.
2. A good cloud provider is an operations engineer in itself. Small teams with small projects may not need an Ops member at all: the cloud lets a developer automate routine tasks with one mouse click. You don't have to be a DBA to maintain a database well with RDS, and you don't have to dig into CI to deploy your revision successfully with CodeDeploy. The list of operations use cases it covers is that big.
But! The more your application grows, the more you'll lack the experience to keep your cloud in a clean state. And that's when you hire an Ops engineer to deal with the job!