I’d compare backups to meditation for the human body. You don’t see the benefits right away, but in the long run backups keep you healthy, because they confirm that your infrastructure has reserves to fall back on. Meditation helps you get through the hardest moments of your life by clearing out dirty thoughts. Similarly, a clean backup helps an engineer get through disasters, even ones nobody could have predicted before they happened.
“Infrastructure as Code” style
Special cases aside, backing up the physical configuration of a machine is useless. Backing up user media content is necessary, but what about configs? Indeed, why copy all those kernels and filesystems when you can easily bring the machine up afresh?
That is also the idea behind a loosely coupled system built from microservices: one machine serves only one component and does it well. From a backup point of view, it is much easier to keep the server state in code. So the best restore solution is to launch a new instance and deploy its configuration from a CM (configuration management) system. I picked Ansible, but which tool is better is up to you; there is no standard.
Backing up Amazon services raises the same question. I racked my brains over how to rebuild what I have from scratch. The solution was Terraform. In theory I can not only bring my Amazon state back but also switch cloud vendors with much less pain. That is theory, though, and unlikely in practice. Either way, Terraform is the best solution I have found for backing up AWS services. I described what Terraform is in detail here.
AWS engineers know what backups are for and offer many useful options. To avoid repeating the AWS documentation, I will simply name three:
- AMI images. I said above that backing up a physical machine is useless, but not everything is worth automating. Take setting up a monitoring server. How many times have you done that in your life? A couple? Is it necessary to automate something you do so rarely? In such cases AMI images save you time and help you bring back a server with complicated tweaks.
- S3/Glacier. Glacier is great for backing up data such as media content or documents. The scheme in short: create a new S3 bucket, set up cross-region replication from the main bucket, and set a lifecycle rule that archives everything older than 3 days to Glacier. The cost will be small.
- RDS database instance backups. These deserve a separate section.
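The lifecycle step of the S3/Glacier scheme above can be sketched in Python. This is only an illustration: the function name and rule ID are made up for the example, and it covers just the lifecycle rule, not bucket creation or cross-region replication. The dictionary follows the shape that the S3 PutBucketLifecycleConfiguration API expects.

```python
import json


def glacier_lifecycle_config(days: int = 3, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    `days` is the object age at which the transition happens;
    an empty `prefix` applies the rule to the whole bucket.
    """
    if days < 0:
        raise ValueError("days must not be negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the rule so it can be reviewed before it is applied.
    print(json.dumps(glacier_lifecycle_config(), indent=2))
```

If you use boto3, a dictionary like this can be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call, together with the name of your backup bucket.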
Every DBMS has two kinds of backup: logical and physical. Don’t argue about which is better, do both!
Here is the PostgreSQL story. The logical backup is just a pg_dump scheduled with cron every day at 3 a.m., when users are asleep. RDS doesn’t allow pg_dumpall, because it tries to read pg_shadow and other sensitive information, and even a superuser can’t do that in RDS. So dump the databases one by one, gentlemen.
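A minimal sketch of that one-by-one dump, written as a Python script that cron could run. The host, user, database names, and output directory are placeholders, and the custom archive format is my assumption rather than a requirement. The password is expected to come from ~/.pgpass or the PGPASSWORD variable, which is why the command passes --no-password and fails instead of hanging on a prompt.

```python
import datetime
import subprocess
from pathlib import Path


def pg_dump_command(host, user, database, out_dir, day):
    """Build the pg_dump argument list for a single database."""
    out_file = Path(out_dir) / f"{database}-{day.isoformat()}.dump"
    return [
        "pg_dump",
        f"--host={host}",
        f"--username={user}",
        "--no-password",      # never prompt; a cron job has no terminal
        "--format=custom",    # compressed archive, restorable with pg_restore
        f"--file={out_file}",
        database,
    ]


def dump_all(host, user, databases, out_dir):
    """Dump every listed database, stopping at the first failure."""
    today = datetime.date.today()
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for database in databases:
        command = pg_dump_command(host, user, database, out_dir, today)
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder values; replace them with your own endpoint and databases.
    dump_all("mydb.example.com", "backup_user", ["app", "analytics"], "/var/backups/pg")
```

A crontab entry such as `0 3 * * * python3 /opt/backup/pg_dump_all.py` (the path is hypothetical) would run it daily at 3 a.m.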
Choosing a physical backup is harder. There is a wide choice of solutions: it might be Slony, or recently released software like BDR. But not in RDS. There you get a daily snapshot by default and you are fine. It is easy to recover into a new DB instance.
The idea behind an off-site backup is to survive unthinkable circumstances. Keeping the backup of an Oregon-hosted service in Oregon? What about an earthquake across all of Oregon? What about a power outage, at the very least?
To stop worrying about such scenarios, you need to send your backup somewhere outside. It must be a safe and secure place. I picked Dropbox. One instance keeps daily backups of code repositories, configs, and database dumps, and its backup directory is synced with a Dropbox folder. Network transmission is secured with HTTPS. All we need is to install and daemonize the Dropbox CLI agent. I described that before.
All right, now you have backed up every aspect of your infrastructure. Now find some free space and restore the artifacts from scratch. First, write the recovery plan in your documentation engine. The aspects have a long and messy dependency chain, so you must make clear what has to be recovered first and what last.
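One way to pin down that order is to write the dependencies as data and let a topological sort produce the sequence. The sketch below uses graphlib from the Python standard library (3.9+); the component names are invented for illustration and are not my actual infrastructure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical example: each component maps to what must be restored before it.
DEPENDENCIES = {
    "network": set(),
    "database": {"network"},
    "app-servers": {"network", "database"},
    "monitoring": {"app-servers"},
}


def recovery_order(dependencies):
    """Return components in an order where prerequisites always come first.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(f"{step}. restore {component}")
```

A cycle in the plan raises an error instead of giving a misleading order, which is a useful signal that the documentation itself needs untangling.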
Then practice restoring every aspect until the process is flawless. Record the time it takes and collect feedback from your team: developers, managers, stakeholders. Let them sleep well.
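Recording the time can be as simple as wrapping each restore step in a small timer. The helper below is a sketch; the name timed_step and the step labels are made up for the example, and the sleep call stands in for real restore work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, log):
    """Measure how long a restore step takes and store it in `log`.

    The duration is recorded even if the step raises an exception.
    """
    start = time.monotonic()
    try:
        yield
    finally:
        log[name] = time.monotonic() - start


if __name__ == "__main__":
    durations = {}
    with timed_step("restore database", durations):
        time.sleep(0.1)  # placeholder for the actual restore commands
    for name, seconds in durations.items():
        print(f"{name}: {seconds:.1f} s")
```

Keeping these numbers from one drill to the next shows whether the recovery process is actually getting faster.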
- Test every backup you make. Do it periodically, but do test.
- Don’t try to back up the physical state of your entire infrastructure. Think it over. The main recovery tool is the human brain, remember this.
- Keep an off-site backup. Always. And figure out how to deliver the data to its destination safely.