Hi there.
Here is another short post, this time about a toolset I put together over the last few days and want to share with the public.
As you know, we host almost all our projects on Amazon Web Services. Almost every instance we use there is based on a so-called „EBS Volume“, which is what Amazon calls its block storage. An EBS volume serves as the root volume, and you can attach and mount additional EBS volumes as storage for your instances.
Amazon supports creating snapshots of complete EBS volumes, which is a great and simple way to get a kind of full backup of your instance. The „correct“ way to take a consistent snapshot is to power down the instance and then take the snapshot. The downside: this means downtime. Conveniently, Amazon also lets you take snapshots of a running system, on the premise that you are aware that these „live snapshots“ may not be consistent.
In my view the advantages of live snapshots outweigh the disadvantages. In any case you should have a working backup strategy for your assets, data and so on; EBS snapshots are just another layer of disaster recovery.
When I started with EBS snapshotting I went the „cronjob-way“. For that I created a new AWS user with the appropriate permissions and used the generated credentials to run a script on the target instance with cron.
While this works in general, it is a repetitive, manual task that has to be done for every instance. So the next step was to automate this procedure.
While playing with Terraform and AWS Lambda I came up with the following solution, which is documented in my EBS_BCKUP GitHub repo.
What I did was automate the creation of an AWS role with the permissions needed to take and delete snapshots, and create a Lambda function that makes use of this role.
The function is triggered by a „cronlike“ CloudWatch event you can define, then checks all running instances in all AWS regions you set in the Terraform variable file for an EC2 Tag („Backup“ is the default value) and takes a snapshot of all connected EBS volumes.
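To make the idea more concrete, here is a minimal Python sketch of the "find tagged instances, snapshot their volumes" step. It is not the code from the repo. The function names are made up for illustration, and the `ec2` argument is assumed to be a boto3 EC2 client (or anything with the same `describe_instances` and `create_snapshot` methods). For brevity it ignores pagination of the `describe_instances` result.

```python
def volumes_to_snapshot(ec2, tag_key="Backup"):
    """Return (instance_id, volume_id) pairs for running instances carrying tag_key.

    `ec2` is assumed to behave like a boto3 EC2 client. Pagination is
    ignored here, so this only sees the first page of results.
    """
    filters = [
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "tag-key", "Values": [tag_key]},
    ]
    pairs = []
    response = ec2.describe_instances(Filters=filters)
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if ebs:  # skip non-EBS devices such as instance store
                    pairs.append((instance["InstanceId"], ebs["VolumeId"]))
    return pairs


def snapshot_all(ec2, tag_key="Backup"):
    """Create one snapshot per attached EBS volume and return the snapshot IDs."""
    snapshot_ids = []
    for instance_id, volume_id in volumes_to_snapshot(ec2, tag_key):
        result = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Backup of {volume_id} attached to {instance_id}",
        )
        snapshot_ids.append(result["SnapshotId"])
    return snapshot_ids
```

In a Lambda handler you would call this once per configured region, for example with `boto3.client("ec2", region_name=region)` as the `ec2` argument.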
So when you want to add another instance to this snapshot system, it is no longer necessary to log in, install a script and create a cronjob. All you have to do is tag the instance, and on the next run it will be found by the Lambda, which then creates the snapshot.
In addition you can define the retention time of your snapshots. Since snapshot storage in S3 is not free, it makes sense to delete older snapshots.
Every snapshot that is created by the Lambda is tagged with the information to which instance it belongs and on which date it will be deleted by the next Lambda run.
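The tagging and retention logic can be sketched in a few lines of Python. Again, this is only an illustration under assumptions: the tag keys `InstanceId` and `DeleteOn`, the function names, and the default of 7 days are hypothetical and not taken from the repo.

```python
import datetime


def snapshot_tags(instance_id, today, retention_days=7):
    """Build the tags for a new snapshot: source instance and deletion date.

    The tag keys "InstanceId" and "DeleteOn" are hypothetical names.
    """
    delete_on = today + datetime.timedelta(days=retention_days)
    return [
        {"Key": "InstanceId", "Value": instance_id},
        {"Key": "DeleteOn", "Value": delete_on.isoformat()},
    ]


def is_expired(tags, today):
    """Return True if the snapshot's deletion date has been reached."""
    for tag in tags:
        if tag["Key"] == "DeleteOn":
            return datetime.date.fromisoformat(tag["Value"]) <= today
    # Snapshots without a deletion tag are never removed automatically.
    return False
```

On each run, the function would then list its own snapshots, check `is_expired` for each one and delete those that are due. Leaving untagged snapshots alone is a deliberate choice here, so that manually created snapshots are not touched.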
So if you want to try it, please check out the README on GitHub.
I think it is quite straightforward: with Terraform installed, clone the repo, set your AWS credentials (you can define them in the Terraform variables, I use my assume-role helper script to set them as environment variables) and other necessary variables in the terraform.tfvars file. Then run a „terraform plan“ to see what resources will be created and if all variables are set correctly. If everything looks fine, you can create the infrastructure with „terraform apply“.
If you find this tool useful or see something I did wrong, just leave me a comment or create an issue on GitHub.
So long!
The idea is great, but the problem with this module is that once it has been applied, each further plan/apply will create the resources again, which adds noise. Once applied, we want the state to be definitive.