An excursion to AWS, Packer and predictable network interface names
Hi there.
Just wanted to share a small technical issue I came across recently and how I solved it.
As I described earlier, I provision our AWS machine images for our customers with Packer from HashiCorp and it has worked pretty well so far. Once a month I update the latest Ubuntu Xenial images available on AWS with the latest patches, provision the software via chef-solo and create a new Launch Configuration for the Autoscaling Groups.
To minimize cost, I usually use a cheap t2.micro instance as the "build instance" for Packer, whereas the live infrastructure mostly runs on t2.medium or large instances, and until recently I had never run into any problems.
But recently I reorganized a customer's infrastructure to use a mix of on-demand t2 instances and a fleet of c4 spot instances (again, to minimize cost). As Packer is supposed to be pretty much hardware "agnostic", I was surprised when the AMI generated by Packer on a t2.micro was unable to boot on a c4.large instance. It booted without problems on a t2.medium, but on the c4 type the instance started and I was unable to connect to it; after a while AWS terminated the instance, stating it was "not accessible".
When I used a c4 instance to build the AMI, it was the other way round. Other c4 instances booted with this AMI but t2 instances never reached a functioning state.
After looking at the startup logs, which you can luckily access via the AWS console, I found out that the problem was network-related. The instance and our software expected the eth0 network interface, which wasn't available on the c4 instance.
Then I found out that the naming convention for network interfaces had changed. Debian/systemd-based systems now use the "Predictable Network Interface Names" scheme. I won't go into more detail here; you can check out the link for the hows, whats and whys. What matters is that this, or to be precise, the inconsistent use of naming conventions on AWS, was ultimately the reason for my problem.
The problem was that different instance types use different naming schemes. Newly launched t2 instances with the latest Xenial image for some reason use the eth naming scheme, while other instances (c4 at least) boot the same image with the new naming scheme. So an AMI built by Packer on a t2 expects to find an eth network interface, but on a c4 the interface is called ens3. This causes "eth0 not found" and "Failed to start Raise network interfaces" errors.
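If you want to check which scheme a running instance ended up with, a small sketch like the following may help. It lists the interfaces under /sys/class/net and classifies each name by its prefix. The function name `naming_scheme` and the prefix lists are my own assumptions for illustration; this is a rough heuristic, not a complete implementation of the systemd naming rules.

```shell
#!/bin/sh
# Classify an interface name as old-style kernel naming or predictable naming.
# Heuristic only: the prefix lists are an assumption, not a complete rule set.
naming_scheme() {
    case "$1" in
        lo)            echo "loopback" ;;
        eth[0-9]*)     echo "kernel" ;;       # e.g. eth0
        en*|wl*|ww*)   echo "predictable" ;;  # e.g. ens3, enp0s3
        *)             echo "other" ;;
    esac
}

# Print every interface the kernel currently exposes, with its classification.
for path in /sys/class/net/*; do
    [ -e "$path" ] || continue   # skip if the directory is missing or empty
    name=${path##*/}
    printf '%s\t%s\n' "$name" "$(naming_scheme "$name")"
done
```

Running this once on a t2 and once on a c4 instance launched from the same image should make the difference described above visible.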
I also raised this question on the AWS Forum, where you can see how I was able to solve the problem.
To get rid of this behavior, I decided to disable the "predictable network interface names" scheme on the Packer build instance with two additional shell provisioning steps. My complete shell provisioner now looks like this:
{
  "inline": [
    "sudo apt-get update",
    "sudo DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::=\"--force-confdef\" -o Dpkg::Options::=\"--force-confold\" dist-upgrade",
    "sudo sed -i '/^GRUB_CMDLINE_LINUX/s/\"$/ net.ifnames=0\"/' /etc/default/grub",
    "sudo update-grub"
  ],
  "type": "shell"
},
With these commands, the boot loader of the AMI is configured to use the "old" naming scheme, so the AMI is able to boot on all (I think) instance types AWS has to offer.
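To see what the sed expression does before baking it into an AMI, it can be tried on a scratch file instead of /etc/default/grub. The sketch below is only a dry run: the function name `demo_grub_edit` is hypothetical, and the sample file contents are an assumption modelled on a typical Ubuntu default, not copied from a real image.

```shell
#!/bin/sh
# Dry run of the sed expression on a scratch file, not on /etc/default/grub.
# The sample contents are an assumption, loosely modelled on an Ubuntu default.
demo_grub_edit() {
    scratch=$(mktemp)
    cat > "$scratch" <<'EOF'
GRUB_DEFAULT=0
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"
GRUB_CMDLINE_LINUX=""
EOF
    # Same substitution as in the provisioner, but without sudo and without -i,
    # so the result goes to stdout and nothing is modified in place.
    sed '/^GRUB_CMDLINE_LINUX/s/"$/ net.ifnames=0"/' "$scratch"
    rm -f "$scratch"
}

demo_grub_edit
```

Note that the address /^GRUB_CMDLINE_LINUX/ is a prefix match, so both the GRUB_CMDLINE_LINUX and the GRUB_CMDLINE_LINUX_DEFAULT line get the parameter appended. That is harmless here, since the kernel simply sees net.ifnames=0 twice. The expression is also not idempotent, which is fine as long as every build starts from a fresh base image.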
To be honest, it feels a little hacky, but as long as AWS uses inconsistent conventions, the alternative would be to build different AMIs for different instance types, which would double the work and complicate the management of AMIs, snapshots and Launch Configurations. So for the moment it works without problems for me, and the AMIs are hopefully, finally, instance-agnostic.