It didn’t make sense to update Amazon Web Services Networking webinar before the re:Invent conference – even though AWS introduced only a few networking features during the conference, at least one of them made a significant impact on the materials.
However, once the conference was over, I went over the to-do list that has been slowly accumulating for months and spent days updating over a dozen videos1. The major changes include:
One of the most exciting announcements from the last AWS re:Invent was the Elastic Network Adapter (ENA) Express functionality that uses the Scalable Reliable Datagram (SRD) protocol as the transport protocol for the overlay virtual networks. AWS claims ENA Express can push 25 Gbps over a single TCP flow and that SRD improves the tail latency (99.9 percentile) for high-throughput workloads by 85%.
Ignoring the “DPUs could change the network forever” blogosphere reactions (hint: they won’t), let’s see what could be happening behind the scenes and why SRD improves TCP throughput and tail latency.
On March 30th 2022, AWS announced automatic recovery of EC2 instances. Does that mean that AWS got feature-parity with VMware High Availability, or that VMware got it right from the very start? No and No.
Automatic Instance Recover Is Not High Availability
Reading the AWS documentation (as opposed to the feature announcement) quickly reveals a caveat or two. The automatic recovery is performed if an instance becomes impaired because of an underlying hardware failure or a problem that requires AWS involvement to repair.
Last week’s update session of the AWS Networking webinar covered two hours worth of new (or not-yet-covered) features, including:
- Transit Gateway Connect functionality (GRE tunnel+BGP between Transit Gateway and in-cloud SD-WAN appliances)
- AWS Private Link
- Intra-VPC static routes that you can use to send inter-subnet traffic to a BYOD security appliance
- IGMPv2 support
- Custom global accelerators
- Assigning whole IP prefixes to VM interfaces
The recordings have already been published, either as independent videos or integrated with the existing materials. Enjoy ;)
In his Where AWS IPv6 networking fails blog post, Jason Lavoie documents an intricate consequence of 2-pizza-teams not talking to one another: it’s really hard to get IPv6 in AWS VPC working with Transit Gateway and Direct Connect in large-scale multi-account environment due to the way IPv6 prefixes are propagated from VPCs to Direct Connect Gateway.
It’s one of those IPv6-only little details that you could never spot before stumbling on it in a real-life deployment… and to make it worse, it works well in IPv4 if you did proper address planning (which you can’t in IPv6).
TL&DR: Client clock skew could result in AWS authentication failure when running terraform apply
When I wanted to compare AWS and Azure orchestration speeds I encountered a crazy Terraform error message when running terraform apply:
module.network.aws_vpc.My_VPC: Creating... Error: Error creating VPC: AuthFailure: AWS was not able to validate the provided access credentials status code: 401, request id: ...
Obviously I did all the usual stuff before googling for a solution:
When I was complaining about the speed (or lack thereof) of Azure orchestration system, someone replied “I tried to do $somethingComplicated on AWS and it also took forever”
Following the “opinions are great, data is better” mantra (as opposed to “never let facts get in the way of a good story” supposedly practiced by some podcasters), I decided to do a short experiment: create a very similar environment with Azure and AWS.
I took simple Terraform deployment configuration for AWS and Azure. Both included a virtual network, two subnets, a route table, a packet filter, and a VM with public IP address. Here are the observed times:
Miha Markočič created sample automation scripts (mostly Terraform configuration files + AWS CLI commands where needed) deploying these features described in AWS Networking webinar:
- IP multicast deployment (video)
- Web Application Firewall deployment (video)
- Network Load Balancer deployment (video)
- Inter-region VPC Peering deployment (video)
To recreate them, clone the GitHub repository and follow the instructions.
Deciding to create AWS Networking and Azure Networking webinars wasn’t easy – after all, there’s so much content out there covering all aspects of public cloud services, and a plethora of certification trainings (including free training from AWS).
Having that in mind, it’s so nice to hear from people who found our AWS webinar useful ;)
Even though we are working with these technologies and have the certifications, there are always nuggets of information in these webinars that make it totally worthwhile. A good example in this series was the ingress routing feature updates in AWS.
It can be hard to filter through the noise from cloud providers to get to the new features that actually make a difference to what we are doing. This series does exactly that for me. Brilliant as always.
AWS Networking and Azure Networking webinars are available with Standard ipSpace.net Subscription. For even deeper dive into cloud networking check out our Networking in Public Cloud Deployments online course.
A while ago (eons before AWS introduced Gateway Load Balancer) I discussed the intricacies of AWS and Azure networking with a very smart engineer working for a security appliance vendor, and he said something along the lines of “it shows these things were designed by software developers – they have no idea how networks should work.”
In reality, at least some aspects of public cloud networking come closer to the original ideas of how IP and data-link layers should fit together than today’s flat earth theories, so he probably wanted to say “they make it so hard for me to insert my virtual appliance into their network.”
In last week’s update session we covered the new features AWS introduced since the creation of AWS Networking webinar in 2019:
- AWS Local Zones, Wavelengths, and Outposts
- VPC Sharing
- Bring Your Own Addresses
- IP Multicast support
- Managed Prefix Lists in security groups and route tables
- VPC Traffic Mirroring
- Web Application Firewall
- AWS Shield
- VPC Ingress Routing
- Inter-region VPC peering with Transit Gateways
The videos are already online; you need Standard or Expert ipSpace.net subscription to watch them.
One of my readers sent me this question (probably after stumbling upon a remark I made in the AWS Networking webinar):
You had mentioned that AWS is probably not using EVPN for their overlay control-plane because it doesn’t work for their scale. Can you elaborate please? I’m going through an EVPN PoC and curious to learn more.
It’s safe to assume AWS uses some sort of overlay virtual networking (like every other sane large-scale cloud provider). We don’t know any details; AWS never felt the need to use conferences as recruitment drives, and what little they told us at re:Invent described the system mostly from the customer perspective.
- Security groups to restrict access to web server and SSH bastion host;
- An IAM policy and associated user that has read-only access to EC2 and VPC resources (used for monitoring)
- An IAM policy that has full access to as single S3 bucket (used to modify static content hosted on S3)
- An IAM role for AWS CloudWatch logs
- Logging SSH events from the SSH bastion host into CloudWatch logs.
I got a question along these lines from a friend of mine:
Google recently announced a huge data center build in country to open new GCP regions. Does that mean I should invest into mastering GCP or should I focus on some other public cloud platform?
As always, the right answer is “it depends”, for example:
Regular readers of my blog probably remember the detailed explanations Erik Auerswald creates while solving hands-on exercises from our Networking in Public Cloud Deployments online course (previous ones: create a virtual network, deploy a web server).
This time he documented the process he went through to develop a Terraform configuration file that deploys full-blown AWS networking infrastructure (VPC, subnets, Internet gateway, route tables, security groups) and multiple servers include an SSH bastion host. You’ll also see what he found out when he used Elastic Network Interfaces (spoiler: routing on multi-interface hosts is tough).