Figuring Out AWS Networking
One of my friends reviewing the material of my AWS Networking webinar sent me this remark:
I'm always interested in hearing more about how AWS network works under the hood – it’s difficult to gain that knowledge.
As always, it’s almost impossible to find out the behind-the-scenes details, and whatever Amazon is telling you at their re:Invent conference should be taken with a truckload of salt… but it’s relatively easy to figure out a lot of things just by observing them and performing controlled experiments.
As in any (scientific) research:
- Figure out the problem domain;
- Observe what’s going on;
- Form a question;
- Form a hypothesis;
- Create and conduct as simple an experiment as possible to validate or refute the hypothesis;
- Analyze the data and draw a conclusion;
- Lather, rinse, repeat ;))
I talked about this approach in the Learning How to Use New Tools part of the Getting Started module of the Building Network Automation Solutions online course; here’s a simple AWS networking example.
Fact: A VM (EC2 instance) running in a Virtual Private Cloud (VPC) can have both a public and a private (intra-VPC) IP address. This should trigger the curiosity of any decent networking engineer (“I wonder how that’s done”).
Observation: Start a Linux VM in a public subnet of a VPC and log into it. Run ifconfig or ip addr to list all IP addresses configured on the VM interfaces. You’ll see the VM’s private IP address, but not the public one.
Question: That’s funny… I wonder where the public IP address is…
Hypothesis: The public IP address lives on the Internet gateway, which performs NAT.
Now stop reading and figure out an experiment that would validate this hypothesis.
Here’s what I did:
- Create two VMs in a VPC subnet;
- Ping between their private IP addresses – it works;
- Ping between their public IP addresses – it works;
- Remove the default route pointing to the Internet gateway;
- Repeat the tests.
Result: After the default route to the Internet gateway is removed, the VMs can no longer reach each other’s public IP addresses. The Internet gateway is therefore somehow involved in this process.
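That result is what you’d expect from a plain longest-prefix-match lookup in the VPC route table: the private address of the other VM matches the local VPC prefix, while its public address only matches the default route. Here’s a toy Python model of the lookup, assuming a hypothetical 10.0.0.0/16 VPC prefix and public addresses taken from the 203.0.113.0/24 documentation range. It models the routing decision only, not how AWS implements it.

```python
import ipaddress

def lookup(route_table, destination):
    """Return the target of the longest matching prefix, or None if no route matches."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, target) for net, target in route_table.items() if dst in net]
    if not matches:
        return None
    # Longest prefix wins, as in any IP routing table
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical VPC route table: local VPC prefix plus a default route to the Internet gateway
routes = {
    ipaddress.ip_network("10.0.0.0/16"): "local",
    ipaddress.ip_network("0.0.0.0/0"): "igw",
}
print(lookup(routes, "10.0.1.20"))     # private address of the other VM -> local
print(lookup(routes, "203.0.113.20"))  # public address of the other VM -> igw

# Remove the default route and repeat the test
del routes[ipaddress.ip_network("0.0.0.0/0")]
print(lookup(routes, "10.0.1.20"))     # still local
print(lookup(routes, "203.0.113.20"))  # None: no route, so the ping fails
```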
Hypothesis: The Internet gateway (whatever it happens to be and wherever it happens to be located within AWS) performs NAT between private and public VM addresses.
Question: I wonder what happens when a VM in a VPC pings the public IP address of another VM in the same subnet.
Hypothesis: If the Internet gateway performs NAT between private and public IP addresses of a VM, then it should also NAT the source IP address of outgoing traffic. Pings received on the second VM should look like they’re coming from the public IP address of the first VM.
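If that hypothesis holds, the Internet gateway behaves like a static 1:1 NAT that rewrites both addresses of a VM-to-VM packet sent to a public address. The toy Python model below spells out that prediction; the address mappings are hypothetical, and the model says nothing about how the Internet gateway is actually built. Only the experiment can confirm or refute it.

```python
# Hypothetical private <-> public mappings for the two test VMs
PRIVATE_TO_PUBLIC = {"10.0.1.10": "203.0.113.10", "10.0.1.20": "203.0.113.20"}
PUBLIC_TO_PRIVATE = {pub: priv for priv, pub in PRIVATE_TO_PUBLIC.items()}

def igw_translate(src, dst):
    """Model a static 1:1 NAT on the Internet gateway for a packet (src, dst).

    A private source address is rewritten to its public address, and a public
    destination address is rewritten to its private address. A packet between
    two VMs sent to a public address gets both rewrites.
    """
    new_src = PRIVATE_TO_PUBLIC.get(src, src)
    new_dst = PUBLIC_TO_PRIVATE.get(dst, dst)
    return new_src, new_dst

# VM A (10.0.1.10) pings the public address of VM B
print(igw_translate("10.0.1.10", "203.0.113.20"))
# ('203.0.113.10', '10.0.1.20'): VM B should see the ping coming from A's public address
```

If the second VM sees the first VM’s private address instead, the NAT hypothesis needs rethinking.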
Experiment: Trivial; left as an exercise.
Got the idea? Now figure out how packet forwarding works and how you can influence it with routing tables configured on the VMs ;)
Want to know more?
I documented everything I discovered while experimenting with AWS in my new AWS Networking Deep Dive webinar. We covered regions, availability zones, VPCs, subnets, interfaces and addressing in the first live session.
The second live session – starting with sample deployment scenarios and then moving on to network security – is in just two days… and all you need to attend it is a current ipSpace.net subscription.
Some insights into AWS can be found in the AWS Certified Advanced Networking Official Study Guide. An excerpt from the book:
Tenant isolation is a core function of Amazon VPC. In order to understand which resources are part of a given VPC, Amazon VPC uses a mapping service. The mapping service abstracts your VPC from the underlying AWS infrastructure. For any given VPC, the mapping service maintains information about all of its resources, their VPC IP addresses, and the IP addresses of the underlying physical server on which the resource is running. It is the definitive source of topology information for each VPC.
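The mapping service described in the excerpt is essentially a lookup table keyed by VPC and instance address. A toy Python model, assuming hypothetical VPC identifiers, MAC addresses and physical server addresses (192.0.2.0/24 documentation range); the real service is not publicly documented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mapping:
    mac: str       # MAC address of the instance
    host_ip: str   # IP address of the physical server running the instance

class MappingService:
    """Toy model of the mapping service: (VPC, instance IP) -> (MAC, physical server)."""

    def __init__(self):
        self._table = {}

    def register(self, vpc_id: str, instance_ip: str, mac: str, host_ip: str) -> None:
        self._table[(vpc_id, instance_ip)] = Mapping(mac, host_ip)

    def lookup(self, vpc_id: str, instance_ip: str) -> Optional[Mapping]:
        # Keyed by VPC as well as IP, so overlapping address space in different VPCs stays isolated
        return self._table.get((vpc_id, instance_ip))

ms = MappingService()
ms.register("vpc-a", "10.0.0.5", "02:00:00:00:00:05", "192.0.2.11")
ms.register("vpc-b", "10.0.0.5", "02:00:00:00:00:99", "192.0.2.42")
print(ms.lookup("vpc-a", "10.0.0.5").host_ip)  # 192.0.2.11
print(ms.lookup("vpc-b", "10.0.0.5").host_ip)  # 192.0.2.42
print(ms.lookup("vpc-a", "10.0.0.6"))          # None
```

Keying on the VPC as well as the address is what makes tenant isolation work in this model: the same 10.0.0.5 resolves to different servers in different VPCs.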
When an Amazon EC2 instance, say Instance A, in your VPC initiates communication with another Amazon EC2 instance, say Instance B, over IPv4, Instance A will broadcast an Address Resolution Protocol (ARP) packet to obtain the Instance B’s Media Access Control (MAC) address. The ARP packet leaving Instance A is intercepted by the server Hypervisor. The Hypervisor queries the mapping service to identify whether Instance B exists in the VPC and, if so, obtains its MAC address. The Hypervisor returns a synthetic ARP response to Instance A containing Instance B’s MAC address.
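The hypervisor in that description acts as an ARP proxy: the broadcast never leaves the server, and the reply is synthesized from mapping data. A minimal sketch, assuming a hypothetical in-memory table standing in for the mapping service; the excerpt doesn’t say what happens when the target isn’t found, so leaving the request unanswered is this sketch’s assumption:

```python
# Hypothetical mapping-service contents: (VPC, instance IP) -> MAC address
MAPPINGS = {("vpc-a", "10.0.0.6"): "02:00:00:00:00:06"}

def handle_arp_request(vpc_id, sender_ip, sender_mac, target_ip):
    """Hypervisor-side ARP proxy: answer from the mapping data, never flood.

    Returns a synthetic ARP reply as a dict, or None if the target
    is not part of this VPC (in this model the request is just not answered).
    """
    target_mac = MAPPINGS.get((vpc_id, target_ip))
    if target_mac is None:
        return None
    return {
        "op": "reply",
        "sender_ip": target_ip, "sender_mac": target_mac,
        "target_ip": sender_ip, "target_mac": sender_mac,
    }

print(handle_arp_request("vpc-a", "10.0.0.5", "02:00:00:00:00:05", "10.0.0.6")["sender_mac"])
# 02:00:00:00:00:06
```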
Instance A is now ready to send an IP packet to Instance B. The IP packet has Instance A’s source IP and Instance B’s destination IP. The IP packet is encapsulated in an Ethernet header with Instance A’s MAC as the source address and Instance B’s MAC as the destination address. The Ethernet packet is then transmitted from Instance A’s network interface.
As Instance A emits the packet, it is intercepted by the server Hypervisor. The Hypervisor queries the mapping service to learn the IPv4 address of the physical server on which Instance B is running. Once the mapping service provides this data, the packet emitted by Instance A is encapsulated in a VPC header that identifies this specific VPC and then encapsulated again in an IP packet with a source IP address of Instance A’s physical server and a destination IPv4 address of Instance B’s physical server. The packet is then placed on to the AWS network.
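The sending side can be sketched as a lookup followed by two layers of wrapping. The sketch assumes the lookup is keyed on the inner destination IP address; the excerpt doesn’t say which field the hypervisor uses (that’s exactly question B in the comments). Class names, VPC IDs and server addresses are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class EthernetFrame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: bytes

@dataclass
class EncapsulatedPacket:
    outer_src_ip: str     # physical server hosting the sender
    outer_dst_ip: str     # physical server hosting the receiver
    vpc_id: str           # stands in for the "VPC header" identifying the tenant network
    inner: EthernetFrame  # the frame exactly as the instance emitted it

# Hypothetical mapping-service contents: (VPC, instance IP) -> physical server IP
HOST_OF = {("vpc-a", "10.0.0.5"): "192.0.2.11", ("vpc-a", "10.0.0.6"): "192.0.2.42"}

def encapsulate(vpc_id, local_host_ip, frame):
    """Sending hypervisor: find the receiver's physical server and wrap the frame."""
    remote_host_ip = HOST_OF.get((vpc_id, frame.dst_ip))
    if remote_host_ip is None:
        # A broadcast or multicast destination has no single physical server to send to
        raise LookupError(f"{frame.dst_ip} not found in {vpc_id}")
    return EncapsulatedPacket(local_host_ip, remote_host_ip, vpc_id, frame)

frame = EthernetFrame("02:00:00:00:00:05", "02:00:00:00:00:06", "10.0.0.5", "10.0.0.6", b"ping")
print(encapsulate("vpc-a", "192.0.2.11", frame).outer_dst_ip)  # 192.0.2.42
```

The failed lookup for a broadcast destination is one way to see why broadcast and multicast don’t fit this forwarding model.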
When the packet arrives at Instance B’s physical server, the outer IPv4 header and VPC header are inspected. The instance Hypervisor queries the mapping service to confirm that Instance A exists on the specific source physical server and in the specific VPC identified in the received packet. When the mapping service confirms that the mapping is correct, the Hypervisor strips off the outer encapsulation and delivers the packet that Instance A emitted to the Instance B network interface.
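The receiving side is the anti-spoofing half of the design: the outer source address must match where the mapping service says the sender lives. A minimal sketch using plain dicts for packets and a hypothetical mapping table; the excerpt only describes the success case, so dropping the packet on a failed check is this sketch’s assumption:

```python
# Hypothetical mapping-service contents: (VPC, instance IP) -> physical server IP
HOST_OF = {("vpc-a", "10.0.0.5"): "192.0.2.11", ("vpc-a", "10.0.0.6"): "192.0.2.42"}

def decapsulate(packet):
    """Receiving hypervisor: verify the sender mapping, then strip the outer headers.

    `packet` is a dict with outer_src_ip, vpc_id and inner (the original frame as a dict).
    Returns the inner frame, or None if the mapping check fails (packet dropped).
    """
    inner = packet["inner"]
    expected_host = HOST_OF.get((packet["vpc_id"], inner["src_ip"]))
    if expected_host != packet["outer_src_ip"]:
        return None  # sender not on that server, or not in that VPC
    return inner

pkt = {"outer_src_ip": "192.0.2.11", "vpc_id": "vpc-a",
       "inner": {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.6"}}
print(decapsulate(pkt))  # {'src_ip': '10.0.0.5', 'dst_ip': '10.0.0.6'}
```

In this model, a packet claiming a source address that lives on a different server, or in a different VPC, never reaches the receiving instance.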
The details of packet exchange in Amazon VPC should provide you clarity on why, for example, Amazon VPC does not support broadcast and multicast. These same reasons explain why packet sniffing does not work. As you reason about Amazon VPC operation and functionality, consider this example.
A) why doesn't AWS reply with the same MAC address for every IP?
B) what does the source hypervisor use to figure out the packet is sent to instance B?
C) what happens if A has a static route for X pointing to B?
I'd guess that these are the details that are totally irrelevant to an application developer but might be crucial for a networking engineer.
and
http://packetpushers.net/podcast/podcasts/show-387-aws-networking-view-inside/
A) AWS routes based on MAC address inside a VPC. For incoming traffic, there's a custom SDN that maps IP address/VPC combinations to their location in the network. The "A Day in the Life of a Billion Packets" presentation by Eric Brandwine covers this in pretty good detail, and Colm covered it more recently in his networking sessions.
B) Each hypervisor (or Nitro card) receives cached copies of those mappings, so it doesn't need to generate broadcast traffic or perform reactive lookups on new flows.
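The difference between pushed (proactive) mappings and reactive lookups can be sketched as a local cache that’s updated by the control plane rather than filled on a miss. The class and method names are hypothetical, and the comment above doesn’t describe how the mappings are actually distributed:

```python
class HypervisorMappingCache:
    """Local copy of VPC mappings, kept current by pushed updates (not pulled on a miss)."""

    def __init__(self):
        self._entries = {}

    def apply_update(self, vpc_id, instance_ip, host_ip):
        # Called when the control plane pushes a new or changed mapping
        self._entries[(vpc_id, instance_ip)] = host_ip

    def apply_withdraw(self, vpc_id, instance_ip):
        # Called when an instance (or its address) goes away
        self._entries.pop((vpc_id, instance_ip), None)

    def lookup(self, vpc_id, instance_ip):
        # Purely local: no broadcast and no query to a central service on a new flow
        return self._entries.get((vpc_id, instance_ip))

cache = HypervisorMappingCache()
cache.apply_update("vpc-a", "10.0.0.6", "192.0.2.42")
print(cache.lookup("vpc-a", "10.0.0.6"))  # 192.0.2.42
```

The trade-off in this model: the first packet of a new flow never waits for a lookup, at the cost of pushing every change to every hypervisor that might need it.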
C) There's not a lot written down about AWS networking in detail, but it's pretty safe to assume there are no static routes in the physical hosts :)
You could argue whether it's crucial or not. ARP works. Layer 2 doesn't. No ARP spoofing or rogue DHCP servers. No worries about subnet size. The only time this really comes up is when you try to run some third-party router on an instance and it expects every network it ever sees to support the same layer 2 features (GARP, multicast, broadcast, etc.). Even then, almost all of those use cases are better served by Transit VPC or now Transit Gateway, where all the routing functionality happens through BGP and VPN rather than through layer 2 constructs.
-nick
He's using static PAT. What an evolution in networking!
https://www.youtube.com/watch?v=Zd5hsL-JNY4
https://www.youtube.com/watch?v=St3SE4LWhKo
https://www.youtube.com/watch?v=8gc2DgBqo9U
B) For me, that is actually answered in the excerpt.
C) You cannot configure a more specific route than your VPC local prefix (you will get an error).
Now for (A): if the hypervisor uses destination IP address, why do they bother with destination MACs? Wouldn't it be the same if they'd just use the same MAC address for every IP address?
As usual, things are a bit more complex than they seem...
I'm just starting to dig into AWS network constructs, but for the moment I see this "MPLS analogy" for the TGW (although it's certainly very simplistic and the reality is more complex):
I'm probably making a mistake by constantly trying to map everything to "classical behaviour" while looking for a real overlay scenario, but that's where I am for now, until I can fully immerse myself.
Thanks for all your posts, Ivan!