Figuring Out AWS Networking

One of my friends reviewing the material for my AWS Networking webinar sent me this remark:

I'm always interested in hearing more about how AWS network works under the hood – it’s difficult to gain that knowledge.

As always, it’s almost impossible to find out the behind-the-scenes details, and whatever Amazon is telling you at their re:Invent conference should be taken with a truckload of salt… but it’s relatively easy to figure out a lot of things just by observing them and performing controlled experiments.

As in any (scientific) research:

  • Figure out the problem domain;
  • Observe what’s going on;
  • Form a question;
  • Form a hypothesis;
  • Create and conduct as simple an experiment as possible to validate or refute the hypothesis;
  • Analyze the data and draw a conclusion;
  • Lather, rinse, repeat ;))

I talked about this approach in the Learning How to Use New Tools part of the Getting Started module of the Building Network Automation Solutions online course; here’s a simple AWS networking example.

Fact: A VM (EC2 instance) running in a Virtual Private Cloud (VPC) can have both a public and a private (intra-VPC) IP address. This should trigger the curiosity of any decent networking engineer (“I wonder how that’s done”).

Observation: Start a Linux VM in a public subnet of a VPC and log into the VM. Execute ifconfig or ip addr to see all IP addresses configured on the VM interfaces. You’ll notice the VM’s private IP address, but not the public one.

Question: That’s funny… I wonder where the public IP address is…

Hypothesis: The public IP address is present on the Internet gateway which does NAT.

Now stop reading and figure out an experiment that would validate this hypothesis.

Here’s what I did:

  • Create two VMs in a VPC subnet;
  • Ping between their private IP addresses – it works;
  • Ping between their public IP addresses – it works;
  • Remove the default route pointing to the Internet gateway;
  • Repeat the tests.

Result: After removing the default route to the Internet gateway, the VMs can no longer reach the public IP addresses of other VMs. The Internet gateway is therefore somehow involved in this process.

Hypothesis: The Internet gateway (whatever it happens to be and wherever it happens to be located within AWS) performs NAT between private and public VM addresses.
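To make the hypothesis concrete, here’s a toy Python model of the 1:1 static NAT the Internet gateway would have to perform. This is a sketch of the idea, not AWS code; all addresses are made up:

```python
# Toy model of the hypothesis: the Internet gateway keeps a static 1:1
# mapping between private and public instance addresses and rewrites
# the packet headers in both directions (all addresses are hypothetical).

NAT_TABLE = {
    "10.0.0.11": "52.10.0.11",   # private IP -> public IP
    "10.0.0.12": "52.10.0.12",
}
REVERSE = {pub: priv for priv, pub in NAT_TABLE.items()}

def outbound(src, dst):
    """VPC -> Internet: rewrite the private source address to the public one."""
    return NAT_TABLE.get(src, src), dst

def inbound(src, dst):
    """Internet -> VPC: rewrite the public destination address to the private one."""
    return src, REVERSE.get(dst, dst)

# A packet from the first VM to the public address of the second VM would be
# source-NATed on the way out and destination-NATed on the way back in:
print(outbound("10.0.0.11", "52.10.0.12"))            # ('52.10.0.11', '52.10.0.12')
print(inbound(*outbound("10.0.0.11", "52.10.0.12")))  # ('52.10.0.11', '10.0.0.12')
```

Note what the model predicts: the second VM would receive the ping with the first VM’s public address as the source, which is exactly what the experiments below try to check.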

Question: I wonder what happens when a VM in a VPC pings the public IP address of another VM in the same subnet.

Hypothesis: If the Internet gateway performs NAT between private and public IP addresses of a VM, then it should also NAT the source IP address of outgoing traffic. Pings received on the second VM should look like they’re coming from the public IP address of the first VM.

Experiment: Trivial; left as an exercise.

Got the idea? Now figure out how packet forwarding works and how you can influence it with routing tables configured on the VMs ;)

Want to know more?

I documented everything I discovered while experimenting with AWS in my new AWS Networking Deep Dive webinar. We covered regions, availability zones, VPCs, subnets, interfaces and addressing in the first live session.

The second live session – starting with sample deployment scenarios and then moving on to network security – is in just two days… and all you need to attend it is a current subscription.


  1. Hi Ivan,

    Some insights into AWS can be found in the AWS Certified Advanced Networking Official Study Guide. An excerpt from the book:

    Tenant isolation is a core function of Amazon VPC. In order to understand which resources are part of a given VPC, Amazon VPC uses a mapping service. The mapping service abstracts your VPC from the underlying AWS infrastructure. For any given VPC, the mapping service maintains information about all of its resources, their VPC IP addresses, and the IP addresses of the underlying physical server on which the resource is running. It is the definitive source of topology information for each VPC.

    When an Amazon EC2 instance, say Instance A, in your VPC initiates communication with another Amazon EC2 instance, say Instance B, over IPv4, Instance A will broadcast an Address Resolution Protocol (ARP) packet to obtain Instance B’s Media Access Control (MAC) address. The ARP packet leaving Instance A is intercepted by the server Hypervisor. The Hypervisor queries the mapping service to identify whether Instance B exists in the VPC and, if so, obtains its MAC address. The Hypervisor returns a synthetic ARP response to Instance A containing Instance B’s MAC address.

    Instance A is now ready to send an IP packet to Instance B. The IP packet has Instance A’s source IP and Instance B’s destination IP. The IP packet is encapsulated in an Ethernet header with Instance A’s MAC as the source address and Instance B’s MAC as the destination address. The Ethernet packet is then transmitted from Instance A’s network interface.

    As Instance A emits the packet, it is intercepted by the server Hypervisor. The Hypervisor queries the mapping service to learn the IPv4 address of the physical server on which Instance B is running. Once the mapping service provides this data, the packet emitted by Instance A is encapsulated in a VPC header that identifies this specific VPC and then encapsulated again in an IP packet with a source IP address of Instance A’s physical server and a destination IPv4 address of Instance B’s physical server. The packet is then placed onto the AWS network.

    When the packet arrives at Instance B’s physical server, the outer IPv4 header and VPC header are inspected. The instance Hypervisor queries the mapping service to confirm that Instance A exists on the specific source physical server and in the specific VPC identified in the received packet. When the mapping service confirms that the mapping is correct, the Hypervisor strips off the outer encapsulation and delivers the packet that Instance A emitted to the Instance B network interface.

    The details of packet exchange in Amazon VPC should provide you clarity on why, for example, Amazon VPC does not support broadcast and multicast. These same reasons explain why packet sniffing does not work. As you reason about Amazon VPC operation and functionality, consider this example.
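    The packet walk in the excerpt can be condensed into a small Python model. This is my own sketch, not an AWS API; every name and address below is made up:

    ```python
    # Toy model of the VPC mapping service described in the excerpt.
    # (vpc_id, instance_ip) -> (instance_mac, physical_server_ip); all values invented.
    MAPPING = {
        ("vpc-1", "10.0.0.11"): ("0a:00:00:00:00:11", "172.16.1.1"),
        ("vpc-1", "10.0.0.12"): ("0a:00:00:00:00:12", "172.16.2.2"),
    }

    def synthetic_arp_reply(vpc, target_ip):
        """The hypervisor intercepts ARP and answers from the mapping service."""
        entry = MAPPING.get((vpc, target_ip))
        return entry[0] if entry else None      # no broadcast ever hits the wire

    def encapsulate(vpc, src_ip, dst_ip, payload):
        """Wrap the instance packet in a VPC header plus an outer IP header."""
        return {"outer_src": MAPPING[(vpc, src_ip)][1],
                "outer_dst": MAPPING[(vpc, dst_ip)][1],
                "vpc": vpc, "inner": (src_ip, dst_ip, payload)}

    def deliver(frame):
        """The receiving hypervisor validates the mapping before decapsulating."""
        src_ip = frame["inner"][0]
        expected = MAPPING.get((frame["vpc"], src_ip), (None, None))[1]
        if expected != frame["outer_src"]:
            return None                         # spoofed source -> packet dropped
        return frame["inner"]
    ```

    The validate-before-decapsulate step in deliver() is the part that makes packet sniffing and spoofing pointless: a frame whose outer source doesn’t match the mapping-service entry for the inner source never reaches the instance.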
    1. Thank you for sharing that. It's all nice and dandy but fails to answer these questions:

      A) why doesn't AWS reply with the same MAC address for every IP?
      B) what does the source hypervisor use to figure out that the packet is sent to Instance B?
      C) what happens if A has a static route for X pointing to B?

      I'd guess that these are the details that are totally irrelevant to an application developer but might be crucial for a networking engineer.
    2. Check out:

      A) AWS routes based on MAC address inside a VPC. For incoming traffic there's a custom SDN layer that maps IP address/VPC combinations to their location in the network. The "A Day in the Life of a Billion Packets" presentation by Eric Brandwine covers this in pretty good detail; more recently, Colm covered it in his networking sessions.
      B) There are cached mappings sent to each hypervisor (or Nitro card) that have those mappings so they don't need to create broadcast traffic or reactive packet lookups on new flows.
      C) There's not a lot about AWS networking in detail written down, but you're pretty safe to assume there's not static routes in the physical hosts :)

      You could argue crucial or not. ARP works. Layer 2 doesn't. No ARP spoofing or rogue DHCP servers. No worries about subnet size. The only time this really comes up is when you try to run some third party router on an instance and it expects all networks it ever sees to support the same layer 2 features everywhere (GARP, multicast, broadcast, etc.). Even then, almost all of those use cases are better suited to use Transit VPC or now Transit Gateway where all the routing functionality happens through BGP and VPN rather than at layer 2 constructs.

  2. The most interesting fact I learned earlier this year was that having EC2-based SSH proxies in public subnets is not strictly necessary to reach my private-subnet resources; configuring a CLB (in a public subnet) to do the same works just fine. That way, all my EC2 instances (and other private resources) can live in private subnets as part of my internal routing domain, making maintenance and management much cleaner and simpler. I had tested this in the past and failed, later discovered my testing methodology was incorrect, and lost a year to a suboptimal architecture because of it. I hope that experience may help someone else in the future.
    1. Thank you. Was it CLB or ELB?... and were you accessing your instances over the public Internet?
    2. Here:
      He's using static PAT. What an evolution in networking!
    3. Thanks a million. I decided load balancing is out of scope of the webinar when I started creating the content. Obviously I was wrong... will add it to the content.
  3. There have been some re:Invent presentations that cover AWS networking internals; sadly I don't think that information was ever released in article form.
  4. A) Why should it? You would need to rewrite it then.
    B) For me, that is actually answered in the excerpt
    C) You cannot configure a more specific route than your VPC local prefix (you will get an error)
    1. Let's focus on (B) first. The important detail is "does the hypervisor look up the destination IP or the destination MAC?"... and that is not answered in the excerpt.

      Now for (A): if the hypervisor uses the destination IP address, why do they bother with destination MACs? Wouldn't it be the same if they just used the same MAC address for every IP address?

      As usual, things are a bit more complex than they seem...
  5. I am just starting to dig deep into AWS network constructs, but for the moment I can see this "MPLS analogy" for the TGW, although it is admittedly very simplistic and reality is more complex:

    • The TGW works as an MPLS backbone - underlay.
    • The attachment of the VPC to the TGW is like "connecting the physical cable into a PE port".
    • Every route table (RT) in the TGW works as a VRF (but in a different PE, with no import/export policies).
    • The association of an attachment with an RT is like linking the subnet into a VRF.
    • The propagation of routes from an attachment into an RT is like creating an RT-import policy.

    Most probably I am making the mistake of constantly mapping everything to the "classical behaviour" and looking for a real overlay scenario, but until I am fully immersed, this is where I stand.
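    The analogy can even be turned into a few lines of Python. This is a toy model of the idea, not an AWS API; all names are made up:

    ```python
    # Toy model of the TGW-as-MPLS-backbone analogy (all names invented):
    # each TGW route table acts like a VRF, association picks which table an
    # attachment's traffic uses, and propagation installs the attachment's prefixes.

    route_tables = {"rt-prod": {}, "rt-shared": {}}  # table -> {prefix: attachment}
    association = {}                                 # attachment -> table it uses

    def associate(attachment, rt):
        association[attachment] = rt                 # "link the subnet into a VRF"

    def propagate(attachment, prefix, rt):
        route_tables[rt][prefix] = attachment        # the "RT-import policy"

    def forward(src_attachment, dst_prefix):
        rt = association[src_attachment]             # look up in the source's "VRF"
        return route_tables[rt].get(dst_prefix)      # exact match only, for brevity

    associate("vpc-a", "rt-prod")
    propagate("vpc-b", "10.2.0.0/16", "rt-prod")
    print(forward("vpc-a", "10.2.0.0/16"))           # vpc-b
    ```

    Traffic from an attachment that was never associated, or toward a prefix that was never propagated into that table, simply has no route — which matches how TGW route-table segmentation behaves.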

    Thanks for all your posts, Ivan!
