Packet Forwarding in Amazon VPC

The packet forwarding behavior of VMware NSX and Hyper-V Network Virtualization is well documented; no such documentation exists for Amazon VPC. However, even though Amazon uses a proprietary solution (a heavily modified Xen hypervisor with a homemade virtual switch), it’s pretty easy to figure out the basics from the observed network behavior and extensive user documentation.

Chiradeep Vittal ran a number of tests between virtual machines in an Amazon VPC network and shared the results in a blog post and extensive comments on one of my posts. Here’s a short summary:

  • Virtual switches in Amazon VPC perform layer-3-only unicast IPv4 forwarding (similar to recent Hyper-V Network Virtualization behavior). All non-IPv4 traffic and all multicast/broadcast IPv4 traffic is dropped.
  • Layer-3 forwarding in the hypervisor virtual switch does not decrement TTL, so from the TTL perspective all virtual machines appear to reside in the same subnet.
  • The hypervisor proxies all ARP requests, replying with the expected MAC address of the target VM or first-hop gateway (early implementations of Amazon VPC used the same destination MAC address in all ARP replies).
  • The virtual switch implements limited router-like functionality. For example, the default gateway IP address replies to pings, but a VM cannot ping the default gateway of another subnet.
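The observed rules above can be summarized as a small decision function. This is only an illustrative sketch of the behavior described in the list, not Amazon's implementation: the `Packet` class, the `vswitch_decision` function, and the 10.0.1.0/24 sender subnet are made-up names and values, and treating the subnet-directed broadcast address as "broadcast" is an assumption.

```python
import ipaddress
from dataclasses import dataclass

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806


@dataclass
class Packet:
    """Hypothetical, stripped-down view of a frame sent by a VM."""
    ethertype: int
    dst_ip: str = ""
    ttl: int = 64


def vswitch_decision(pkt: Packet, sender_subnet: str = "10.0.1.0/24") -> str:
    """Return 'proxy-arp', 'forward', or 'drop' for a packet sent by a VM."""
    if pkt.ethertype == ETHERTYPE_ARP:
        # ARP never leaves the host: the hypervisor answers on behalf
        # of the target VM or the first-hop gateway.
        return "proxy-arp"
    if pkt.ethertype != ETHERTYPE_IPV4:
        return "drop"  # all other non-IPv4 traffic is dropped

    dst = ipaddress.IPv4Address(pkt.dst_ip)
    subnet = ipaddress.IPv4Network(sender_subnet)
    is_broadcast = (
        dst == ipaddress.IPv4Address("255.255.255.255")
        or dst == subnet.broadcast_address  # assumption: directed broadcast too
    )
    if dst.is_multicast or is_broadcast:
        return "drop"  # only unicast IPv4 is forwarded

    # Unicast IPv4 is forwarded at layer 3; pkt.ttl is deliberately
    # left untouched to mirror the "no TTL decrement" observation.
    return "forward"
```

Calling `vswitch_decision(Packet(ETHERTYPE_IPV4, "10.0.2.15"))` returns `"forward"`, while an IPv6 frame (`Packet(0x86DD)`) or an OSPF hello sent to `224.0.0.5` returns `"drop"`.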

Seems like a run-of-the-mill virtual networking implementation, but wait, that’s not all. The beauty of the Amazon VPC forwarding model is the multi-VRF approach: you can create multiple routing tables in your VPC and assign one of them to each subnet.

You could, for example, use a default route toward the Internet for the web server subnet, a default route toward your data center for the database server subnet, and no default route (local connectivity only) for your application server subnet. Pretty cool stuff if you’re an MPLS/VPN geek used to schizophrenic routing tables, and quite a tough nut to crack for people who want to migrate their existing layer-2 networks into the cloud. Massimo Re Ferre made a perfect summary: everyone else is virtualizing the network, Amazon VPC is abstracting it.
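A minimal sketch of that per-subnet routing model, assuming a 10.0.0.0/16 VPC with three /24 subnets. All prefixes, table names, and next-hop labels (`internet-gateway`, `vpn-gateway`) are invented for illustration and are not real AWS identifiers. The source subnet selects the routing table, and a longest-prefix match within that table selects the next hop:

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"  # assumed VPC address range

# One routing table per role; every table carries the local VPC route.
ROUTE_TABLES = {
    "web": {VPC_CIDR: "local", "0.0.0.0/0": "internet-gateway"},
    "db": {VPC_CIDR: "local", "0.0.0.0/0": "vpn-gateway"},
    "app": {VPC_CIDR: "local"},  # no default route: local connectivity only
}

# Each subnet is associated with exactly one routing table.
SUBNET_TO_TABLE = {
    "10.0.1.0/24": "web",
    "10.0.2.0/24": "app",
    "10.0.3.0/24": "db",
}


def lookup(src_ip: str, dst_ip: str):
    """Return the next hop for dst_ip as seen from src_ip's subnet, or None."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)

    # Step 1: the sender's subnet determines which table is consulted.
    table_name = next(
        (name for subnet, name in SUBNET_TO_TABLE.items()
         if src in ipaddress.ip_network(subnet)),
        None,
    )
    if table_name is None:
        return None

    # Step 2: longest-prefix match within that table only.
    matches = [
        (ipaddress.ip_network(prefix), target)
        for prefix, target in ROUTE_TABLES[table_name].items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no matching route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

With these tables, `lookup("10.0.1.10", "8.8.8.8")` yields `"internet-gateway"`, `lookup("10.0.3.10", "8.8.8.8")` yields `"vpn-gateway"`, and `lookup("10.0.2.10", "8.8.8.8")` yields `None`, while all three subnets still reach each other through the `"local"` route.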

More information

I’m describing the virtual networking models of Cisco’s and VMware’s VXLAN, VMware NSX, vCloud Director, Hyper-V Network Virtualization, Juniper Contrail and Amazon VPC in the Cloud Computing Networking webinar.

The Overlay Virtual Networking webinar goes deeper into the commercially available architectures from numerous vendors, including Cisco, VMware, Microsoft, IBM, and Midokura.

10 comments:

  1. Does Amazon VPC transport the VM's Ethernet headers for IP packets?

  2. There was a really good video from the re:Invent sessions earlier this year that goes through VPC forwarding:
    http://www.youtube.com/watch?v=Zd5hsL-JNY4

    1. Thanks for the pointer. It helped. I wonder why Amazon decided to respond with the destination VM MAC (even for inter-subnet forwarding).

    2. I don't really think they will reply with the target VM's MAC, which is strange.

    3. You think or you have recent printouts proving otherwise?

    4. I have no way to prove it. But I think it's more reasonable to reply with the router's MAC, which is what Cisco ACI does.

    5. "I have no way to prove it" - how about launching a few Linux VMs in Amazon EC2 and looking at their ARP caches? Admittedly I'm too lazy to do it, but don't claim there's no way to do it - it will only cost you a few hours and a few dollars.

  3. Ivan,

    Very fun post. Stick it to the man.

    I just wanted to thank you for the insane amount of posts you've put out this year. Not one of them was a waste of time. It was hard for me to keep up and I hope I did not miss too many. I do not know how you do it!

    Will

  4. Ivan, do you have any guesses about how Amazon's SR-IOV setup works? AFAIK the Intel NIC does not support overlays.

    1. No idea. I don't think a single NIC supports overlay encapsulations today (but of course things might have changed in the last few months, and I'm positive at least a few vendors announced support of VXLAN and/or NVGRE in some unspecified future).



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.