TTL in Overlay Virtual Networks
After we get rid of the QoS FUD, the next question I usually get when discussing overlay networks is “how should these networks treat IP TTL?”
As (almost) always, the answer is “It depends.”
Layer-2 Virtual Networks
Overlay virtual networking solutions like VXLAN that implement layer-2 segments (effectively Ethernet-over-something) should not modify the VM-generated traffic. These solutions are emulating a transparent bridge and should NOT interact with the user traffic; all they can do is forward, flood or drop.
Obviously the transport TTL (the TTL the hypervisor sets when encapsulating the VM traffic) shouldn’t reflect the VM-generated TTL. The VM-generated TTL could be anything (the VM could also generate non-IP traffic, which has no TTL at all), while the transport TTL needs to be high enough to allow the packet to traverse the data center core.
Conclusions:
- Don’t touch the overlay (VM) TTL;
- Use whatever TTL makes sense in the transport network.
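The two rules above can be sketched in a few lines of Python. This is a toy model, not how any particular hypervisor implements VXLAN: the `OuterPacket` type, the function names and the `TRANSPORT_TTL` value of 64 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed value; use whatever gets the packet across your fabric.
TRANSPORT_TTL = 64


@dataclass(frozen=True)
class OuterPacket:
    """Hypothetical transport packet: a few outer IP fields plus the payload."""
    src_vtep: str
    dst_vtep: str
    ttl: int
    payload: bytes  # the VM-generated Ethernet frame, byte for byte


def encapsulate(frame: bytes, src_vtep: str, dst_vtep: str) -> OuterPacket:
    # The outer TTL is a local decision. The inner frame is never parsed,
    # so it does not matter whether it carries IP at all.
    return OuterPacket(src_vtep, dst_vtep, TRANSPORT_TTL, frame)


def decapsulate(packet: OuterPacket) -> bytes:
    # The outer header (and whatever is left of its TTL) is thrown away;
    # the VM frame comes out exactly as it went in.
    return packet.payload
```

The point of the sketch is that nothing flows between the two TTLs in either direction: the frame is an opaque blob on the way in and on the way out.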
Layer-3 Virtual Networks
Solutions that implement layer-3 forwarding are usually emulating Ethernet segments (layer-2 segments) connected with routers. In some cases the whole virtual network acts as a single virtual router (VMware NSX Distributed Router, Hyper-V, NEC ProgrammableFlow …), in others the inter-subnet traffic flows through a gateway appliance or a VM (VMware NSX Services Router, default OpenStack networking …).
These solutions SHOULD decrement TTL like any other router (or layer-3 switch) would do. If they wish to stay as close to the emulated Ethernet behavior as possible, they SHOULD decrement TTL if and only if the packet crosses subnet boundaries (or you might get crazy problems with application software that sends packets with TTL = 1).
For example, Hyper-V Network Virtualization SHOULD NOT decrement TTL if the source and destination VM belong to the same subnet (even though the HNV module actually performs L3 lookup to figure out where to send the packet) but SHOULD decrement TTL if the destination VM belongs to another IPv4 or IPv6 subnet.
Like in the layer-2 case, the transport TTL has nothing in common with the VM-generated TTL – hypervisors should use whatever TTL they need to get the encapsulated traffic across the data center fabric.
Conclusions:
- Decrement TTL like a router would do;
- Don’t copy overlay TTL into transport TTL or vice versa;
- Use whatever TTL makes sense in the transport network.
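The "decrement if and only if the packet crosses a subnet boundary" rule could look roughly like this. It is a minimal sketch using Python's standard `ipaddress` module; the `forward_ttl` function and its arguments are hypothetical, and it assumes the forwarding element knows the prefix length of the source subnet.

```python
import ipaddress
from typing import Optional


def forward_ttl(src: str, dst: str, prefix_len: int, ttl: int) -> Optional[int]:
    """Return the overlay TTL after forwarding, or None if the packet is dropped."""
    src_net = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    if ipaddress.ip_address(dst) in src_net:
        # Same subnet: emulate a bridge and leave the TTL alone,
        # even if an L3 lookup was used to find the destination.
        return ttl
    if ttl <= 1:
        # Crossing subnets with TTL 1 (or 0): a router would drop the packet.
        return None
    # Different subnet: behave like a single router hop.
    return ttl - 1
```

For example, with /24 subnets, `forward_ttl("10.0.0.33", "10.0.0.6", 24, 1)` leaves a TTL of 1 intact, so software that sends TTL = 1 packets within a subnet keeps working, while the same packet sent to 10.0.1.5 is dropped.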
But this is not how MPLS works
Really? Well, this is EXACTLY how L2VPNs (EoMPLS, VPLS, EVPN) work.
MPLS-based L3VPN (the “original” MPLS/VPN) is a totally different story: it’s not supposed to emulate a single virtual router, but a whole WAN. Copying customer TTL into provider TTL (and vice versa) is the most natural thing to do under those circumstances (unless the MPLS provider disables TTL propagation because they want to hide the internal network details).
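To see what TTL propagation changes from the customer's point of view, here is a deliberately simplified model. The function name and parameters are hypothetical, the initial label TTL of 255 without propagation is an assumption, and the sketch ignores details such as penultimate hop popping and any decrement on the egress PE.

```python
from typing import Optional


def l3vpn_exit_ttl(customer_ttl: int, core_hops: int, propagate: bool) -> Optional[int]:
    """Customer IP TTL when the packet leaves the provider, or None if it expired.

    core_hops is the number of times the label TTL is decremented
    inside the provider network.
    """
    ip_ttl = customer_ttl - 1  # the ingress PE is a router hop
    if ip_ttl <= 0:
        return None
    # With propagation the label TTL starts from the customer TTL;
    # without it the label TTL starts from a fixed value (255 assumed here).
    label_ttl = (ip_ttl if propagate else 255) - core_hops
    if label_ttl <= 0:
        return None  # expired somewhere in the provider core
    # With propagation the label TTL is copied back, so every core hop
    # becomes visible to the customer; without it the core stays hidden.
    return label_ttl if propagate else ip_ttl
```

With a customer TTL of 64 and three core hops, this model returns 60 with propagation and 63 without it, which is why traceroute shows the provider routers in the first case and not in the second.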
More information
Watch the Cloud Computing Networking webinar if you need an overview of various virtual network technologies, Overlay Virtual Networking for an overview of what major vendors have to offer, VXLAN deep dive if you’re interested in VXLAN implementation details and VMware NSX Technical Deep Dive if you want to know how NSX works.
* If you have an L3 forwarding loop, the overlay TTL will eventually expire.
* If you have an L2 forwarding loop, you'll get the same fancy effects as in physical L2 networks (the only difference being that the looped packets will hose a few servers, not the whole network).
http://cloudierthanthou.wordpress.com/2013/04/30/the-sdn-behemoth-hiding-in-plain-sight/
Would you do one more AWS VPC test? Add a third VM in one of the subnets, ping between all three and dump ARP tables on all three VMs.
Thank you!
Ivan
Also, while you can ping your gateway (10.0.0.33->10.0.0.1), you cannot ping the gateway on the other subnet (10.0.0.33->10.0.1.1).
[ec2-user@ip-10-0-0-33 ~]$ sudo arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.0.6                 ether   02:c5:98:d1:b4:69   C                     eth0
10.0.0.16                ether   02:c5:98:d7:5c:43   C                     eth0
10.0.0.1                 ether   02:c5:98:c0:00:02   C                     eth0
https://www.youtube.com/watch?v=3qln2u1Vr2E