VXLAN termination on physical devices

Whenever I discuss VXLAN with a fellow networking engineer, I inevitably get the question “how will I connect this to the outside world?” Let’s assume you want to build a pretty typical 3-tier application architecture (next diagram) using VXLAN-based virtual subnets and you already have firewalls and load balancers – can you use them? Today the answer is NO.

The only product supporting VXLAN Tunnel End Point (VTEP) in the near future is the Nexus 1000V virtual switch; the only devices you can connect to a VXLAN segment are thus Ethernet interface cards in virtual machines. If you want to use a router, firewall or load balancer (sometimes lovingly called application delivery controller) between two VXLAN segments or between a VXLAN segment and the outside world (for example, a VLAN), you have to use a VM version of the layer-3 device. That’s not necessarily a good idea; virtual networking appliances have numerous performance drawbacks and consume way more CPU cycles than needed ... but if you’re a cloud provider billing your customers by VM instances or CPU cycles, you might not care too much.

The virtual networking appliances also introduce extra hops and unpredictable traffic flows into your network, as they can freely move around the data center at the whim of workload balancers like VMware’s DRS. A clean network design (left) thus quickly morphs into a total spaghetti mess (right):

Cisco doesn’t have any L3 VM-based product, and the only thing you can get from VMware is vShield Edge – a dumbed down Linux with a fancy GUI. If you’re absolutely keen on deploying VXLAN, that shouldn’t stop you; there are numerous VM-based products, including BIG-IP load balancer from F5 and Vyatta’s routers. Worst case, you can turn a standard Linux VM into a usable router, firewall or NAT device by removing less functionality from it than VMware did. Not that I would necessarily like doing that, but it’s one of the few options we have at the moment.

Next steps?

Someone will have to implement VXLAN on physical devices sooner or later; running networking functions in VMs is simply too slow and too expensive. While I don’t have any firm information (not even roadmaps), do keep in mind Ken Duda’s enthusiasm during the VXLAN Packet Pushers podcast (and remember that both Arista and Broadcom appear in the author list of VXLAN and NVGRE drafts).

Furthermore, the VXLAN encapsulation format is actually a subset of the OTV encapsulation, as Omar Sultan pointed out in his VXLAN Deep Dive blog post, which means that Cisco already has the hardware necessary to terminate VXLAN segments in the Nexus 7000.
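To make the encapsulation a bit more tangible, here is a minimal Python sketch of the 8-byte VXLAN header as described in the VXLAN draft: one flags byte with the VNI-valid bit set, three reserved bytes, a 24-bit segment ID (VNI) and one more reserved byte. The function names are made up for this illustration, and the sketch covers only the VXLAN header itself; it says nothing about the OTV format or about what any particular hardware can parse.

```python
import struct

# The "I" flag (0x08 in the first byte) marks the VNI field as valid.
VXLAN_FLAG_VNI_VALID = 0x08


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit segment ID (hypothetical helper)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload (hypothetical helper)."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8
```

A device terminating VXLAN has to do this lookup for every received packet before it can hand the inner Ethernet frame to the right virtual segment, which is why hardware that already parses a similar header is such a good starting point.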

How could you do it?

Layer-3 termination of VXLAN segments is actually pretty easy (from the architectural and control plane perspective):

  • VMs attached to a VXLAN segment are configured with the default gateway’s IP address (intra-VXLAN subnet logical IP address of the physical termination device);
  • A VM sending an IP packet to an off-subnet destination has to send it to the default gateway, so it first sends an ARP request for the default gateway’s IP address;
  • One or more layer-3 VXLAN termination devices respond to the ARP request sent in the VXLAN encapsulation and the Nexus 1000V switch in the hypervisor running the VM remembers RouterVXLANMAC-to-RouterPhysicalIP address mapping;
  • When the VM sends an IP packet to the default gateway’s MAC address, the Nexus 1000V switch forwards the IP-in-MAC frame to the nearest RouterPhysicalIP address.
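The learning step in the list above can be sketched in a few lines of Python. This is a simplified model, not the Nexus 1000V implementation: the class name, the method names and the sample addresses are all assumptions made for illustration. The idea is that a VTEP records which transport IP address an inner source MAC address arrived from, and uses that mapping for later unicast frames.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VtepForwardingTable:
    """Hypothetical per-VTEP table mapping (segment ID, inner MAC) to a remote VTEP IP."""

    entries: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        # Called for every received VXLAN frame, including the router's ARP reply:
        # remember that this MAC address is reachable via that transport IP address.
        self.entries[(vni, inner_src_mac.lower())] = outer_src_ip

    def lookup(self, vni: int, dst_mac: str) -> Optional[str]:
        # Returns the transport IP address to encapsulate towards, or None if the
        # MAC address is unknown (a real VTEP would then flood within the segment).
        return self.entries.get((vni, dst_mac.lower()))
```

In this model the router's ARP reply triggers `learn(5000, "00:00:5E:00:53:01", "192.0.2.1")`, and every subsequent frame the VM sends to the gateway MAC address resolves to `192.0.2.1` through `lookup`. If several routers share the same transport IP address, the table does not change at all; the IP network simply delivers the packet to the nearest one.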

No broadcast or flooding is involved in the layer-3 termination, so you could easily use the same physical IP address and the same VXLAN MAC address on multiple routers (anycast) and achieve instant redundancy without first hop redundancy protocols like HSRP or VRRP.

Layer-2 extension of VXLAN segments into VLANs (that you might need to connect VXLAN-based hosts to an external firewall) is a bit tougher. As you’re bridging between VXLAN and an 802.1Q VLAN, you have to ensure that you don’t create a forwarding loop.

You could configure the VXLAN layer-2 extension (bridging) on multiple physical switches and run STP over VXLAN ... but I hope we’ll never see that implemented. It would be way better to use IP functionality to select the VXLAN-to-VLAN forwarder. You could, for example, run VRRP between redundant VXLAN-to-VLAN bridges and use the VRRP IP address as the VXLAN physical IP address of the bridge (all off-VXLAN MAC addresses would appear as being reachable via that IP address to other VTEPs). The VRRP functionality would also control the VXLAN-to-VLAN forwarding – only the active VRRP gateway would perform the L2 forwarding. You could still use a minimal subset of STP to prevent forwarding loops, but I wouldn’t use it as the main convergence mechanism.
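The forwarder selection described above could look roughly like the following Python sketch. It is a thought experiment, not a description of any shipping product: the `Bridge` type and both functions are hypothetical, and the election is reduced to the basic VRRP rule (highest priority wins, with the higher IP address as the tie-breaker) without timers, preemption settings or advertisements.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bridge:
    """Hypothetical VXLAN-to-VLAN bridge taking part in the election."""

    name: str
    priority: int  # VRRP-style priority; higher wins
    ip: str        # the bridge's own interface IP address, used as tie-breaker
    alive: bool = True


def elect_active(bridges: Iterable[Bridge]) -> Optional[Bridge]:
    """Pick the bridge that owns the shared VRRP IP address, or None if all are down."""
    candidates = [b for b in bridges if b.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda b: (b.priority, int(ipaddress.ip_address(b.ip))),
    )


def should_forward(bridge: Bridge, bridges: Iterable[Bridge]) -> bool:
    """Only the elected bridge performs VXLAN-to-VLAN layer-2 forwarding."""
    active = elect_active(bridges)
    return active is not None and active.name == bridge.name
```

Because exactly one bridge returns `True` from `should_forward` at any time, there is a single path between the VXLAN segment and the VLAN in steady state. Transient states (for example, two bridges both believing they are active after a network partition) are not covered by this sketch, which is where a minimal subset of STP would still help.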

Summary

VXLAN is a great concept that gives you clean separation between virtual networks and physical IP-based transport infrastructure, but we need VXLAN termination in physical devices (switches, potentially also firewalls and load balancers) before we can start considering large-scale deployments. Till then, it will remain an interesting proof-of-concept tool or a niche product used by infrastructure cloud providers.

More information

The concepts and challenges of virtualized networking are described in the Introduction to Virtualized Networking webinar. For more details, check out my Data Center 3.0 for Networking Engineers (recording) and VMware Networking Deep Dive (recording) webinars. Both of them are also available as part of the Data Center Trilogy and you get access to all three webinars (and numerous others) as part of the yearly subscription.

11 comments:

  1. Ivan, just thinking out loud here but I think as network guys we're going to have to get past the "extra hops and unpredictable traffic flows" hang up. The paths and hops look ugly if you look at it from the perspective of the physical network, but it's all perfectly normal from the perspective of the virtual network. The physical network needs to evolve to east-west non-blocking architectures to cope with network virtualization. If the "extra hops" are really a problem, we need to be clear on why those are a problem, not just 'because it looks ugly on a drawing'. If the latency is low and the bandwidth non-blocking, why are extra *physical* hops bad? Just playing devil's advocate (kinda) ;-)

  2. Just a thought, how robust is this technology when sites are failing over and chaos rules. Seems like an outage could cause a disconnect for software to recover in a robust fashion. All for Network Virtualization, need to experiment here. 8-)

  3. One of the biggest drawbacks to virtual appliance versions of load balancers (or, lovingly, application delivery controllers) is the lack of SSL crypto hardware. Most load balancers today employ some type of SSL ASIC to handle the cryptographically intensive asymmetric encryption (RSA) that occurs at the start of any SSL/TLS connection.

    Intel recently added AES-NI to its server processor lineup (it's in the new E7's and 5600 series Xeon), however they only handle symmetric, not asymmetric.

    So as Ivan said, it's going to chew up a lot more CPU cycles than would otherwise be chewed.

  4. Well, I wouldn't use VXLAN (or any other L2 technology) between data centers. It's a nice mechanism to implement many virtual segments within a single failure domain (availability zone), if you want to go beyond that, you need proper application architecture.

  5. The "only" reasons I dislike spaghetti-like virtual flows are:

    (A) troubleshooting complexities
    (B) increased network utilization

    I don't really care about N/S shifting to E/W. That's happening anyway and needs to be solved, but wasting bandwidth is a different story.

    Of course, if you have too much bandwidth in your DC and too many CPU cycles (so you can do routing in VM appliances), you might not care.

  6. Interesting comment, especially in light of how much my System Admins would love the same subnets at both our data centers. Is there a good solution for allowing hosts to migrate between data centers that don't share layer-2 adjacency via any technology (VLAN, VxLAN, etc)? Maybe LISP?

  7. Michal Zawirski, 12 October, 2011 13:48

    Ivan, while I (and probably 99% of network engineers) dislike spaghetti flows exactly for the reasons you mentioned, I agree with Brad’s point here. In virtualized / cloud environments we are going to see fewer and fewer “clean” designs (as depicted on the left side of your diagram), with well separated roles aligned with the physical network topology. The network paths should be deterministic (moving virtual appliances around as load changes => not necessarily a good idea) and performance (incl. latency) needs to be kept under control, but otherwise I would not care about the number of physical hops.

    I think we’re more likely to see a shared/virtualized pool of physical appliances (loadbalancers with SSL, firewalls, etc), connected to the “network fabric” somewhat like service linecards in a 6500 chassis (and hopefully supporting VXLAN termination natively at some point to avoid the L2 issues you described).

    Still, VXLAN termination in hardware may help keep the spaghetti slightly less convoluted.

  8. Read this post first http://blog.ioshints.info/2011/09/long-distance-vmotion-for-disaster.html, and discuss the cost issue with the server admins ;)

    Once implemented properly, LISP will solve the IP address mobility problem, but not all the others.

  9. Gr8 post , thanks ivan

  10. According to Cisco insiders, the ASA firewall will implement VXLAN-to-VLAN gateway mechanisms.
    But when, and on which type of ASA?

    Replies
    1. I would suggest you ask the above-mentioned Cisco insiders.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.