Blog Posts in August 2013
I spent the last few weeks blogging about the brave new overlay worlds. Time to return to VLAN-based physical reality and revisit one of the challenges of VM mobility: mobile IP addresses.
A while ago I speculated that you might solve inter-subnet VM mobility with Mobile ARP. While Mobile ARP isn’t the best idea ever invented it just might work reasonably well for environments with dozens (not millions) of virtual servers.
Routing protocols running on virtual appliances significantly increase the flexibility of virtual-to-physical network integration – you can easily move the whole application stack across subnets or data centers without changing the physical network configuration.
Major hypervisor vendors already support the concept: VMware NSX Edge Services Router can run OSPF, BGP or IS-IS, and BGP is coming to Hyper-V gateways. Like it or not, we’ll have to accept these solutions in the near future – here’s a quick survival guide.
Every time I mention overlay virtual networking tunnels someone starts worrying about the scalability of this approach along the lines of “In a data center with hundreds of hosts, do I have an impossibly high number of GRE tunnels in the full mesh? Are there scaling limitations to this approach?”
Not surprisingly, some ToR switch vendors abuse this fear to the point where they look downright stupid (but I guess that’s their privilege), so let’s set the record straight.
VMware gave me early access to NSX hands-on lab a few days prior to VMworld 2013. The lab was meant to demonstrate the basics of NSX, from VXLAN encapsulation to cross-subnet flooding, but I quickly veered off the beaten path and started playing with routing protocols in NSX Edge appliances.
Answer#1: An overlay virtual networking solution providing logical bridging (aka layer-2 forwarding or switching), logical routing (aka layer-3 switching), distributed or centralized firewalls, load balancers, NAT and VPNs.
Answer#2: A merger of Nicira NVP and VMware vCNS (a product formerly known as vShield).
Oh, and did I mention it’s actually two products, not one?
Unless you decided to live under a rock for the next 20 years or plan to drop out of networking in the very near future, you simply (RFC 2119) MUST watch this video.
A while ago Greg Ferro wrote a great article describing integration of overlay and physical networks in which he wrote that “an overlay network tunnel has no state in the physical network”, triggering an almost-immediate reaction from Marten Terpstra (of RIPE fame, now @ Plexxi) arguing that the network (at least the first ToR switch) knows the MAC and IP address of hypervisor host and thus has at least some state associated with the tunnel.
Marten is correct from a purely scholastic perspective (using his argument, the network keeps some state about TCP sessions as well), but what really matters is how much state is kept, which device keeps it, how it’s created and how often it changes.
During the UCS Director Overview Packet Pushers Podcast I listened to recently the participants started discussing the use cases and someone mentioned that UCS Director might not be applicable for small shops with only a few thousand VMs. Let's put that in perspective.
The “What’s coming in Hyper-V Network Virtualization (Windows Server 2012 R2)” blog post got way too long, so I had to split it in two parts: Hyper-V Network Virtualization and the rest of the features (this post).
In the previous posts I described how a typical overlay virtual networking data plane works and what technologies vendors use to implement the associated control plane that maps VM MAC addresses to transport IP addresses. Now let’s walk through the details of a particular implementation: Nicira Network Virtualization Platform (NVP), part of VMware NSX.
Before Tore Anderson, the rock star behind the IPv6-only data center, started explaining the interesting details of his ideas, I did a short intro explaining the need for IPv4+IPv6 access to your content and the steps you have to take to get there.
You might decide to proceed down the more traditional path (doing 5-6 transitions in the next few years) or deploy IPv6-only data center and be done with it.
A while ago Tomasz Kacprzynski asked me whether I'd ever run RSVP over DMVPN. I hadn't - after all, you'd only need that in VoIP environments and I try to stay as far away from voice as possible.
In the meantime, Tomasz solved the problem (short summary: you have to turn Phase 3 DMVPN into Phase 2 DMVPN) and wrote a lengthy blog post describing the problem (RSVP split horizon rule) and his solution (including numerous debugging printouts). Definitely worth reading if there's a non-zero chance you'll have to get the two working together.
In a recent blog post Marten Terpstra wrote:
We are teaching our applications how to behave uniformly. Or normal. And that's not normal. We should teaching the network how to serve the applications instead. However demanding or quirky they decide to be.
That’s definitely a noble engineering goal, the “only” problem is that I don’t know many customers who would be willing to foot the bill.
Multiple overlay network encapsulations are nothing more than a major inconvenience (and religious wars based on individual bit fields close to meaningless) for anyone trying to support more than one overlay virtual networking technology (just ask F5 or Arista).
The key differentiator between scalable and not-so-very-scalable architectures and technologies is the control plane – the mechanism that maps (at the very minimum) remote VM MAC address into a transport network IP address of the target hypervisor (see A Day in a Life of an Overlaid Virtual Packet for more details).
Every single network device (or a distributed system like QFabric) has to perform at least three distinct activities:
- Process the transit traffic (that’s why we buy them) in the data plane;
- Figure out what’s going on around it with the control plane protocols;
- Interact with its owner (or NMS) through the management plane.
Routers are used as a typical example in every text describing the three planes of operation, so let’s stick to this time-honored tradition:
Right after Microsoft’s TechEd event CJ Williams kindly sent me links to videos describing new features in upcoming Windows Server (and Hyper-V) release. I would strongly recommend you watch What’s New in Windows Server 2012 R2 Networking and Deep Dive on Hyper-V Network Virtualization in Windows Server 2012 R2, and here’s a short(er) summary.
This blog post is describing futures that will ship in 2H2013. However, as all the videos mentioned above included live demos, and the preview release shipped on June 24th, it’s obvious they’re past the “it works so great in PowerPoint” stage.
Enterasys implemented optimal layer-3 forwarding with an interesting trick: they support VRRP like any other switch vendor, but allow you to make all members of a VRRP group active forwarders regardless of their status.
For more information, watch the Fabric Routing video from the Enterasys Robust Data Center Interconnect Solutions webinar.
I explain the intricacies of overlay network forwarding in every overlay-network-related webinar (Cloud Computing Networking, VXLAN deep dive...), but never wrote a blog post about them. Let’s fix that.
First of all, remember that most mainstream overlay network implementations (Cisco Nexus 1000V, VMware vShield, Microsoft Hyper-V) don’t change the intra-hypervisor network behavior: a virtual machine network interface card (VM NIC) is still connected to a layer-2 hypervisor switch. The magic happens between the internal layer-2 switch and the physical (server) NIC.
Andrew sent me the following question: “I'm pushing to start a conversation about IPv6 in my organization, but meanwhile I've no RFC 1918 space left. What's your take on 100.64.0.0/10 - it's seems like this is available for RFC 1918 purposes, even if not intentionally?”
Short answer: Don’t even think about that!
Microseconds after VXLAN was launched at VMworld 2011, someone started promoting it as a data center extension solution. Even though layer-2 DCI doesn’t make much sense (even to server people) and VXLAN is really not a DCI solution, the lure of misusing a technology was irresistible.