A while ago I wrote that the hypervisor vendors should consider turning the virtual switches into PE-routers. We all know that’s never going to happen due to religious objections from everyone who thinks VLANs are the greatest thing ever invented and MP-BGP is pure evil, but there are at least two good technical reasons why putting MPLS/VPN (as we know it today) in the hypervisors might not be the best idea in very large data centers.
Please remember that we’re talking about huge data centers. If you have a few hundred physical servers, bridging (be it with VLANs or vCDNI) will work just fine.
This blog post was triggered by an interesting discussion I had with Igor Gashinsky during the OpenFlow Symposium. Igor, thank you for your insight!
Brief recap of the initial idea (you should also read the original blog post): hypervisor hosts should become PE-routers and use MP-BGP to propagate IP- or MAC-address reachability information. Hypervisor hosts implementing L3 forwarding could use RFC 4364 (with host routes for VMs); L2 hypervisor switches could use BGP MPLS Based MAC VPN.
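To make the recap a bit more concrete, here is a minimal Python sketch of what an L3 hypervisor acting as a PE-router would have to track. It is not taken from any hypervisor or from RFC 4364 itself; the class names (VpnRoute, HypervisorPE), the addresses, labels and route targets are all made up for illustration. The only thing it tries to show is the RFC 4364 idea that a received VPN route lands in a VRF when its route targets match the VRF's import list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    """Hypothetical model of one VPN-IPv4 route advertised over MP-BGP."""
    rd: str                   # route distinguisher
    prefix: str               # VM host route, e.g. "10.0.0.5/32"
    next_hop: str             # loopback of the advertising hypervisor (PE-router)
    label: int                # VPN label chosen by the advertising hypervisor
    route_targets: frozenset  # route-target extended communities


class HypervisorPE:
    """Hypothetical hypervisor switch behaving like a PE-router."""

    def __init__(self, loopback, import_targets_by_vrf):
        self.loopback = loopback
        # VRF name -> set of route targets that VRF imports
        self.import_targets = import_targets_by_vrf
        # VRF name -> {prefix: VpnRoute}
        self.vrfs = {name: {} for name in import_targets_by_vrf}

    def receive(self, route):
        """Install the route in every VRF importing one of its route targets."""
        for vrf, targets in self.import_targets.items():
            if targets & route.route_targets:
                self.vrfs[vrf][route.prefix] = route


if __name__ == "__main__":
    pe = HypervisorPE("192.0.2.1", {"tenant-a": {"65000:1"}})
    pe.receive(VpnRoute("65000:1", "10.0.0.5/32", "192.0.2.2", 100,
                        frozenset({"65000:1"})))
    pe.receive(VpnRoute("65000:2", "10.0.0.5/32", "192.0.2.3", 200,
                        frozenset({"65000:2"})))
    print(sorted(pe.vrfs["tenant-a"]))  # only the tenant-a host route is installed
```

Every VM shows up as one such host route, which is why the number of routes (and the rate at which they change) matters so much in the rest of this post.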
And here are the challenges:
Scalability. MPLS/VPN requires Label Switched Paths (LSPs) between PE-routers. These paths could be signaled with LDP, in which case host routes to all PE-routers must be propagated throughout the network, or with MPLS-TE, in which case you have a full mesh (N-squared) of tunnels and way too much state in the network core.
MPLS/VPN could also use IP or GRE+IP transport as defined in RFC 4023, in which case the scalability argument is gone.
MPLS/VPN requires flat address space; IP offers self-similar aggregation capabilities (source: Wikipedia)
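A rough back-of-the-envelope sketch of the scalability argument, in Python. The numbers are hypothetical (4 racks with 40 hypervisors each, one /24 per rack) and the function names are mine; the only assumptions are that LDP needs one host route per PE-router in the core, that MPLS-TE tunnels are unidirectional (so a full mesh needs N*(N-1) of them), and that with IP or GRE transport the core needs only the aggregated rack prefixes.

```python
import ipaddress


def ldp_core_routes(pe_count):
    """LDP-signaled LSPs: one host route per PE-router in the core."""
    return pe_count


def te_full_mesh_tunnels(pe_count):
    """MPLS-TE tunnels are unidirectional: a full mesh needs N * (N - 1)."""
    return pe_count * (pe_count - 1)


def ip_core_prefixes(pe_interfaces):
    """IP/GRE transport: the core carries only aggregated subnets.

    pe_interfaces holds strings such as "10.1.0.10/24" (address/mask of
    each hypervisor); adjacent subnets are summarized further if possible.
    """
    subnets = {ipaddress.ip_interface(i).network for i in pe_interfaces}
    return list(ipaddress.collapse_addresses(subnets))


if __name__ == "__main__":
    # Hypothetical addressing plan: rack R uses 10.1.R.0/24
    hypervisors = [f"10.1.{rack}.{host}/24"
                   for rack in range(4) for host in range(10, 50)]
    n = len(hypervisors)
    print(ldp_core_routes(n))             # 160
    print(te_full_mesh_tunnels(n))        # 25440
    print(ip_core_prefixes(hypervisors))  # [IPv4Network('10.1.0.0/22')]
```

Even in this tiny example the difference is obvious: 160 host routes or more than 25,000 tunnels versus a single summary prefix. Scale the hypervisor count to tens of thousands and the flat address space becomes a real problem.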
Eventual consistency of BGP. BGP was designed to carry humongous amounts of routing information (the Internet IPv4 routing table has more than 400,000 routes), but it’s not the fastest-converging beast on this planet, and it has no transactional consistency. That might be fine if you’re starting and shutting down VMs (the amount of change is limited, and eventual consistency doesn’t matter for a VM going through the OS boot process), but not if you’re moving thousands of them in order to evacuate racks scheduled for maintenance.
Summary: MPLS/VPN was designed for an environment with a large number of routes and a limited amount of routing information churn. Large-scale data centers offering “TCP clouds” (because some customers think that might result in high availability) just might be too dynamic for that.
Do we still need MPLS/VPN in the Data Center?
Sure we do, but not in the hypervisors. In many cases, we have to provide path isolation to the applications that don’t actually need L2 connectivity because they were written by people who understood how IP works (example: you might want to keep MySQL database servers strictly isolated from web servers).
MPLS/VPN is a perfect solution for that problem (Easy Virtual Networking might also work), but many engineers still use VLANs (even though L2 connectivity is not required) and risk the stability of their network because they’re not familiar with MPLS/VPN or because the gear they use doesn’t support it.