Virtual Appliance Routing – Network Engineer’s Survival Guide
Routing protocols running on virtual appliances significantly increase the flexibility of virtual-to-physical network integration – you can easily move the whole application stack across subnets or data centers without changing the physical network configuration.
Major hypervisor vendors already support the concept: VMware NSX-T edge nodes can run BGP or OSPF [1], and Hyper-V gateways can run BGP. Like it or not, we’ll have to accept these solutions in the near future – here’s a quick survival guide.
Don’t Use Link-State Routing Protocols
Link-state routing protocols rely on a shared topology database flooded between participating nodes (routers). The whole link-state domain is a single trust zone – a single node going bonkers can bring down the whole domain.
Conclusion: don’t use link-state routing protocols between mission-critical physical network infrastructure and virtual appliances. BGP is the only safe choice.
Peer With a Cluster of Route Servers
BGP configuration of a virtual appliance or a physical network device shouldn’t have to change when the application stack fronted by the virtual appliance moves into a different subnet. The virtual appliances should therefore peer with route servers using fixed neighbor IP addresses [2].
Here’s an anycast design that ensures a virtual appliance always finds a path to a route server regardless of where it’s moved to:
- Assign the same IP address to the loopback interfaces of multiple BGP route servers and advertise these addresses with different IGP costs (otherwise you might get interesting results when ECMP kicks in ;). Obviously you’d use two anycast IP addresses for redundancy.
- When a virtual appliance establishes a session with the closest BGP route server, it announces its prefixes with the BGP next hop set to the physical IP address of the appliance. Assuming you run IBGP between your physical nodes, all routers in your data center get optimal routing information.
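To make this more concrete, here’s a minimal sketch of the appliance side of such a session in FRRouting-like syntax. All values are made up (AS 65000, the anycast route-server address 192.0.2.1, interface eth1, and the 10.100.20.16/29 application prefix), and the sketch assumes the IBGP design discussed below:

```
router bgp 65000
 ! Peer with the anycast address shared by all route servers
 neighbor 192.0.2.1 remote-as 65000
 neighbor 192.0.2.1 description anycast-route-servers
 ! Source the session from the appliance's physical (data plane) interface so
 ! the advertised prefixes carry the appliance's physical IP as the BGP next hop
 neighbor 192.0.2.1 update-source eth1
 !
 address-family ipv4 unicast
  ! Advertise the small subnet of the application stack fronted by this appliance
  network 10.100.20.16/29
 exit-address-family
```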
The route servers obviously have to accept BGP sessions coming from a range of IP addresses – dynamic BGP neighbors are a perfect solution.
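A corresponding route-server configuration sketch (same caveats: FRRouting-like syntax, made-up AS number and a made-up 198.18.0.0/16 appliance address range) could look like this:

```
router bgp 65000
 ! All virtual appliances share one peer group
 neighbor APPLIANCES peer-group
 neighbor APPLIANCES remote-as 65000
 ! Accept inbound BGP sessions from any address in the appliance range
 bgp listen range 198.18.0.0/16 peer-group APPLIANCES
 ! Optional safety valve: cap the number of dynamic sessions
 bgp listen limit 200
 !
 address-family ipv4 unicast
  ! Reflect appliance-originated prefixes to the rest of the IBGP mesh
  neighbor APPLIANCES route-reflector-client
 exit-address-family
```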
For even more details, read the Running BGP between Virtual Machines and Data Center Fabric blog post.
EBGP or IBGP?
I would usually recommend running EBGP between your network and a third-party appliance, but IBGP might turn out to be simpler in this particular case:
- If you want to peer with a cluster of route servers, you need multihop BGP sessions [3]. IBGP sessions are multihop by default, and some virtual appliances/gateways might not support multihop EBGP sessions [4].
- IBGP sounds more complex (you need route reflectors), but it’s usually perfectly OK to advertise just the default route to the virtual appliance … or you might decide to use DHCP-based default routing, in which case you don’t have to send any routing information to the virtual appliance.
- IBGP allows you to use MED and local preference to influence route selection if necessary.
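Continuing the hypothetical route-server configuration sketched above, advertising nothing but a default route to the appliance peer group could look something like this (FRRouting-like syntax; how default-originate interacts with outbound filters may differ on your platform):

```
router bgp 65000
 address-family ipv4 unicast
  ! Originate a default route toward the appliances...
  neighbor APPLIANCES default-originate
  ! ...and filter out everything else
  neighbor APPLIANCES prefix-list DEFAULT-ONLY out
 exit-address-family
!
ip prefix-list DEFAULT-ONLY seq 10 permit 0.0.0.0/0
```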
Don’t Trust, Verify
You wouldn’t want just any VM that happens to be connected directly to a physical VLAN to have BGP connectivity to your route servers, would you? Use MD5 authentication on dynamic BGP sessions.
Likewise, you probably don’t want to accept routes at face value from untrusted nodes. Filter BGP updates received from virtual appliances, and accept only prefixes from the address range assigned to virtual appliances and with a specific prefix length (for example, /64 in the IPv6 world, or /32 to /29 in the IPv4 world).
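Both safeguards could be added to the hypothetical route-server configuration along these lines (FRRouting-like syntax; the secret and the 10.100.0.0/16 range assigned to appliance-advertised prefixes are made up, and you should check whether your platform supports MD5 passwords on dynamic neighbor ranges):

```
router bgp 65000
 ! TCP MD5 authentication on all dynamic appliance sessions
 neighbor APPLIANCES password Som3Secr3tValue
 !
 address-family ipv4 unicast
  ! Accept only prefixes from the appliance range with sane prefix lengths
  neighbor APPLIANCES prefix-list APPLIANCE-ROUTES in
 exit-address-family
!
! Anything within 10.100.0.0/16 with a prefix length between /29 and /32;
! everything else is dropped by the implicit deny
ip prefix-list APPLIANCE-ROUTES seq 10 permit 10.100.0.0/16 ge 29 le 32
```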
Need More Information?
- VMware NSX Technical Deep Dive webinar includes a deep dive into BGP routing between NSX-T edge nodes and physical fabric.
- Amazon Web Services Networking webinar describes BGP routing between AWS and external destinations as well as BGP routing between AWS transit gateways and virtual appliances.
- Microsoft Azure Networking describes BGP routing between Microsoft Azure and external destinations as well as BGP routing between Azure Route Server (or Virtual WAN) and virtual appliances.
Need Help with Your Network Design?
Check out my BGP case studies that you get with the yearly subscription.
Revision History
- 2023-02-01
  - Added several references to NSX-T
  - Explained the difference between routing with virtual appliances and bare-metal edge nodes
  - Streamlined the discussion and reordered sections
Footnotes

1. For a trip down memory lane, read the Routing Protocols on NSX Edge Services Router blog post that describes the VMware NSX-V implementation.
2. VMware NSX-T solves this with preconfigured BGP sessions between bare-metal edge nodes and adjacent ToR switches. There are no BGP sessions between virtual appliances and the physical world – the IP prefixes of all application stacks using an edge node are advertised by that edge node.
3. Running direct EBGP sessions between bare-metal VMware NSX-T edge nodes and adjacent ToR switches is obviously perfectly fine.
4. OTOH, some virtual appliances or virtual network edge nodes might not support IBGP. Sometimes you just can’t win.
So I think the requirements list reads roughly like this:
* contain the blast radius, i.e. ideally a server failure shakes only the minimum necessary part of the fabric. With an IGP and /32 host routing (which one often must run for mobility, unless one runs DHCP on each ToR with carefully controlled ranges – a set of problems of its own), a single address flap shakes everyone – the “whole fabric blast radius” problem. Instead of a flat IGP (where everyone has to replicate the LSDB and one can’t really summarize easily) one could try areas, but summarization causes blackholes, since summaries hide link failures at the area ingress/egress. One could claim that “servers never fail”, but today’s reality of rolling updates on fabrics runs contrary to this assertion.
* multi-homing and what I call “true anycast”. One would want multi-homing, probably with two addresses, and ideally the option to run the same address on multiple servers (service anycast) independently of ECMP – i.e. anycast to multiple servers regardless of metric.
* only a (weighted) default route on the servers
* northbound metric balancing, i.e. servers adjusting to failures of fat links and generally picking ToRs with more capacity over thinner pipes. This is a coarse version of flow engineering, but given the speed of flow changes and shifts in traffic patterns I’m not a big believer in controllers being able to react to it anyway. A “bandwidth broker” in the server kernel is the best solution in a sense, but a luxury few will be able to afford.
* scale: it’s really easy to blow up the address space with servers running hypervisors, VMs and so on, so one has to respect that and build a solution that can cope.
* server security: I’ll drop that blackhole for the moment, it will come up ;-)
Hastily typed before a meeting – excuse the less-than-perfect English ;-)
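To make the “only a (weighted) default route on the servers” item from the list above a bit more tangible, here’s a minimal sketch of a dual-homed server accepting nothing but the default route from its two ToR switches (FRRouting-like syntax, made-up addresses and AS numbers; the weighting part would need something like unequal-cost load balancing based on link bandwidth, which is not shown):

```
router bgp 65101
 ! One EBGP session per ToR uplink
 neighbor 10.1.0.1 remote-as 65000
 neighbor 10.2.0.1 remote-as 65000
 !
 address-family ipv4 unicast
  ! Accept only the default route from either ToR
  neighbor 10.1.0.1 prefix-list DEFAULT-ONLY in
  neighbor 10.2.0.1 prefix-list DEFAULT-ONLY in
  ! Install both defaults for ECMP across the uplinks
  maximum-paths 2
 exit-address-family
!
ip prefix-list DEFAULT-ONLY seq 10 permit 0.0.0.0/0
```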