Virtual Appliance Routing – Network Engineer’s Survival Guide

Routing protocols running on virtual appliances significantly increase the flexibility of virtual-to-physical network integration – you can easily move the whole application stack across subnets or data centers without changing the physical network configuration.

Major hypervisor vendors already support the concept: VMware NSX Edge Services Router can run OSPF, BGP or IS-IS, and BGP is coming to Hyper-V gateways. Like it or not, we’ll have to accept these solutions in the near future – here’s a quick survival guide.

Don’t use link-state routing protocols

Link-state routing protocols rely on a shared topology database flooded among all participating nodes (routers). The whole link-state domain is a single trust zone – a single node going bonkers can bring down the whole domain.

Conclusion: don’t use link-state routing protocols between mission-critical physical network infrastructure and virtual appliances. BGP is the only safe choice.

EBGP or IBGP?

I would usually recommend running EBGP between your network and a third-party appliance, but IBGP might turn out to be simpler in this particular case:

  • IBGP sessions are multihop by default. We have yet to see whether virtual appliances support multihop EBGP sessions, and you probably wouldn’t want to establish peering between ToR switches and virtual appliances (see below);
  • IBGP sounds more complex (you need route reflectors), but it’s usually perfectly OK to advertise just the default route to the virtual appliance (see the configuration sketch after this list) … or you might decide to use DHCP-based default routing, in which case you don’t have to send any information to the virtual appliance.
  • IBGP allows you to use MED and local preference to influence route selection if necessary.
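
A minimal sketch of what this could look like on a Cisco IOS-based route server – the AS number (65000), the appliance address (198.51.100.10) and the prefix-list name are made up for illustration. The appliance gets a default route and nothing else, and whatever it advertises is reflected into the IBGP mesh:

    router bgp 65000
     ! IBGP session with the virtual appliance (address and AS are illustrative)
     neighbor 198.51.100.10 remote-as 65000
     ! Reflect routes received from the appliance to the rest of the IBGP mesh
     neighbor 198.51.100.10 route-reflector-client
     ! Advertise a default route to the appliance ...
     neighbor 198.51.100.10 default-originate
     ! ... and filter out everything else
     neighbor 198.51.100.10 prefix-list DEFAULT-ONLY out
    !
    ip prefix-list DEFAULT-ONLY seq 10 permit 0.0.0.0/0

With the route-server design described below, these per-neighbor settings would move into a peer group used by dynamic BGP neighbors.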

Peer with a cluster of route servers

BGP configuration of a virtual appliance or a physical network device shouldn’t have to change when the application stack fronted by the virtual appliance moves into a different subnet. The virtual appliances should therefore peer with route servers using fixed neighbor IP addresses.

Here’s an anycast design that ensures a virtual appliance always finds a path to a route server regardless of where it’s moved to:

  • Assign the same IP address to loopback interfaces of multiple BGP route servers and advertise these addresses with varying IGP costs (or you might get interesting results when ECMP kicks in ;). Obviously you’d use two anycast IP addresses for redundancy – see the configuration sketch after this list.
  • When a virtual appliance establishes a session with the closest BGP route server, it announces its prefixes with the BGP next hop set to the physical IP address of the appliance. Assuming you run IBGP between your physical nodes, all routers in your data center get optimal routing information.
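
Here’s a minimal sketch of the anycast loopback part on two IOS-based route servers – the 192.0.2.1 address, the OSPF process number and the costs are made up for illustration. The appliances would simply point their BGP sessions at 192.0.2.1:

    ! Route server A – preferred (lower advertised cost)
    interface Loopback1
     description Anycast BGP route-server address
     ip address 192.0.2.1 255.255.255.255
     ip ospf 1 area 0
     ip ospf cost 10
    !
    ! Route server B – same anycast address, advertised with a higher cost
    interface Loopback1
     description Anycast BGP route-server address
     ip address 192.0.2.1 255.255.255.255
     ip ospf 1 area 0
     ip ospf cost 100

The second anycast address (for redundancy) would be just another loopback configured the same way, with the preferred/backup roles reversed.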

The route servers obviously have to accept BGP sessions coming from a range of IP addresses – dynamic BGP neighbors are a perfect solution.
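
On Cisco IOS that would look something like this – the appliance subnet (198.51.100.0/24), the session limit and the peer-group name are made up, and the per-neighbor settings from the earlier sketch move into the peer group:

    router bgp 65000
     ! Accept BGP sessions from anywhere in the (illustrative) appliance range
     bgp listen range 198.51.100.0/24 peer-group APPLIANCES
     ! Optional safety net: cap the number of dynamically-created sessions
     bgp listen limit 100
     ! All dynamic sessions inherit the peer-group settings
     neighbor APPLIANCES peer-group
     neighbor APPLIANCES remote-as 65000
     neighbor APPLIANCES route-reflector-client
     neighbor APPLIANCES default-originate
     neighbor APPLIANCES prefix-list DEFAULT-ONLY out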

If the routers or layer-3 switches you use don’t support dynamic BGP neighbors, use Cisco’s Cloud Services Router as a route server. It’s a bit more expensive than Quagga but also a bit more versatile.

Don’t trust, verify

You wouldn’t want just any VM that happens to be connected directly to a physical VLAN to have BGP connectivity to your route servers, would you? Use MD5 authentication on dynamic BGP sessions.
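
With the peer group from the sketch above, that’s a single line (the password string is obviously just a placeholder):

    router bgp 65000
     ! MD5 authentication on every dynamically-created appliance session
     neighbor APPLIANCES password Som3SharedSecret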

Likewise, you probably don’t want to accept routes from untrusted nodes at face value. Filter the BGP updates received from virtual appliances, and accept only prefixes from the specific address range assigned to virtual appliances and with a specific subnet size (for example, /64 in the IPv6 world, or /32 to /29 in the IPv4 world).
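
Continuing the same IOS sketch, with a made-up appliance address range (203.0.113.0/24) and the /32-to-/29 size limit from the example above:

    ! Accept only /29 to /32 prefixes from the (illustrative) appliance range
    ip prefix-list APPLIANCE-ROUTES seq 10 permit 203.0.113.0/24 ge 29 le 32
    !
    router bgp 65000
     neighbor APPLIANCES prefix-list APPLIANCE-ROUTES in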

Need help with your network design?

Check out my ExpertExpress service and BGP case studies that you get with the yearly subscription.

1 comment:

  1. Quite interesting stuff as usual on your blogs. Although I agree with what you write, I would qualify it as "don't use _today's_ IGPs in this situation". Let's see however _what_ one would ideally want from the routing if it stretches to your server (I use the terms loosely – it can be a Kubernetes address, real addresses, or some HV flavor). I do think that would be very desirable BTW compared to the "out-of-band" solutions like IBGP RRs or "leaking of addresses" where each server is a "domain", but ultimately all those domains need to be synchronized by the main routing again so the servers see each other. The more "degrees of separation", the more fragile and slow in convergence the solution always is.

    So I think the requirements list reads roughly like this:

    * contain the blast radius, i.e. ideally a server failure shakes only the minimum necessary part of the fabric: here IGPs, if one runs /32 routing (which one must often do in case of e.g. mobility, and one doesn't want to run DHCP on each ToR with carefully controlled ranges [its own set of problems]), will generate the "one address shakes everyone" problem, or "whole fabric blast radius". So, instead of a flat IGP (where everyone needs to replicate the LSDB BTW and one can't really summarize easily) we could try areas, but that causes blackholes (since summaries will generate the problem of area ingress/egress on link failures). One could claim that "servers never fail", but today's reality of rolling updates on fabrics runs contrary to this assertion.
    * multi-homing and what I call "true anycast". One would want multi-homing and probably want to do it using two addresses. And ideally one would have the option to have even the same address on multiple servers (service anycast) independent of ECMP really (i.e. anycast to multiple servers independent of metric)
    * only default route (weighted) on the servers
    * northbound metric balancing, i.e. the servers adjusting to failures of "fat links" and generally preferring ToRs with more capacity over those with thinner pipes. This is a coarse version of flow engineering, but given the speed of flow changes and shifts in traffic patterns I am not a big believer in controllers being able to react to it anyway. A "bandwidth broker" in the server kernel is the best solution in a sense, but a luxury few will be able to afford.
    * scale: it is really easy to blow up address space with servers running HVs, VMs and so on so one has to respect that and build a solution that can cope.
    * server security: I drop that blackhole for the moment, it will come up ;-)

    Hastily typed before a meeting, excuse less than perfect English ;-)

