Running BGP between Virtual Machine and ToR Switch
One of my readers left this question on the blog post resurfacing the idea of running BGP between servers and ToR switches:
When using BGP on a VM for mobility, what is the best way to establish a peer relationship with a new TOR switch after a live migration? The VM won't inherently know the peer address or the ASN.
As always, the correct answer is it depends.
Supporting Live VM Mobility
If you want to support live (hot) VM mobility across ToR switches, don’t run BGP with the ToR switch. Regardless of how well you fine-tune the setup, it will take at least a few seconds before the BGP session with the new ToR switch is established, making your service inaccessible in the meantime.
As I explained in another blog post (yes, it’s almost exactly three years old), you SHOULD run a BGP session with a route server somewhere in your network, preferably using IBGP to make things simpler.
To add redundancy to the design, peer the VM with two route servers.
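To make this more tangible, here's a minimal sketch of the VM-side configuration in FRR syntax. Everything in it is a placeholder assumption: 65000 is the fabric AS, 192.0.2.11 and 192.0.2.12 are the two route servers, 10.1.2.3/32 is the service address the VM advertises, and reachability to the route servers through the VM's default gateway is assumed:

```
router bgp 65000
 bgp router-id 10.1.2.3
 ! IBGP sessions with both route servers for redundancy
 neighbor 192.0.2.11 remote-as 65000
 neighbor 192.0.2.12 remote-as 65000
 !
 address-family ipv4 unicast
  ! 10.1.2.3/32 is assumed to be configured on a loopback interface of the VM
  network 10.1.2.3/32
 exit-address-family
```

The session parameters stay the same wherever the VM lands after a live migration; the only thing that changes is the path toward the route servers.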
Supporting Physical Servers
If your servers don’t move, but you still don’t want to deal with neighbor IP addresses or AS numbers, use one or more of these tricks:
- Configure the same loopback address on all ToR switches (I wouldn’t advertise it into the network, and you definitely don’t want it to become the ToR switch router ID);
- Establish a BGP session between the physical servers and that loopback address, using either IBGP (so everyone is in the same AS) or local-as on the ToR switch to present the same AS number to all servers, as shown in the sketch below.
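Here's a minimal ToR-side sketch of both tricks in FRR syntax. The shared loopback (192.0.2.1), the server subnet (10.1.1.0/24), and all AS numbers (65000 for the servers, 65100 as the advertised local-as, 65201 as the switch's real AS) are illustrative assumptions, and the servers are assumed to point their BGP sessions at 192.0.2.1:

```
interface lo
 ip address 192.0.2.1/32
!
router bgp 65201
 ! Set the router ID explicitly so the shared loopback never becomes it
 bgp router-id 10.0.1.1
 ! Accept EBGP sessions from any server on the attached subnet
 neighbor SERVERS peer-group
 neighbor SERVERS remote-as 65000
 ! Present the same AS number to servers regardless of which ToR they use
 neighbor SERVERS local-as 65100 no-prepend replace-as
 bgp listen range 10.1.1.0/24 peer-group SERVERS
```

With the same neighbor IP address and AS number visible from every ToR switch, the server-side BGP configuration can be templated once and reused everywhere.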
Deploying FRR on the servers is obviously a better option. For more details, watch the Leaf-and-Spine Fabric Designs webinar.
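One of the reasons FRR is attractive in this role is its support for BGP unnumbered (interface-based EBGP sessions using IPv6 link-local addresses), which takes neighbor IP addresses out of the picture altogether, assuming the ToR switch supports it as well. A minimal server-side sketch, with eth1 standing in for the fabric-facing interface and 10.0.0.50/32 for the advertised service address:

```
router bgp 65000
 ! Interface-based EBGP session toward whichever ToR switch this server
 ! is attached to; no neighbor IP address or remote AS number to configure
 neighbor eth1 interface remote-as external
 !
 address-family ipv4 unicast
  network 10.0.0.50/32
```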
Supporting Disaster Recovery
Running BGP between the virtual machines and the network simplifies disaster recovery scenarios (and eliminates the need for crazy kludges like stretched VLANs). If this is your use case:
- Run a set of route servers in each data center to support live VM mobility within each data center;
- Use the same IP addresses and AS numbers across route servers in all data centers to enable VMs to connect to the route server in the local data center;
- Don’t advertise the shared IP addresses between data centers (you don’t want the VMs to connect to a route server in another data center due to a crazy routing glitch).
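The last point can be as simple as an outbound prefix filter on the BGP sessions toward the other data center. A sketch in FRR syntax, where 192.0.2.100/32 stands in for the shared route-server address, 10.255.0.2 for the remote DC edge router, and the AS numbers are illustrative:

```
ip prefix-list NO-RS-ADDR seq 10 deny 192.0.2.100/32
ip prefix-list NO-RS-ADDR seq 20 permit any
!
router bgp 65001
 neighbor 10.255.0.2 remote-as 65002
 !
 address-family ipv4 unicast
  ! Never advertise the shared route-server address to the other data center
  neighbor 10.255.0.2 prefix-list NO-RS-ADDR out
```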
Need even more details?
We discussed them in the Leaf-and-Spine Fabric Designs webinar and in the Building Next-Generation Data Center online course.
A reader suggested using RIPv2 instead:
I know, I know, and before you laugh and spurt coffee out of your nose at the thought of such an old crusty IGP in a DC, maybe think about it for a minute.
RIPv2 supports easy summarization and route filtering on the ToR switch. With reduced timers (and maybe BFD), convergence could come down to a second or three. ECMP works too. Oh, and it has very few nerd-knobs for the server guys to play with.
It just seems like a much simpler solution than BGP into the hypervisors & VMs.
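For what it's worth, the commenter's idea really is just a few lines of configuration. Here's a sketch using FRR's ripd, with purely illustrative timer values (update/timeout/garbage-collection) and eth1 standing in for the fabric-facing interface; whether you can pair RIP with BFD depends on the implementation:

```
router rip
 version 2
 network eth1
 ! Reduce the default 30/180/120 second timers for faster convergence
 timers basic 5 15 15
 ! Advertise whatever addresses are configured on the server or VM
 redistribute connected
```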
I could see two drawbacks of using RIPv2:
* It has to be configured on every ToR switch (you can't use a route server);
* Per-neighbor route filtering (if you want to do that) could get interestingly complex.
If all you want to do is collect whatever the VMs are telling you, then obviously RIPv2 is the tool for the job.
Robust primary node election, on the other hand, is a trickier problem, unless there is a global locking service available :)