Data Center BGP: Autonomous Systems and AS Numbers
Two weeks ago we discussed whether it makes sense to use BGP as the routing protocol in a data center fabric. Today we’ll tackle three additional design challenges:
- Should you use IBGP or EBGP?
- When should you run BGP on the spine switches?
- Should every leaf switch have a different AS number or should they share the same AS number?
DC BGP is very specific to BGP path hunting and the PIB (Policy Information Base) for RIB/FIB lookups.
1) IBGP or EBGP
We use four tiers: T0, T1, T2, and T3.
T0 leaf switches don't peer with other leaf switches; their only peering is with T1 (the cluster spine). Nowadays these two layers are often collapsed into a spline topology, which reduces the number of tiers and also the per-layer latency.
IBGP or EBGP between T0 and T1
IBGP between T0 and T1 puts the leaf and spine switches in the same ASN, which is not preferred because:
T0-------------T1
|
|---------------T1
T0 receives an IBGP update for the default route (0.0.0.0/0) from every T1. Because of the IBGP split-horizon (full-mesh) rule, T0 does not re-advertise that route to the other upstream T1.
The IBGP sessions would have to run between directly connected IP addresses rather than loopback addresses, because loopback peering needs an IGP and we don't want to run an IGP just for that.
Using a separate ASN for each tier lets us reduce fault-tolerance hotspots compared with running a single ASN everywhere; that is the design principle behind preferring EBGP over IBGP.
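The IBGP behavior described above can be sketched in a few lines of Python. This is a simplified model that assumes no route reflectors or confederations; the `Route` class, the `should_advertise` function, and the switch names `T1-a` and `T1-b` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str        # neighbor that sent us the route
    learned_via_ibgp: bool   # True if that neighbor is in our own AS


def should_advertise(route: Route, peer: str, peer_is_ibgp: bool) -> bool:
    """Simplified BGP advertisement rule (no route reflection)."""
    if peer == route.learned_from:
        return False  # never send a route back to the neighbor it came from
    if route.learned_via_ibgp and peer_is_ibgp:
        return False  # IBGP-learned routes are not relayed to other IBGP peers
    return True


# T0 learns the default route from T1-a and decides whether to tell T1-b.
via_ibgp = Route("0.0.0.0/0", learned_from="T1-a", learned_via_ibgp=True)
via_ebgp = Route("0.0.0.0/0", learned_from="T1-a", learned_via_ibgp=False)

print(should_advertise(via_ibgp, "T1-b", peer_is_ibgp=True))   # False
print(should_advertise(via_ebgp, "T1-b", peer_is_ibgp=False))  # True
```

With IBGP the leaf stays silent toward the second spine; with EBGP it would re-advertise the route, which is why the spine ASN choice matters.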
Using eBGP for the same topology
T0---------T1
|
|----------- T1
T1 has EBGP sessions with T0, and each T0 will advertise the default route it received from one T1 to the other T1. That is why all the spines share the same ASN: the built-in AS-path loop check makes a T1 reject an update from T0 that already carries the spine ASN, so the route is not installed.
If the spines are in different ASNs, a T1 will use T0 as transit to reach another T1, which should not be the forwarding topology.
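A minimal sketch of that loop check follows. The ASN values (65001 for T0, 65100 and 65101 for the spines) and the helper names `prepend` and `accepts` are assumptions made for this example, not values from the design above.

```python
from typing import List


def prepend(sender_asn: int, as_path: List[int]) -> List[int]:
    """An EBGP speaker prepends its own ASN when advertising a route."""
    return [sender_asn] + as_path


def accepts(local_asn: int, as_path: List[int]) -> bool:
    """AS-path loop check: reject any path that already contains our ASN."""
    return local_asn not in as_path


T0_ASN = 65001      # hypothetical leaf ASN
T1A_ASN = 65100     # hypothetical ASN of the first spine

default_at_t0 = prepend(T1A_ASN, [])              # T1-a originates the default
default_at_t1b = prepend(T0_ASN, default_at_t0)   # T0 relays it to T1-b

print(default_at_t1b)                  # [65001, 65100]
print(accepts(65100, default_at_t1b))  # False: T1-b shares the spine ASN
print(accepts(65101, default_at_t1b))  # True: T1-b would route via T0
```

The second result is the unwanted case: a spine with its own ASN installs a path to the other spine through a leaf.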
T0-------T1--------T2----------T3
All the T0s in a single cluster (20 racks, that is, 20 T0s) can share one ASN or use different ASNs.
If every T0 has a unique ASN, each T0 accepts the updates relayed by T1 and installs the specific prefixes of the other racks in its RIB/FIB.
If we use the same ASN on all T0s in the cluster, a T0 will not receive the prefixes of the other racks, because those updates already carry its own ASN.
This way we can work within the RIB/FIB limits of T0. If T0 keeps only the default route from T1, it saves the TCAM space that FIB routes would otherwise consume, and that space can be used for ACLs instead.
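The effect of the leaf ASN choice on table size can be illustrated with a rough count. The sketch below assumes one prefix per rack, a default route originated by the spines, and hypothetical ASN values; `installed_prefixes` is an illustrative helper, not a vendor API.

```python
from typing import Dict, List

SPINE_ASN = 65100  # hypothetical ASN shared by all T1 switches


def installed_prefixes(leaf_asns: Dict[str, int], me: str) -> List[str]:
    """Prefixes leaf `me` would install after the AS-path loop check."""
    table = ["0.0.0.0/0"]  # assumption: the spines originate a default route
    for leaf, asn in leaf_asns.items():
        if leaf == me:
            continue
        as_path = [SPINE_ASN, asn]  # path seen by `me` for that rack's prefix
        if leaf_asns[me] not in as_path:
            table.append(f"prefix-of-{leaf}")
    return table


racks = [f"rack{i}" for i in range(1, 21)]  # 20 racks, as in the cluster above
unique = {rack: 65000 + i for i, rack in enumerate(racks, start=1)}
shared = {rack: 65001 for rack in racks}

print(len(installed_prefixes(unique, "rack1")))  # 20 (default + 19 racks)
print(len(installed_prefixes(shared, "rack1")))  # 1 (default only)
```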
T2 is the data center spine and T3 is the regional spine; they are in different ASNs.
We use allowas-in on T1 because all the cluster spines are in the same ASN. An update from one T1 reaches another T1 via T2, and allowas-in lets the receiving T1 install those prefixes as additional paths via T2, so it can reach the other T1 when one of its own T1-to-T0 links fails.
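As a rough model, allowas-in relaxes the loop check by tolerating a configured number of occurrences of the local ASN in the AS path. The sketch below assumes that simplified semantics; the ASN values and the `accepts` helper are hypothetical, and real implementations differ in defaults and limits.

```python
from typing import List


def accepts(local_asn: int, as_path: List[int], allowas_in: int = 0) -> bool:
    """Accept the path if our ASN appears at most `allowas_in` times."""
    return as_path.count(local_asn) <= allowas_in


# Rack prefix originated by T0 (65001), relayed by T1-a (65100) and T2 (65200),
# as seen by T1-b, which shares ASN 65100 with T1-a.
path_at_t1b = [65200, 65100, 65001]

print(accepts(65100, path_at_t1b))                # False: default loop check
print(accepts(65100, path_at_t1b, allowas_in=1))  # True: backup path via T2
```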
We use either default prefixes or specific prefixes on the fabric cluster.
The new spline design collapses spine and leaf together, which is the way to go.
Running an IGP (OSPF, IS-IS) on the tiers would make the fabric too chatty, make state-table maintenance cumbersome, let a T1 use T0 to reach another T1, and make policy maintenance tough.
Use the BGP GRACEFUL_SHUTDOWN community for easy policy maintenance.
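To show how that community helps during maintenance, here is a small sketch of the receiving side, loosely following RFC 8326: routes tagged with GRACEFUL_SHUTDOWN (65535:0) get a LOCAL_PREF of 0, so any alternative path wins. The dictionary-based route format and the helper names are assumptions for this example only.

```python
from typing import Dict, List

GRACEFUL_SHUTDOWN = (65535, 0)  # well-known community defined in RFC 8326


def apply_gshut_policy(route: Dict) -> Dict:
    """Inbound policy: depreference routes carrying GRACEFUL_SHUTDOWN."""
    if GRACEFUL_SHUTDOWN in route["communities"]:
        return {**route, "local_pref": 0}
    return route


def best_path(routes: List[Dict]) -> Dict:
    """Simplified best-path selection: highest LOCAL_PREF wins."""
    return max(routes, key=lambda r: r["local_pref"])


# T0 sees the default route from two spines; T1-a is being drained.
routes = [
    {"next_hop": "T1-a", "local_pref": 100, "communities": [GRACEFUL_SHUTDOWN]},
    {"next_hop": "T1-b", "local_pref": 100, "communities": []},
]

print(best_path([apply_gshut_policy(r) for r in routes])["next_hop"])  # T1-b
```

Traffic moves away from the drained spine before its sessions are shut down, without touching per-prefix policy.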
For more info, join AzureFriday or use Azure IaaS products for your enterprise network.