1 comments:

  1. Data centre bgp has nothing specific attribute or packet type change in bgp as a protocol.

    DC-BGP is very specifc to bgp path hunting and PIB [Policy information base] for RIB/FIB lookups.

    1) iBgp or eBGP

    We use four tiers - T0, T1, T2,T3

    T0 - Leaf switches Dont peer with another leaf switch, the only peering is with T1 [ cluster spine ] .Nowadays these to layers are collapsed together to form spline network topology which reduces he usable tiers and Lao the layer latency.

    ibgp or eBGP between T0 and T1

    iBGP between T0 and T1 brings same ASN numbers across leaf and spine which is not preferred because

    T0-------------T1
    |
    |---------------T1

    T0 receives iBgp update of 0.0.0.0/0 a default route from all T1 with ibgp spilt horizon or the full mesh rule that it doesn't update to Upstream T1.
    iBgp peering has to be with directly connected IP Address and not loopback address as we need to run IGP for that and we don't require any IGP for that.

    Separating ASN numbers between each tiers allows us to reduce fault tolerant hotspots rather than inducing same ASN and that is the design principle behind preferring EBGP over IBGP

    Using eBGP for the same topology

    T0---------T1
    |
    |----------- T1

    T1 has eBGP sessions with T0 and each T0 will advertise default sent from T1 to another T1 and hence the reason all the spines maintain same ASN because of the inherent same ASN LOOP that prevents they update from T0 to another T1 not getting installed.

    If spine are part of different ASN then T1 will use T0 as transit to reach another T1 which should not be the forwarding topology.

    T0-------T1--------T2----------T3

    All the T0 part of single cluster [ 20 racks aka 20 T0's ] can be part of same ASN or different ASN.

    If all the T0 have unique ASN numbering, we have different eBGP peering between T1 and T0 install specific different rack prefix route in its RIB/FIB table.

    If we use same ASN numbering across all the T0 across the cluster will not receive the intra rack prefix updates.

    By this way we can influence RIB/FIB limitations in T0 . If we maintain just the default route from T1 then it will save TCAM space for FIB and use those ACLs in place for FIB routes.

    T2 is Data centre spine and T3 is Regional Spine and both are part of different ASNs.

    We use allowas-in on T1 because all the cluster spines are in the same ASN and update from one T1 would reach another T1 via T2 and to install multipathing prefixes via T2 to reach another T1 in case of link failure from T1 to T0.

    We use either default prefixes or specific prefixes on the fabric cluster.


    The new Spline design collapses Spine and leaf together which is the way to go.

    Running IGP ( ospf,Isis) on the tiers will make fabric too chatty, cumbersome state table maintanance , use T0 to reach another T1 from T1 and tough policy maintenance.

    Use BGP GRacefulshut community for easy policy maintenance .

    For more info join AzureFriday or use Azure products Iaas for your Enterprise network.

Add comment
Sidebar