Following my IBGP or EBGP in an enterprise network post, a few people have asked for a more graphical explanation of IBGP/EBGP differences. Apart from the obvious ones (AS path does not change inside an AS) and more arcane ones (local preference is only propagated on IBGP sessions, MED of an EBGP route is not propagated to other EBGP neighbors), the most important difference between IBGP and EBGP is BGP next hop processing.
It’s best to explain BGP next hop processing through a set of examples; mine will be based on the following small network:
And since this post is getting way too long, here’s a rough table of contents:
- Originating BGP routes
- Route reflectors
- EBGP rules and forwarding optimization
- IBGP rules and design guidelines
When a router originates a BGP route configured with the network router configuration command or through route redistribution (the redistribute router configuration command), it sets the BGP next hop to the IGP next hop (the same value you’d find in the IP routing table). BGP next hop is set to 0.0.0.0 for routes with unknown next hops: connected interfaces, static routes to null 0, or summary routes configured with the aggregate-address router configuration command.
You can set the BGP next hop of a locally originated BGP route to any value you like with a route-map applied to network, redistribute or aggregate-address router configuration command. But remember: just because you could doesn’t mean that you should.
When a BGP route with a missing next hop (0.0.0.0) is sent to a BGP neighbor, the BGP next hop is set to the source IP address of that BGP session.
Example: PE-A originates BGP prefix 10.0.1.0/24 based on a static route to null 0. When it sends this BGP prefix to X1 and X2, BGP next hop is set to 192.168.0.3. BGP next hop in update sent to RR is 10.0.0.1.
If you use common BGP design recipes (IBGP sessions configured between loopback addresses and EBGP sessions configured across directly-connected subnets), and the BGP next hop is unknown, the BGP router advertises its loopback address as the BGP next hop on IBGP sessions, making the BGP table resilient to topology changes inside your network.
For routes with known next hops, the router applies standard IBGP/EBGP next hop processing rules (see below) when sending the BGP updates to its neighbors.
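The handling of unknown next hops described above can be sketched in a few lines of Python. This is a minimal model, not router code: the function name `resolve_unknown_next_hop` is made up for illustration, and the addresses are the ones used in the PE-A example.

```python
from ipaddress import IPv4Address


def resolve_unknown_next_hop(table_next_hop: str, session_source: str) -> IPv4Address:
    """Replace the 0.0.0.0 placeholder of a locally originated route.

    table_next_hop  -- next hop stored in the local BGP table
    session_source  -- source IP address of the BGP session the update is sent on
    """
    next_hop = IPv4Address(table_next_hop)
    if next_hop.is_unspecified:
        # Unknown next hop: advertise the session's source address instead.
        return IPv4Address(session_source)
    # Known next hop: left alone here; the regular IBGP/EBGP rules decide.
    return next_hop


# PE-A originates 10.0.1.0/24 from a static route to null 0.
print(resolve_unknown_next_hop("0.0.0.0", "192.168.0.3"))  # update sent to X1 or X2
print(resolve_unknown_next_hop("0.0.0.0", "10.0.0.1"))     # update sent to RR
```

The same route therefore carries a different next hop on every session, which is why loopback-sourced IBGP sessions end up advertising the loopback address.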
Large autonomous systems use BGP route reflectors. BGP route reflectors should not change the attributes of the routes they reflect (apart from adding the reflection-specific ORIGINATOR_ID and CLUSTER_LIST attributes). The BGP next hop advertised by an edge router is thus propagated unchanged across the whole AS.
Updated 2015-11-30: recent IOS releases allow setting next-hop to self on reflected routes with the neighbor next-hop-self all configuration command. See Changes in IBGP Next Hop Processing blog post for more details.
Exception: you can change BGP next hop on a route reflector with an inbound route-map. Don’t do this outside of a CCIE lab.
Example: Prefix 10.0.1.0/24 originated by PE-A is propagated by RR to PE-B. BGP next hop is still 10.0.0.1.
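As a rough illustration of reflection leaving the next hop alone, here is a small Python sketch. The `Route` class and `reflect` function are hypothetical names, and the cluster ID value (RR's address, 10.0.0.2) is an assumption; the only thing the sketch is meant to show is that the next hop survives reflection untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    cluster_list: tuple = ()


def reflect(route: Route, cluster_id: str) -> Route:
    """Reflect a route: prepend the cluster ID, keep every other field as received."""
    return replace(route, cluster_list=(cluster_id,) + route.cluster_list)


# 10.0.1.0/24 as received by RR from PE-A, then reflected to PE-B.
from_pe_a = Route(prefix="10.0.1.0/24", next_hop="10.0.0.1")
to_pe_b = reflect(from_pe_a, cluster_id="10.0.0.2")  # cluster ID is assumed
print(to_pe_b.next_hop)
```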
The internal details of an AS should not influence packet forwarding between autonomous systems (and we cannot assume that a router external to our AS would know our internal details). The BGP next hop is thus changed to the router’s own IP address (the source address of the EBGP session) in outgoing EBGP updates.
Example: When PE-B sends BGP prefix 10.0.1.0/24 (with next-hop 10.0.0.1) to X3, it sets BGP next hop to 192.168.2.1.
You can always set the BGP next hop to any value you like with an outbound route-map. It's risky (because it’s hard to check whether the next hop you advertise is actually reachable), but it ensures pretty decent job security.
EBGP next hop is not changed if the BGP next hop in the BGP table belongs to the same IP subnet as the EBGP neighbor to which the update is sent. This rule ensures optimum packet forwarding in partially-meshed EBGP deployments (example: internet exchange points).
Example: X1 sends BGP prefix 172.16.0.0/16 to PE-A. Next hop is set to the source address of the EBGP session between X1 and PE-A (192.168.0.1). When PE-A propagates the BGP prefix to X2, it does not change the next hop (X1, PE-A and X2 are in the same subnet).
You can disable the EBGP next hop optimization with the neighbor next-hop-self router configuration command. This command is particularly useful in partially meshed multi-access networks (Frame Relay, ATM, Phase 1 DMVPN, private VLANs); see the Using BGP in Phase 1 DMVPN Networks post for more details.
Example: Assuming neighbor 192.168.0.2 next-hop-self is configured on PE-A, the BGP next hop of all BGP routes sent to X2 from PE-A will be 192.168.0.3 and the traffic between X1 and X2 will flow through PE-A.
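The three EBGP cases (default rewrite, same-subnet optimization, next-hop-self) fit into one small decision function. The Python below is only a sketch of the rules as described in this post: `ebgp_next_hop` is a hypothetical name, the /24 prefix length on the PE-B to X3 link and X3's address 192.168.2.2 are assumptions, and route-maps are ignored.

```python
from ipaddress import ip_address, ip_interface


def ebgp_next_hop(table_next_hop, local_interface, neighbor, next_hop_self=False):
    """Next hop placed in an outgoing EBGP update.

    table_next_hop  -- next hop in the BGP table ("0.0.0.0" if locally originated)
    local_interface -- local address/prefix length on the EBGP subnet, e.g. "192.168.0.3/24"
    neighbor        -- address of the EBGP neighbor receiving the update
    next_hop_self   -- True if neighbor next-hop-self is configured for this neighbor
    """
    local = ip_interface(local_interface)
    next_hop = ip_address(table_next_hop)
    shared_subnet = local.network

    # Third-party next hop: keep it only if the neighbor can reach it directly.
    keep_original = (
        not next_hop_self
        and not next_hop.is_unspecified
        and next_hop in shared_subnet
        and ip_address(neighbor) in shared_subnet
    )
    return next_hop if keep_original else local.ip


# PE-B -> X3: next hop 10.0.0.1 is internal, so it is rewritten.
print(ebgp_next_hop("10.0.0.1", "192.168.2.1/24", "192.168.2.2"))
# PE-A -> X2: X1's address is on the shared subnet, so it is kept.
print(ebgp_next_hop("192.168.0.1", "192.168.0.3/24", "192.168.0.2"))
# PE-A -> X2 with next-hop-self: PE-A inserts itself into the forwarding path.
print(ebgp_next_hop("192.168.0.1", "192.168.0.3/24", "192.168.0.2", next_hop_self=True))
```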
All routers within an autonomous system are assumed to be able to reach the same set of subnets (advertised through IGP). Consequently, when an AS edge router propagates external BGP prefixes to internal BGP peers, it does not change the BGP next hop.
The only exception to this rule is a router doing load balancing across multiple EBGP paths (as configured with the maximum-paths configuration command). In that case, the BGP next hop is set to the router's IP address on IBGP updates so the IBGP peers send the traffic to the originating router, which can then do EBGP load balancing (added on 2019-09-19 based on input from Denis).
Example: X1 sends BGP prefix 172.16.0.0/16 with next hop 192.168.0.1 to PE-A. When PE-A propagates that prefix to RR, the BGP next hop is still 192.168.0.1. When the same prefix is reflected to PE-B, the next hop is still unchanged. PE-B therefore needs an IGP path toward 192.168.0.0/24 or it cannot forward the traffic toward 172.16.0.0/16.
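To make the dependency on the IGP concrete, here is a hedged Python sketch of the next hop lookup PE-B has to perform. The function name `resolve_next_hop` and the loopback /32 prefixes in PE-B's IGP table are assumptions for illustration; a real router does this as a recursive lookup in its routing table.

```python
from ipaddress import ip_address, ip_network


def resolve_next_hop(next_hop, igp_prefixes):
    """Return the longest IGP prefix covering next_hop, or None if it is unreachable."""
    address = ip_address(next_hop)
    matches = [ip_network(p) for p in igp_prefixes if address in ip_network(p)]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# PE-B knows only the loopbacks (assumed /32 prefixes): the BGP route is unusable.
loopbacks_only = ["10.0.0.1/32", "10.0.0.2/32"]
print(resolve_next_hop("192.168.0.1", loopbacks_only))

# Once the external subnet is in the IGP, the next hop resolves.
with_external = loopbacks_only + ["192.168.0.0/24"]
print(resolve_next_hop("192.168.0.1", with_external))
```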
You could make BGP next hops reachable via BGP paths. While it might work, don’t do this at home (or in your production network).
As with EBGP sessions, you can force the AS edge router to become the BGP next hop by using the neighbor next-hop-self router configuration command on all IBGP sessions (I would usually use an IBGP peer session and peer policy template to simplify my configuration).
Example: X1 sends BGP prefix 172.16.0.0/16 with next hop 192.168.0.1 to PE-A. Assuming neighbor 10.0.0.2 next-hop-self has been configured on PE-A, the BGP next hop of the BGP route sent to RR will be 10.0.0.1.
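The IBGP rules can be summarized the same way. The Python below is a sketch of the behavior described in this post only: `ibgp_next_hop` and its parameter names are hypothetical, and route-maps and route reflection are left out.

```python
from ipaddress import ip_address


def ibgp_next_hop(table_next_hop, session_source, next_hop_self=False, ebgp_multipath=False):
    """Next hop placed in an outgoing IBGP update by an AS edge router.

    table_next_hop -- next hop in the BGP table ("0.0.0.0" if locally originated)
    session_source -- source address of the IBGP session (usually a loopback)
    next_hop_self  -- True if neighbor next-hop-self is configured for this peer
    ebgp_multipath -- True if the route is load-balanced across several EBGP paths
    """
    next_hop = ip_address(table_next_hop)
    if next_hop.is_unspecified or next_hop_self or ebgp_multipath:
        return ip_address(session_source)
    # Default: the external next hop is propagated unchanged.
    return next_hop


# PE-A -> RR for 172.16.0.0/16 learned from X1.
print(ibgp_next_hop("192.168.0.1", "10.0.0.1"))                      # default behavior
print(ibgp_next_hop("192.168.0.1", "10.0.0.1", next_hop_self=True))  # next-hop-self
```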
You can design IBGP in your autonomous system in two fundamentally different ways:
- IBGP routes point to external BGP next hops (default behavior)
- IBGP routes point to loopback interfaces of AS edge routers (next-hop-self is configured on IBGP sessions on AS edge routers).
If you don’t change the BGP next hop on AS edge routers, you have to propagate external subnets with your IGP. You can either configure external subnets as passive interfaces or redistribute them into your IGP. The two methods are almost identical if you use IS-IS; OSPF is a slightly different story. A flap of a passive OSPF interface causes a full SPF run, whereas the addition or removal of an external route (type-5 or type-7 LSA) results in a partial SPF run. Redistribution of external subnets is thus preferred if you use OSPF.
However, it’s never a good idea to allow external events (like link flaps in your access network) to influence the stability of your core IGP. Using next-hop-self on AS edge routers (and changing the external next hops into edge router’s loopback address) is thus almost always the preferred design.