My good friend Tiziano complained that BGP considers a next hop reachable whenever there’s a matching entry in the IP routing table, even when the router cannot ping that next hop.
That behavior is one of the fundamental aspects of IP networks: networks built with IP routing protocols rely on fate sharing between control and data planes instead of path liveness checks.
Fate Sharing 101
In networks where the control and data planes share the same fate, routing protocols exchange hello packets and routing protocol messages over the same data plane (the same links and interfaces) that carries the traffic forwarded based on the information those protocols collect.
The underlying assumption is simple: if the routing protocol manages to exchange information between adjacent nodes, there won’t be any problems with the user traffic. Furthermore, if a routing update in a distance vector protocol made it from the egress router to the ingress router, the traffic sent in the reverse direction should experience no problems.
The same conclusion can be reached for link state protocols running over point-to-point links. The proof is left as an exercise for the reader.
Fate Sharing and BGP
Like any other IP routing protocol, BGP relies heavily on fate sharing to optimize its performance:
- EBGP sessions are supposed to be established between directly connected interfaces, resulting in perfect fate sharing.
- IBGP sessions are established between non-adjacent routers, but BGP relies on the underlying IGP for next-hop forwarding information, and assumes the underlying IGP has fate sharing properties (in other words: if the BGP next hop is reachable through an entry in the IP routing table, it’s safe to use it).
The behavior described by Tiziano is thus not a bug, but a FAD (Functions as Designed).
Before we start discussing whether the default route should or should not be a viable path toward a BGP next hop, read the BGP Support for Next-Hop Address Tracking document (yeah, I know, another nerd knob with a potentially unexpected default value).
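To make the next-hop check concrete, here is a minimal Python sketch of a longest-prefix-match lookup for a BGP next hop. It is a toy model, not the algorithm of any particular implementation: the function name `resolve_next_hop`, the list-of-prefixes routing table, and the `min_prefix_len` knob (a stand-in for whatever filter a platform might offer to exclude the default route) are all assumptions made for illustration.

```python
import ipaddress


def resolve_next_hop(routing_table, next_hop, min_prefix_len=0):
    """Return the longest matching prefix covering next_hop, or None.

    routing_table  -- iterable of prefix strings (hypothetical toy RIB)
    min_prefix_len -- hypothetical filter: ignore prefixes shorter than this
    """
    address = ipaddress.ip_address(next_hop)
    candidates = [
        network
        for network in (ipaddress.ip_network(prefix) for prefix in routing_table)
        if address in network and network.prefixlen >= min_prefix_len
    ]
    # Longest prefix wins; no candidate means the next hop is unreachable.
    return max(candidates, key=lambda network: network.prefixlen, default=None)


table = ["0.0.0.0/0", "10.0.0.0/24"]

# Covered by a specific IGP-style route.
covered = resolve_next_hop(table, "10.0.0.7")
# Covered only by the default route, yet still considered reachable.
via_default = resolve_next_hop(table, "192.0.2.1")
# Same lookup with the default route filtered out.
filtered = resolve_next_hop(table, "192.0.2.1", min_prefix_len=1)
```

In this model the next hop `192.0.2.1` counts as reachable purely because the default route matches it, which is exactly the fate-sharing assumption at work: the lookup never tests whether packets actually arrive.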
How They Broke Fate Sharing
Fundamental principles should never stand in the way of a performance hack or a cool MacGyver kludge. Fate sharing fared no better.
The first routing protocol that broke fate sharing properties was OSPF: the early versions of OSPF assumed that all routers on a subnet (modeled as a Type-2 LSA) can communicate, even though all we can safely assume is that they can exchange OSPF hellos with the DR. The consequences of this faulty assumption are well known to anyone who had to troubleshoot OSPF over NBMA networks. Point-to-multipoint interfaces were introduced in RFC 2178 to fix this problem a few years later.
BGP is no better than OSPF. The default BGP next hop processing algorithm breaks fate sharing: a BGP router assumes two other routers in the same subnet can communicate if it’s able to establish BGP sessions with both of them.
To use strict fate sharing in a BGP network, you have to configure next-hop-self on all BGP sessions, which might result in pretty suboptimal traffic flow.
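The difference between the default behavior and next-hop-self can be sketched in a few lines of Python. This is a deliberately simplified model of third-party next hop handling on a shared subnet, not the full BGP next hop processing rules; the function name `advertised_next_hop` and all its parameters are hypothetical.

```python
import ipaddress


def advertised_next_hop(route_next_hop, local_address, peer_address,
                        shared_subnet, next_hop_self=False):
    """Pick the next hop to advertise to a peer (simplified model).

    If the route's current next hop and the peer sit in the same subnet,
    the default behavior keeps the original (third-party) next hop and
    simply assumes those two routers can talk to each other.
    """
    subnet = ipaddress.ip_network(shared_subnet)
    same_subnet = (
        ipaddress.ip_address(route_next_hop) in subnet
        and ipaddress.ip_address(peer_address) in subnet
    )
    if same_subnet and not next_hop_self:
        return route_next_hop  # third-party next hop: fate sharing is broken
    return local_address       # traffic follows the path the BGP session uses


# The local router (.1) learned a route from .3 and advertises it to .2.
default_choice = advertised_next_hop(
    "192.0.2.3", "192.0.2.1", "192.0.2.2", "192.0.2.0/24")
strict_choice = advertised_next_hop(
    "192.0.2.3", "192.0.2.1", "192.0.2.2", "192.0.2.0/24", next_hop_self=True)
```

With the default choice, router .2 sends traffic straight to .3 although it never exchanged a single packet with it. With next-hop-self, traffic takes the extra hop through .1, which is the suboptimal flow mentioned above.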
Speaking of third-party next hops: the next hop address in an OSPF type-5 LSA is another potential can of worms.
Multi-protocol routing over IS-IS is another example – IS-IS routers assume they can use a path across the topology graph for IPv4 and IPv6 even when only one of the protocols is configured on the actual link. That behavior was fixed with multi-topology IS-IS (RFC 5120) which introduced separate topology graphs for individual layer-3 protocols (IPv4 and IPv6).
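A small Python sketch illustrates the single-topology problem. It models the network as a list of links tagged with the protocols configured on them and checks reachability with a breadth-first search; the three-router topology, the `LINKS` list, and the `reachable` function are invented for this example and say nothing about how a real IS-IS implementation runs SPF.

```python
from collections import deque

# Hypothetical topology: the B-C link has only IPv4 configured.
LINKS = [
    ("A", "B", {"ipv4", "ipv6"}),
    ("B", "C", {"ipv4"}),
]


def reachable(links, source, target, protocol=None):
    """Breadth-first search over the topology graph.

    protocol=None models single-topology IS-IS: every link is used,
    regardless of which layer-3 protocols are configured on it.
    A protocol name models a per-protocol (multi-topology) graph.
    """
    neighbors = {}
    for left, right, protocols in links:
        if protocol is None or protocol in protocols:
            neighbors.setdefault(left, set()).add(right)
            neighbors.setdefault(right, set()).add(left)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for peer in neighbors.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return False


single_topology = reachable(LINKS, "A", "C")          # shared graph
ipv6_topology = reachable(LINKS, "A", "C", "ipv6")    # IPv6-only graph
ipv4_topology = reachable(LINKS, "A", "C", "ipv4")    # IPv4-only graph
```

The shared graph claims A can reach C, so IPv6 traffic would be sent toward a link that cannot carry it. The per-protocol graphs give the right answer for each protocol.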
Last but definitely not least, any network engineer who thinks the end-to-end principle has no place in his transport network can easily break the fate sharing properties of IP routing protocols by adding stray static routes, packet filters, policy-based routing or NAT.
I described the problems of single-topology IS-IS and multi-topology IS-IS configuration in the Building Large IPv6 Service Provider Networks webinar.