The whole High Availability Switching series started with a question along the lines of “does it make sense to run BFD together with Graceful Restart?” After Non-Stop Forwarding 101, Graceful Restart 101, and Graceful Restart and Convergence Speed we finally have enough information to answer that question.
TL&DR: Most probably not.
A more nuanced answer depends (as always) on a gazillion implementation details.
- BFD implemented in forwarding hardware: This is the best option. BFD detects data plane failures, and routing protocol(s) detect control plane failures. A BFD failure should trigger regular routing protocol convergence, and routing protocol timeouts should trigger Graceful Restart procedures.
- BFD sharing fate with the control plane: A control plane failure (which would trigger Graceful Restart) would also result in a BFD session failure. The BFD failure could be used to enter the Graceful Restart procedure (and start the Restart Timer) before the routing protocol detects a neighbor failure. However, the BFD failure should not be used to flush the forwarding tables or start routing protocol convergence.
You’ll find more details in Generic Application of BFD (RFC 5882).
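The two cases above boil down to a small decision table. Here's a minimal Python sketch of how a helper node might act on them. The `Event` and `Action` names and the `helper_action` function are made up for illustration and don't correspond to any vendor implementation.

```python
from enum import Enum, auto


class Event(Enum):
    """Failure indications a helper node can receive (hypothetical names)."""
    BFD_DOWN = auto()            # BFD session to the neighbor went down
    PROTOCOL_TIMEOUT = auto()    # routing protocol hold/dead timer expired


class Action(Enum):
    """What the helper node does next (hypothetical names)."""
    CONVERGE = auto()                # flush forwarding entries, run regular convergence
    ENTER_GRACEFUL_RESTART = auto()  # keep forwarding entries, start the Restart Timer


def helper_action(event: Event, bfd_in_hardware: bool) -> Action:
    """Pick the helper node's reaction based on where the neighbor runs BFD.

    bfd_in_hardware=True  -> BFD is independent of the neighbor's control plane
    bfd_in_hardware=False -> BFD shares fate with the neighbor's control plane
    """
    if event is Event.PROTOCOL_TIMEOUT:
        # A routing protocol timeout indicates a control plane failure.
        return Action.ENTER_GRACEFUL_RESTART
    if bfd_in_hardware:
        # BFD failure means the data plane is gone: converge immediately.
        return Action.CONVERGE
    # BFD failure might be just a control plane restart: do not flush
    # forwarding tables, only enter Graceful Restart earlier.
    return Action.ENTER_GRACEFUL_RESTART


if __name__ == "__main__":
    for in_hw in (True, False):
        for event in Event:
            print(f"in_hw={in_hw} {event.name} -> {helper_action(event, in_hw).name}")
```

The only combination that results in immediate convergence is a BFD failure on a session that doesn't share fate with the control plane.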
Moving from Theory to Practice
If you insist on using BFD with Graceful Restart, get reliable answers to these questions (or do the tests yourself):
- Can the helper nodes decouple BFD and routing protocol failure detection and start an unconditional convergence or Graceful Restart as needed?
- Is the behavior following a BFD failure configurable?
- Does the helper node use the Control Plane Independent bit in BFD control messages to change its behavior?
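If you want to check the last question in a packet capture: RFC 5880 defines the Control Plane Independent (C) bit as the fifth bit of the second octet of a BFD control packet, right after the two-bit state field and the Poll and Final bits. A minimal Python sketch that extracts it from a raw BFD control packet (the UDP payload) might look like this. The function name and the sample packet values are my assumptions.

```python
import struct

BFD_MANDATORY_LEN = 24   # mandatory section of a BFD control packet (RFC 5880)
BFD_VERSION = 1
C_BIT_MASK = 0x08        # second octet layout: Sta(2) P F C A D M


def control_plane_independent(packet: bytes) -> bool:
    """Return True if the C bit is set in a raw BFD control packet."""
    if len(packet) < BFD_MANDATORY_LEN:
        raise ValueError("truncated BFD control packet")
    version = packet[0] >> 5          # top three bits of the first octet
    if version != BFD_VERSION:
        raise ValueError(f"unexpected BFD version {version}")
    return bool(packet[1] & C_BIT_MASK)


def sample_packet(c_bit: bool) -> bytes:
    """Build a made-up BFD control packet in Up state for testing."""
    first = BFD_VERSION << 5                        # version 1, no diagnostic
    second = (3 << 6) | (C_BIT_MASK if c_bit else 0)  # state Up (3), optional C bit
    return struct.pack(
        "!BBBBIIIII",
        first, second,
        3,                  # detect multiplier
        BFD_MANDATORY_LEN,  # length
        1, 2,               # my/your discriminator (arbitrary)
        1_000_000,          # desired min TX interval (microseconds)
        1_000_000,          # required min RX interval (microseconds)
        0,                  # required min echo RX interval
    )


if __name__ == "__main__":
    print(control_plane_independent(sample_packet(True)))
    print(control_plane_independent(sample_packet(False)))
```

Seeing the bit set by the restarting node tells you only what that node claims. Whether the helper node acts on it is still something you have to test.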
I tried to find out the implementation details of Graceful Restart and BFD interactions. The closest I got was:
- A Junos document that totally confused me
- Cisco IOS XE BGP Configuration Guide saying “Configuring both Bidirectional Forwarding Detection (BFD) and BGP graceful restart for NSF on a device running BGP may result in suboptimal routing.” which supports my TL&DR conclusions ;)
- An Arista EOS document (behind a regwall) effectively saying “Our Stateful Switchover is fast enough that a BFD session doesn’t go down. You can therefore use BFD with BGP Graceful Restart.”
Hands-on experience would be highly appreciated – please write a comment!