Testing OSPF Device Configurations
A year ago, I described how we use the netlab validate command to test device configuration templates for most platforms supported by netlab. That blog post included a simple “this is how you test interface address configuration” example; now, let’s move to something a bit more complex: baseline OSPF configuration.
Testing the correctness of OSPF configurations seems easy:
- Build a lab with a test device and a few other OSPF devices
- Configure the devices
- Log into the test device and inspect OSPF operational data
There’s just a tiny little fly in this ointment…
It’s impractical to parse the show printouts from over a dozen different platforms, and the temperature in hell might drop considerably1 before every vendor implements the IETF (or OpenConfig) YANG data models2. The only way to deal with a multi-platform environment is to rely on the protocol behavior, figure out the behavior of the tested device from the adjacent probes, and hope that the probes work correctly (not always true).
For example, to test the activation of OSPF on individual interfaces:
- Connect a probe (P1) to an interface that should be activated
- Connect another probe (P2) to an interface that should not be activated.
- Wait for the OSPF adjacency between P1 and the device under test (DUT). If adjacency is not established within a reasonable timeframe3, assume the OSPF interface configuration on the DUT is faulty.
- Wait a bit longer4 (just in case, because we’re trying to prove a negative) and check the OSPF adjacency between P2 and DUT. An established adjacency points to a configuration error on the DUT.
While the above is the simplest possible test one can do, we don’t start with it. It takes ridiculously long (if you have to run hundreds of tests on dozens of platforms) for OSPF to establish an adjacency on broadcast networks (the typical default setting); it’s much faster to use point-to-point links5. Our first test is therefore testing the configuration of OSPF network types:
- Configure one link (P1-DUT) as an OSPF broadcast network and another link (P2-DUT) as an OSPF point-to-point link.
- Wait for both adjacencies to be established.
If a device passes the test, we know that (A) it can activate OSPF on interfaces (but we are not yet sure whether it does that correctly) and (B) that it sets the OSPF network type correctly (because a mismatch in OSPF network type prevents the adjacency from being established). After validating this baseline, we can use the OSPF point-to-point network type in all further OSPF tests, significantly reducing the overall validation time.
You can probably guess how the other tests work:
- The OSPF areas test configures probes in different areas and checks for OSPF adjacencies (an area mismatch prevents an adjacency from being established)
- The OSPF over unnumbered interfaces tests adjacency over interfaces with “borrowed” IP addresses
- The OSPF costs test configures various (unusual) costs on the DUT interfaces; the probes verify the end-to-end cost of the remote probe’s loopback, proving that the cost configured on the DUT interface matches the expected value.
- The OSPF timers test configures small values for the hello/dead intervals on all devices in the lab and checks for OSPF adjacencies (a timer mismatch also stops the adjacency from forming).
For even more details (or testing ideas), explore the OSPFv2 and OSPFv3 integration tests in the netlab GitHub repository.
-
Assuming hell is impacted by the usual laws of physics and the expansion of the universe. ↩︎
-
Assuming these data models contain the information we need, it is not always a given. ↩︎
-
We use 50 seconds as the worst-case scenario – 10 seconds to receive hello packets, 30 seconds to elect the DR/BDR, and an additional 10 seconds to complete the adjacency process. ↩︎
-
Getting the timing just right is a bit of a challenge. Sometimes the first adjacency is established surprisingly fast, and the second one takes way longer. You should wait an extra 2-3 hello intervals to be on the safe side. ↩︎
-
Most devices establish a point-to-point OSPF adjacency within a few seconds; sometimes, it’s established before we even start the validation phase. The broadcast adjacency pretty consistently takes around 40 seconds. You can check the validation logs for individual devices for more details; click the device name on the OSPFv2/OSPFv3 coverage report and then click the green checkmark in the validated column of the 01-network row. View the validation results of other OSPF tests for additional data points; most of them verify the P2P adjacency as the initial step in the validation process. ↩︎
Regarding "because a mismatch in OSPF network type prevents the adjacency from being established":
The OSPF network type is not checked when establishing an OSPF adjacency. Point-to-point and broadcast links form a full adjacency. This link is not used in shortest path calculation, i.e., the network does not work correctly.
Cisco describes this in Troubleshoot Open Shortest Path First Route Database Issues.
Thanks a million, will fix.