This blog post was initially sent to subscribers of my SDN and Network Automation mailing list.
Here’s a question I got from one of the attendees of my network automation online course:
We had a situation where HSRP was configured on two devices and then a second change was made to use a different group ID. The HSRP MAC address got "corrupted" on one of the devices and, according to the vendor, the FIB was in an inconsistent state. I know this may be vendor-specific, but I was wondering if there is any toolkit available with validation procedures to check whether the FIB is consistent after implementing L3 changes.
The problem is so specific (after all, he’s fighting a specific bug) that I wouldn’t expect to find a generic tool out there that would solve it.
As always, I hope I’m wrong and someone will correct me (write a comment). Also, some major vendors started selling assurance engines – reassuringly-priced software that validates that their other reassuringly-priced solutions work correctly.
Ignoring that, what you could do in a situation like this is:
- Figure out how to identify the problem with show commands (assuming it can be done) and how to fix it when you find it (reload might be the only option);
- Write a script to use those show commands to check whether the forwarding state is still consistent with your expectations;
- Run that script periodically and do something when it detects an inconsistency;
… assuming, of course, that the problem is bad enough that it warrants the time and effort needed to write such a script.
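To make the second step a bit more tangible, here's a minimal sketch of such a check for the HSRP case. It relies on one thing I'm sure of: the well-known HSRP virtual MAC addresses for IPv4 (0000.0c07.acXX for version 1, 0000.0c9f.fXXX for version 2, where the X's are the group ID in hex). Everything else is an assumption: the function names are made up, the sample text is a hypothetical excerpt and not real device output, and the check assumes the captured output covers a single HSRP group (for example, one interface or VLAN). Collecting the output from the device is left out on purpose.

```python
from __future__ import annotations

import re

# Matches MAC addresses written in dotted notation, e.g. 0000.0c07.ac14
MAC_RE = re.compile(r"\b[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}\b", re.IGNORECASE)

# Prefixes of the well-known HSRP virtual MAC addresses (IPv4)
HSRP_PREFIXES = ("0000.0c07.ac", "0000.0c9f.f")


def expected_hsrp_mac(group: int, version: int = 1) -> str:
    """Return the well-known HSRP virtual MAC for an IPv4 group."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 group must be in the range 0..255")
        return f"0000.0c07.ac{group:02x}"
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 group must be in the range 0..4095")
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("HSRP version must be 1 or 2")


def find_hsrp_macs(show_output: str) -> set[str]:
    """Collect every HSRP-style virtual MAC mentioned in the text."""
    macs = {mac.lower() for mac in MAC_RE.findall(show_output)}
    return {mac for mac in macs if mac.startswith(HSRP_PREFIXES)}


def check_hsrp_consistency(show_output: str, group: int, version: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the output looks sane."""
    expected = expected_hsrp_mac(group, version)
    found = find_hsrp_macs(show_output)
    problems = []
    if expected not in found:
        problems.append(f"expected virtual MAC {expected} not found")
    # Any other HSRP MAC is treated as a leftover from the old group ID
    for mac in sorted(found - {expected}):
        problems.append(f"stale virtual MAC {mac} still present")
    return problems


# Hypothetical excerpt: group changed from 10 to 20, old MAC still lingering
SAMPLE_OUTPUT = """
Vlan10   0000.0c07.ac0a   static
Vlan10   0000.0c07.ac14   static
"""

if __name__ == "__main__":
    for problem in check_hsrp_consistency(SAMPLE_OUTPUT, group=20):
        print(problem)
```

The check returns a list of problems instead of a pass/fail flag, so whatever runs it periodically can decide what "do something" means.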
Note: when evaluating whether it makes sense to invest time into writing a validation script, keep in mind that it will be a major effort when you start, but once you have the infrastructure in place it will be pretty easy to add further validation checks. I created a sample validation framework (feel free to use and extend it) as a case study for the Easy Wins module in the Building Network Automation Solutions online course.
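To illustrate why further checks get cheap once the plumbing exists, here's a rough sketch of a check registry. This is not the sample framework mentioned above, just a hypothetical illustration: the `check` decorator, the `run_checks` function, and the text format the example check looks at are all assumptions.

```python
from typing import Callable, Dict, List

# A check takes captured show-command output and returns a list of problems
Check = Callable[[str], List[str]]

# Registry of all known checks, keyed by check name
CHECKS: Dict[str, Check] = {}


def check(name: str) -> Callable[[Check], Check]:
    """Decorator that registers a function as a named validation check."""
    def register(func: Check) -> Check:
        CHECKS[name] = func
        return func
    return register


@check("no-incomplete-arp")
def no_incomplete_arp(show_output: str) -> List[str]:
    """Flag every line that mentions an incomplete ARP entry."""
    return [
        f"incomplete ARP entry: {line.strip()}"
        for line in show_output.splitlines()
        if "incomplete" in line.lower()
    ]


def run_checks(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run every registered check against the output stored under its name."""
    return {name: func(outputs.get(name, "")) for name, func in CHECKS.items()}


if __name__ == "__main__":
    # Hypothetical captured output, keyed by the check that consumes it
    captured = {"no-incomplete-arp": "10.0.0.1  Incomplete  Vlan10\n"}
    for name, problems in run_checks(captured).items():
        print(name, problems)
```

With something like this in place, adding another validation check boils down to writing one more decorated function.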
Finally, scream and kick the vendor. Bugs are to be expected but having to write custom scripts to check whether the $vendor bloatware messed it up (again) instead of getting a quick bug fix is inexcusable.