
Use Network Automation to Detect Software Bugs

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list.

Here’s a question I got from one of the attendees of my network automation online course:

We had a situation where HSRP was configured on two devices and then a second change was made to use a different group ID. The HSRP MAC address got "corrupted" on one of the devices and, according to the vendor, the FIB was in an inconsistent state. I know this may be vendor-specific, but I was wondering if there is any toolkit available with validation procedures to check whether the FIB is consistent after implementing L3 changes.

The problem is so specific (after all, he’s fighting a specific bug) that I wouldn’t expect to find a generic tool out there that would solve it.

As always, I hope I’m wrong and someone will correct me (write a comment). Also, some major vendors started selling assurance engines – reassuringly-priced software that validates that their other reassuringly-priced solutions work correctly.

Ignoring that, what you could do in a situation like this is:

  • Figure out how to identify the problem with show commands (assuming it can be done) and how to fix it when you find it (reload might be the only option);
  • Write a script to use those show commands to check whether the forwarding state is still consistent with your expectations;
  • Run that script periodically and do something when it detects the inconsistency;

… assuming, of course, that the problem is bad enough that it warrants the time and effort needed to write such a script.
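To make the second step a bit more concrete, here is a minimal sketch of what such a check could look like for the HSRP case from the question. It relies on one well-known fact (the HSRP virtual MAC address is derived from the group ID: `0000.0c07.acXX` for HSRPv1, `0000.0c9f.fXXX` for HSRPv2) and on several assumptions: the function names are made up, the sample text is a hypothetical stand-in for whatever show command exposes the problem on your platform, and collecting that output from the device is left out entirely.

```python
import re

# Cisco-style dotted MAC address, e.g. 0000.0c07.ac0a
MAC_RE = re.compile(r"\b[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}\b", re.IGNORECASE)
HSRP_PREFIXES = ("0000.0c07.ac", "0000.0c9f.f")


def expected_hsrp_mac(group: int, version: int = 1) -> str:
    """Return the well-known HSRP virtual MAC for a group ID."""
    if version == 1 and 0 <= group <= 255:
        return f"0000.0c07.ac{group:02x}"
    if version == 2 and 0 <= group <= 4095:
        return f"0000.0c9f.f{group:03x}"
    raise ValueError(f"invalid HSRP version/group: {version}/{group}")


def check_hsrp_mac(show_output: str, group: int, version: int = 1) -> list:
    """Compare HSRP MACs found in show-command text with the expected one.

    Returns a list of problem descriptions; an empty list means the
    output matches expectations.
    """
    expected = expected_hsrp_mac(group, version)
    found = {mac.lower() for mac in MAC_RE.findall(show_output)}
    hsrp_macs = {mac for mac in found if mac.startswith(HSRP_PREFIXES)}

    problems = []
    if expected not in hsrp_macs:
        problems.append(f"expected HSRP MAC {expected} not found")
    for mac in sorted(hsrp_macs - {expected}):
        problems.append(f"unexpected HSRP MAC {mac} present")
    return problems


if __name__ == "__main__":
    # Hypothetical show-command text, for illustration only
    sample = "Vlan10 - Group 20\n  Virtual MAC address is 0000.0c07.ac0a\n"
    for problem in check_hsrp_mac(sample, group=20):
        print(problem)
```

Run periodically (cron job, CI pipeline, whatever you already have), a non-empty result is the trigger for the "do something" part, be it an alert or an automated fix.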

Note: when evaluating whether it makes sense to invest time into writing a validation script, keep in mind that it will be a major effort when you start, but once you have the infrastructure in place it will be pretty easy to add further validation checks. I created a sample validation framework (feel free to use and extend it) as a case study for the Easy Wins module in the Building Network Automation Solutions online course.
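The "easy to add further checks" part could look something like the following sketch. This is not the sample validation framework mentioned above, just an illustration of the idea under stated assumptions: every check is a function that takes show-command text and returns a list of problems, the registry and all names are hypothetical, and the ARP check is merely an example of a second validation.

```python
from typing import Callable, Dict, List

# A check takes show-command text and returns a list of problems
Check = Callable[[str], List[str]]
CHECKS: Dict[str, Check] = {}


def register(name: str) -> Callable[[Check], Check]:
    """Decorator that adds a check function to the registry."""
    def wrap(func: Check) -> Check:
        CHECKS[name] = func
        return func
    return wrap


@register("no_incomplete_arp")
def no_incomplete_arp(output: str) -> List[str]:
    """Hypothetical check: flag unresolved entries in ARP table text."""
    return [
        f"incomplete ARP entry: {line.strip()}"
        for line in output.splitlines()
        if "incomplete" in line.lower()
    ]


def run_checks(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run every registered check against its collected output.

    `outputs` maps check names to show-command text; only checks
    that found problems appear in the result.
    """
    results = {}
    for name, check in CHECKS.items():
        problems = check(outputs.get(name, ""))
        if problems:
            results[name] = problems
    return results
```

With the plumbing in place, every new validation is one more decorated function; collecting the outputs and reporting the results does not have to change.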

Facebook used a similar approach when dealing with memory leaks in high-end routers – I talked about that in more detail in the automated remediation part of the Network Automation 101 webinar.

Finally, scream and kick the vendor. Bugs are to be expected but having to write custom scripts to check whether the $vendor bloatware messed it up (again) instead of getting a quick bug fix is inexcusable.


1 comment:

  1. I'm running a PoC of IP Fabric https://ipfabric.io/ on a campus network with >200 switches. Their multivendor validation engine does exactly what you describe: it runs show commands and verifies the status. We noticed a few switch misconfigurations in a matter of minutes.

    I had positive feedback from some initial tests, and their dev team added features and fixed some minor issues within a few days. Try that with $BIG_VENDOR products; they always need a business case from a $BIG_CUSTOMER to engage the dev team.

    For a very specific need, like validating and maintaining a campus network, huge NMS/IBN tools may not be the best solutions. It may be worth investing some time looking for less mainstream products that solve real problems without major investments.

    Or write your own script if you have the skills and the time to maintain it ;-)

    Software is the only thing you buy knowing it will be flawed and that you can't expect the vendor to fix it. If something is broken, the buyer has the right to demand that it be repaired; that holds for everything but software.
