Building the Network Automation Source of Truth

This is one of those “thinking out loud” blog posts I’m writing while preparing my presentation for the Building Network Automation Solutions online course. I’m probably missing a gazillion details - your feedback would be highly appreciated.

One of the toughest challenges you’ll face when building a network automation solution is figuring out where your source of truth is (or: which data you should trust). As someone much smarter than me once said, “you can either have a single source of truth or many sources of lies,” and knowing how your devices should be configured and what mistakes have to be fixed becomes crucial as soon as you move from gathering data and creating reports to provisioning new devices or services.

The first step on your journey should be building a reliable device inventory - if you have no idea what devices are in your network, you cannot even consider automating network deployment or operations. Don’t try to build device discovery with Ansible or a similar tool - there are tons of open-source and commercial network discovery tools out there, and every decent network management system has some auto-discovery functionality, so finding the devices shouldn’t be a big deal.

Now for the fun part: assuming you didn’t decide to do a one-off discovery to populate the device inventory, will you trust the data in the network management system, or will you migrate the data to some other database (IPAM/CMDB software like NetBox immediately springs to mind) and declare that database your source of truth… at least as far as device inventory goes?

In any case, your network automation tool (Ansible, Chef, Puppet, Salt, Nornir… isn’t it wonderful to have so many choices?) expects to get the device inventory in its own format, which means you have to export data from your chosen source of truth into the format your tool expects - unless, of course, you believe that a bunch of YAML or JSON files stored in semi-random places and using interestingly convoluted inheritance rules is the best possible database there is.
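To make the export step a bit more concrete, here’s a minimal sketch: take a device list shaped roughly like an IPAM/CMDB export (the field names and devices below are made-up assumptions, not a real NetBox schema) and emit it in the JSON format Ansible expects from a dynamic inventory script, grouping devices by role.

```python
# Sketch: convert a source-of-truth export into Ansible dynamic-inventory JSON.
# The input shape below is an illustrative assumption, not a real IPAM schema.
import json

sot_devices = [
    {"name": "core-sw-1", "primary_ip": "10.0.0.1", "role": "core"},
    {"name": "edge-rtr-1", "primary_ip": "10.0.1.1", "role": "edge"},
]

def build_inventory(devices):
    """Group devices by role; put per-host variables under _meta.hostvars."""
    inventory = {"_meta": {"hostvars": {}}}
    for dev in devices:
        # One Ansible group per device role
        inventory.setdefault(dev["role"], {"hosts": []})["hosts"].append(dev["name"])
        # Tell Ansible which address to connect to
        inventory["_meta"]["hostvars"][dev["name"]] = {
            "ansible_host": dev["primary_ip"],
        }
    return inventory

print(json.dumps(build_inventory(sot_devices), indent=2))
```

Run periodically (or as an Ansible dynamic inventory script), this keeps the tool-specific format a throwaway artifact while the real data stays in the source of truth.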

If you decide to use text files as your database and Notepad as your UI, go with YAML. Regardless of all the complaints you might hear on Twitter, it’s still easier to read than JSON.
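To make the readability argument concrete, here’s the same two-device inventory fragment in both formats (the layout follows Ansible’s YAML inventory structure; the host names and addresses are made up):

```yaml
all:
  hosts:
    core-sw-1:
      ansible_host: 10.0.0.1
    edge-rtr-1:
      ansible_host: 10.0.1.1
```

The JSON equivalent says the same thing with more punctuation and no comments:

```json
{"all": {"hosts": {"core-sw-1": {"ansible_host": "10.0.0.1"},
                   "edge-rtr-1": {"ansible_host": "10.0.1.1"}}}}
```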

Should you do the export/translation every time you run your automation tool, periodically, or whenever something changes? Welcome to one of the hard problems of computer science. I’ll try to give you a few hints in an upcoming blog post (and tackle another challenge: is your device configuration your source of truth, and why is that a bad idea?).

In case you want to know more:

You can get access to the webinars with Standard Subscription and access to the network automation online course with Expert Subscription.

  1. Does it make sense to use an "abstraction" layer for the stored data?
    For example, if my single source of truth is a CMDB system (which is not operated by the networking team). Most of the time (I guess) the export functionality of these tools is very static, meaning the format and content are dictated by the vendor. Maybe the vendor changes the export/database format with a software update and the backend scripts (Ansible etc.) no longer work correctly.

    The question is whether it's advisable to create your own database or data-structure format (e.g. JSON) and populate it through external tools and scripts (Cisco Prime, CMDB). Backend tools (like Ansible, other NMS systems etc.) can then use the abstraction-layer format to access the device list. If the single source of truth changes, only one script (source -> abstraction format) must be adjusted, and not all the backend scripts... (just thinking out loud) ...
    1. Sounds to me like it depends - on the scope of the tools and the scale of the tooling.

      If, for example, it's mostly homebrew tools that all run on a single box, I don't think it makes sense to create an entire layer. You could simplify and just write a library that does the conversion that you then call from whatever tools you want. I'd call that less of a layer and more a good application of the DRY principle of development.

      However, if it's a large enterprise with multiple tools on multiple servers, that sort of data conditioning step could be useful, especially if device inventory and device status (or other metrics) are in different tools. That sounds more similar to the data science construct of ETL, which aims to get all the data into a usable format.

  2. Thank you for your blog. This is a particularly interesting topic for those who outgrow the use of simplistic data files, whether they be YAML, CSV, XML, JSON, etc. At some point, one realizes that managing all of the data and relationships is beyond the means of "simple" files, and they need to move into the realm of databases. I would submit that some folks will never outgrow data files, most likely because they don't have a "big enough" network or a "complex enough" set of services. So to each their own, of course. But when someone *does* outgrow them, then what..., eh?

    You pointed out the need for at least an inventory source of truth, and I absolutely agree. But what about the "network application" source of truth? What I mean is the following: I believe that every network service, considered as a whole, is a distributed application. While people outside the network industry think of "the network" as infrastructure, the configurations in place actually create a service, and that service is more akin to an application than to infrastructure. For example, if one is building a data center to provide EVPN-VXLAN services, they need to build/buy/borrow a network automation tool to support that application. That same application will not be suitable for managing the WAN, or campus/branch, etc. So from an application point of view, I would submit there need to be network services "source of truth" databases. Would you agree? And if so, is this a concept you are teaching as of yet?
    1. Quick answers to the last two questions: YES and YES ;)
  3. Disclaimer: I work for Anuta Networks, a network automation vendor.

    I don't mean to promote the product, but in the case of our Anuta ATOM software we take the following approach; let me know if you see a problem with it:

    1. Just like all other tools, ATOM uses CDP and LLDP to discover the devices and build the topology.

    2. ATOM then reads the configs from all vendor devices and normalizes them into a unified data model (JSON or XML format) using abstraction. These JSON objects can be manipulated via API.

    3. ATOM then periodically reconciles with the underlying infrastructure. If a config is changed manually, ATOM can restore the original config to the devices.

    A few of our customers, like F5 Silverline, use ATOM as the single source of truth for network configurations and analytics data.

    This works across multiple vendors and multiple domains (campus, WAN, DC, MPLS core etc.).
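The conversion-library idea from the first comment thread (one shared piece of code between the source of truth and all backend tools, instead of per-tool parsers) can be sketched in a few lines. Everything here is illustrative: the field names, the data shapes, and the two hypothetical sources (a CMDB-style inventory export and a monitoring-tool status export) are all assumptions.

```python
# Sketch of a shared "conversion library": merge device inventory from one
# source with device status from another into a single normalized structure.
# All field names and data shapes below are illustrative assumptions.

def normalize(inventory_export, status_export):
    """Join two per-device data sources on the device name."""
    status_by_name = {d["device"]: d["state"] for d in status_export}
    return [
        {
            "name": dev["name"],
            "mgmt_ip": dev["primary_ip"],
            # Devices missing from the monitoring export get a safe default
            "state": status_by_name.get(dev["name"], "unknown"),
        }
        for dev in inventory_export
    ]

cmdb = [{"name": "core-sw-1", "primary_ip": "10.0.0.1"}]
monitoring = [{"device": "core-sw-1", "state": "up"}]
print(normalize(cmdb, monitoring))
```

Every backend tool imports `normalize()` instead of parsing vendor exports itself, so a change in the source of truth touches exactly one piece of code - less an abstraction layer, more the DRY principle mentioned in the comment above.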