Text Files or Relational Database?

This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.

One of the common questions I get once the networking engineers progress from Ansible 101 to large-scale deployments (example: generating configurations for 1000 devices) is “Can Ansible use a relational database? Text files don’t scale…”

TL&DR answer: Not directly, but there are tons of database Ansible plugins or custom Jinja2 filters out there.

A Quick Detour

Before going into Ansible and relational databases, let me point out that Microsoft managed to put the whole Windows codebase into a single Git repository (admittedly after writing a Git-specific layer on top of the file system), so I would consider scalability of text files a solved problem. They might not be the best tool for the job, but that’s a different story.

Ansible and Relational Databases

While there’s no relational database support built into Ansible, you can always extend it in multiple directions. You could use dynamic inventory scripts or Python modules to solve the particular problem of storing Ansible inventory or host variables in a relational database, and there are plenty of them lying around GitHub, for example dynamic Ansible inventory for MySQL.

However, before rushing out and writing a Python module or adapting the above-mentioned dynamic inventory script, you have to ask yourself:

  • What problem am I trying to solve?
  • What is the best tool for the job?
The following section is a very quick summary of the Data Models and Data Stores section from the Building Network Automation Solutions online course. Register here.

Text Files or Relational Databases?

Regardless of what technology you use for the data store, your solution has to provide this functionality:

  • Create/Read/Update/Delete (CRUD) functionality
  • Reasonably easy-to-use user interface that provides search, adds, updates and deletes.
  • Data validation and referential integrity (either built-in or through data validation)
  • Change logging and audits (optional)
  • Transactional consistency (optional)

Text files (using text editor + Git) provide all of the above apart from transactional consistency and data validation. Data validation including referential integrity checks is relatively easy to implement with either commit scripts or CI pipeline.

Relational databases provide CRUD functionality, data validation and referential integrity. If you want to have a user interface, change logging, reviews, or audit, you have to add a front-end application (often custom-written).

If you already have a service provisioning system with relational database back-end and good UI, go for it. If you’d have to write it from scratch, text files might be good enough for the first few days.

Is there a middle ground? Sure. You might want to check out IPAM/CMDB platforms like NSoT and NetBox… and speaking of NetBox, Jeremy Stretch talked about it during his guest speaker presentation in the Building Network Automation Solutions online course.

Latest blog posts in Single Source of Truth (SSoT) in Network Automation series

3 comments:

  1. What do you think about NoSQL databases? Their flexibility might come handy when storing return values of various playbooks.
    Replies
    1. I used a NoSQL database in a niche problem that had relatively loose data structure (so I didn't want to have a table with dozens of columns, most of them having NULL values) and regretted it ever since ;)

      As always, the real questions are "what do you want to get done", "how hard would it be to do with tools I already know" and finally "is it worth investing into yet-another tool". In my case, the answer was NO (and I was stupid enough to learn it the hard way).

      In this particular case, you should start with "why do I want to store return values of various playbooks" and then "what will I do with that data"... and it might turn out that what you need is a time-series database, or ElasticSearch, or maybe just a bunch of text files (in case you need the return values for logging/auditing/troubleshooting reasons).

      Good luck with your project!
      Ivan
  2. We have spoke about this on twitter as well, but I think one of the biggest things to consider in what GIT gives you in terms of integrations with CI/CD, peer review, and allowing data to live in "purgatory." Meaning, your SoT should represent your current state, if you update the state today, but do not plan to push until next change window, now they are out of sync. Your data will always be out of sync then.

    Replicating the same controls in a DB is not easy, and the UI for GIT integrations is not pleasant. As always, it depends :)
Add comment
Sidebar