… updated on Wednesday, March 17, 2021 06:30 UTC
Using YAML Instead of Excel in Network Automation Solutions
One of the attendees of our network automation course asked a question along these lines:
In a previous Ansible-based project I used Excel sheet to contain all relevant customer data. I converted this spreadsheet using python (xls_to_fact) and pushed the configurations to network devices accordingly. I know some people use YAML to define the variables in Git. What would be the advantages of doing that over Excel/xsl_to_fact?
Whenever you’re choosing a data store for your network automation solution you have to consider a number of aspects including:
- What will the user interface be?
- Do I need transactional consistency?
- Do I need to track changes to the data?
- How will I validate changes?
- Do I need to know who made the changes and when?
Considering just these five aspects (there are probably many more), what are the options?
- Excel provides a nice user interface, but nothing else.
- YAML files combined with Git provide change tracking and logging, but UI sucks (cue vi versus emacs saga). Data validation usually happens in Git/CI/CD pipelines, starting from Git pre-commit hooks. Data is eventually-consistent; merge conflicts have to be resolved manually.
- Relational databases provide transactional consistency but little else. Most notably, you have to create your own UI. They do provide baseline data checking (is a value in a field expecting a number really a number) and referential integrity, but you might still need further checks implemented in the UI component (example: is IP address valid) or deployment pipeline (example: is interface name valid).
- IPAM/CMDB tools provide nice UI, data validation, and transactional consistency (because they’re running on top of a relational database), but usually no change tracking or logging.
Assuming you want to keep things simple, I’d prefer YAML over Excel:
- Excel has simple row/column data structure. In YAML you could have richer (structured) data types.
- Excel tries to be too helpful, and will quietly change an IP address into a number assuming you made a mistake and used too many dots (source: Ioannis Theodoridis, see below, another example in LinkedIn comments)
- You could track changes to YAML files with Git. Doing that in Excel is… interesting.
- Don’t get me started on the beauties of having a shared Excel spreadsheet in a Dropbox folder.
- Using YAML files (or Excel spreadsheets) will get you started, but you’ll need something better in the long run.
- Some people use Excel as a query tool. No problem with that – after deploying the services (based on YAML files), collect the new configuration/operational state and present it in a read-only Excel spreadsheet.
- For a hands-on perspective on this dilemma, listen to the podcast we did with Carl Buchmann more than two years ago (it’s sad how I keep getting the same questions…).
- I wrote about transactional and eventual consistency in State Consistency in Distributed SDN Controller Clusters
- We discussed data stores in Data Models module of Building Network Automation Solutions online course.
- We’ll also talk about data stores in one of the future sessions of Network Automation Concepts webinar.
You might also want to read:
- Text Files or Relational Database?
- Growing Beyond Ansible host_vars and group_vars
- Validating Data in GitOps-Based Automation
- Automation Solution: Testing Data Models
- Validate Ansible YAML Data with JSON Schema
- Added a pointer to a Software Gone Wild episode with Carl Buchmann
To that end, our approach has been to implement a 2-step process: first step is DB/Excel-to-YAML/JSON, second step is YAML/JSON-to-devices. This way, we're able to enjoy advantages from both approaches albeit the pain of writing the "glue" between states.
That said, it seems that Dolt (especially as a backend for ~~NetBox~~ Nautobot) is a very interesting approach to "marry" the two in a more structured way...
@Nikos: Thanks for the comment - looks like you went down this path: https://www.ipspace.net/kb/DataModels/65-Data-Transformation.html
"This way, we're able to enjoy advantages from both approaches albeit the pain of writing the "glue" between states." << This is sometimes called "squashed sausage effect". You can't escape the complexity, you can only try to contain it, and sometimes you can decide where you want to push it to.
As for Dolt: it's a great idea, but to use it with an IPAM/CMDB system you'd have to make that system Dolt/branch aware. Maybe N2C will get there with Nautobot.
Our plan is very much to integrate Dolt as an option for the backend for Nautobot to cover the use case described by @Nikos. We have been talking to our customers for a few years to this problem, and over the past 6 months, several have come to us looking for solutions, so it looks like many more are running into these issues.
I am more or less covered by the other comments but I want to add one thing. Excel adds one more layer between the information and the data that is stored, with auto-formating. Example: You want to store an ip address so you type it and press enter. Excel thinks it's a number that you erroneously typed as text, since it's digits.3digits.3digits.3digits . So it removes the dots from the actual data and stores the info without them. This is done in order to help you but Excel doesn't tell you because it doesn't want to hurt your feelings.. Next thing you know, you write a script to automate data handling and in particular those ip addresses but the script crashes because, what do you know, this isn't an ip address! True story. Stay away from Excel. Yaml is fine.
We managed to implement git-like version control for our network/infra data set using a custom Django/PostgreSQL SoT that we are building in house (migrating from multiple XLSX data sources). Our data set is large enough that it would be unmanageable in YAML files.
When we started to look at doing this, we didn't see anything available that could fit our use case so we went down the custom route. I'm not necessarily recommending this :) It has caused us some pain but it is possible to implement. Luckily we had budget and management patience to be able to take this route.
I'll be following Nautobot/Dolt with interest as integrating Dolt with Django would potentially solve some of our issues and give us a cleaner solution.
@Ioannis: even though I can totally understand what you're saying about Excel in general, keep in mind that the specific issue with dots in IP addresses is mostly a Greek problem. In other locales where dots are used for decimals (and commas for separating powers of 1000), this issue does not exist :-)
And in any case, since you're already in Excel, why not employ a bit of VBA to fix everything? :-P
@nikos: good point about the dots being a greek issue. But it's a trust thing for me. I don't trust it anymore because it cheated on me. Jokes aside, I do get that people are more attracted to excel than yaml. But that's the main issue, if I have to automate something I prefer stability. So maybe I would go with the sausage too, if I didn't have a choice on it.
@Dan: You don't have to wait for Nautobot/Dolt integration to simplify your challenges. All you need is to validate whether Dolt meets your requirements (including maturity and stability) and go from there -- in your case it would be a Postgres replacement.
@Ioannis: I can confirm that it must be a Greek issue. Nothing similar happens when using Slovenian (decimal comma) or English/US (decimal dot) locale.
@Ivan: True, but it's more due to priorities for us. We have the high level part working in terms of modelling and version control, so now we're switching development focus more towards deployment and low level/device config part. We only have a small team. Hopefully by the time we come back to look at improving the version control part, Dolt will have matured and NTC will have done some of the integration leg work already ;)