Data Deduplication in Network Automation Data Models

Tuesday, May 21, 2019 08:17 +0200

One of the toughest challenges in the hands-on part of the Building Network Automation Solutions online course is the "create a data model describing your service" exercise.

Networking engineers never had to think about data models describing their networks or services, and the first attempt often results in something that looks like simplified device configuration in YAML or JSON format.

I wrote a long article describing how you can slowly redesign your box-focused data model into a network-focused one. The first parts describing the problem and initial deduplication are already online.

3 comments:

Anonymous 21 May 2019 21:17

Great article! Thank you for that. Looking forward to more. I find that the data model is the most crucial part of automation. Deduplicating data and keeping the model flexible for future extensions is hard.
Your examples still contain duplicate data, e.g. neighbor name and interface. What if the hostname or interface changes? You have the data in two places. You've traded explicit complexity for implicit complexity.

Replies

Ivan Pepelnjak 22 May 2019 11:35

Great comment! Have to add a discussion along these lines to the article. Here's the short version:

In the end, you have to agree on what your unique (node or interface) identifier is going to be, what database people would call a "primary key". It can be a UUID, an auto-incrementing integer, or something that makes sense to you (in my case: hostname).

Next problem: what happens when the primary key value changes and someone uses it in another table? Database people would call this one "referential integrity". Relational databases solve the problem automatically (assuming you declared a foreign key in the second table).
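To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `nodes` and `links` tables and their columns are hypothetical names made up for this illustration, and the `ON UPDATE CASCADE` clause is one possible way to declare the foreign key, not the only one.

```python
import sqlite3

# Hypothetical schema for illustration only; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE nodes (hostname TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE links (
        node TEXT NOT NULL REFERENCES nodes(hostname) ON UPDATE CASCADE,
        interface TEXT NOT NULL,
        PRIMARY KEY (node, interface)
    )
    """
)
conn.execute("INSERT INTO nodes VALUES ('S1')")
conn.execute("INSERT INTO links VALUES ('S1', 'GigabitEthernet0/1')")
conn.commit()

# Changing the primary key propagates to the table that references it.
conn.execute("UPDATE nodes SET hostname = 'Spine1' WHERE hostname = 'S1'")
renamed = conn.execute("SELECT node FROM links").fetchall()

# A row that references a non-existent node is rejected by the database.
try:
    conn.execute("INSERT INTO links VALUES ('S9', 'GigabitEthernet0/1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(renamed, rejected)
```

Without the `ON UPDATE CASCADE` clause the database would refuse the hostname change instead of propagating it; either way the two tables cannot drift apart.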

You could also decide that you don't want to deal with referential integrity, in which case the only chance of remaining consistent is to make sure primary keys never change.

In the end, you have to decide where your balance between convenience and consistency is, and I would go for unique identifiers when working with databases fronted by business logic (API, GUI, CLI... or all of the above) and hostnames when working with text files.

Obviously you'd have to make sure your data model has referential integrity even when you work with text files - and that's why I'm always telling engineers attending my Network Automation course that they should always validate input data.
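A rough sketch of what such a validation could look like, assuming a hypothetical data model with a `nodes` dictionary keyed by hostname and a `links` list that refers to nodes and interfaces by name. The structure and the `find_dangling_references` helper are invented for this example; the dictionary stands in for whatever you would load from a YAML or JSON file.

```python
# Hypothetical data model, as it might look after loading a YAML or JSON file.
data = {
    "nodes": {
        "S1": {"interfaces": ["Gi0/1", "Gi0/2"]},
        "L1": {"interfaces": ["Gi0/1"]},
    },
    "links": [
        {"a": {"node": "S1", "interface": "Gi0/1"},
         "b": {"node": "L1", "interface": "Gi0/1"}},
        {"a": {"node": "S1", "interface": "Gi0/2"},
         "b": {"node": "L2", "interface": "Gi0/1"}},  # L2 is not defined above
    ],
}


def find_dangling_references(model):
    """Return a list of messages for link ends that point at unknown nodes or interfaces."""
    errors = []
    nodes = model.get("nodes", {})
    for index, link in enumerate(model.get("links", [])):
        for side in ("a", "b"):
            end = link[side]
            node = nodes.get(end["node"])
            if node is None:
                errors.append(f"link {index}: unknown node {end['node']}")
            elif end["interface"] not in node.get("interfaces", []):
                errors.append(
                    f"link {index}: unknown interface {end['interface']} on {end['node']}"
                )
    return errors


errors = find_dangling_references(data)
for message in errors:
    print(message)
```

A check like this catches a renamed hostname only after the fact, when some link still uses the old name. It reports the inconsistency; it does not fix it for you the way a database cascade would.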

Hope this helps,
Ivan

Anonymous 22 May 2019 18:00

Perfect! Referential integrity is really the key here (in the truest sense of the word). Yes, one has to make a trade-off between convenience and consistency. Validating input data is of course very important, but it may not be sufficient to get referential integrity with text files. So the only option left is, as you stated, a relational database with UUIDs and foreign keys.
