Data Deduplication in Network Automation Data Models
One of the toughest challenges in the hands-on part of the Building Network Automation Solutions online course is the "create a data model describing your service" exercise.
Networking engineers never had to think about data models describing their networks or services, and the first attempt often results in something that looks like simplified device configuration in YAML or JSON format.
I wrote a long article describing how you can slowly redesign your box-focused data model into a network-focused one. The first parts describing the problem and initial deduplication are already online.
Your examples still contain duplicate data, e.g. neighbor name and interface. What if the hostname or interface changes? You have the data in two places. You've traded explicit complexity for implicit complexity.
In the end, you have to agree on what your unique (node or interface) identifier is going to be, which is what database people would call a "primary key". It can be a UUID, an auto-incrementing integer, or something that makes sense to you (in my case: hostname).
Next problem: what happens when the primary key value changes and someone uses it in another table? Database people would call this one "referential integrity". Relational databases solve the problem automatically, either by rejecting the change or by propagating it to the referencing rows (assuming you declared a foreign key in the second table).
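To make the foreign-key behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (nodes, interfaces, hostname, ifname) and the sample values are made up for illustration; the point is only that a declared foreign key lets the database propagate a hostname change and reject a reference to a node that does not exist.

```python
import sqlite3

# Hypothetical schema for illustration only: hostname is the primary key,
# and interfaces.node is declared as a foreign key pointing at it.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
db.execute("CREATE TABLE nodes (hostname TEXT PRIMARY KEY)")
db.execute(
    """
    CREATE TABLE interfaces (
        node   TEXT NOT NULL REFERENCES nodes(hostname) ON UPDATE CASCADE,
        ifname TEXT NOT NULL,
        PRIMARY KEY (node, ifname)
    )
    """
)
db.execute("INSERT INTO nodes VALUES ('S1')")
db.execute("INSERT INTO interfaces VALUES ('S1', 'GigabitEthernet0/1')")

# Changing the primary key is propagated to the referencing table.
db.execute("UPDATE nodes SET hostname = 'core-1' WHERE hostname = 'S1'")
rows = db.execute("SELECT node, ifname FROM interfaces").fetchall()
print(rows)

# A reference to a node that does not exist is rejected.
try:
    db.execute("INSERT INTO interfaces VALUES ('S9', 'GigabitEthernet0/1')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
print(dangling_rejected)
```

Without the ON UPDATE CASCADE clause, SQLite would refuse the hostname change instead of propagating it; either way the two tables cannot drift apart.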
You could also decide that you don't want to deal with referential integrity, in which case the only chance of remaining consistent is to make sure primary keys never change.
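One way to get keys that never change is to use an identifier that carries no meaning and keep the hostname as an ordinary attribute. The following is a rough sketch of that idea, not a recommended data model; the dictionary layout, the sample hostnames, and the helper function link_hostnames are all assumptions made for this example.

```python
import uuid

# Hypothetical data model: nodes are keyed by a generated UUID that never
# changes, and the hostname is just an attribute of the node.
s1_id = str(uuid.uuid4())
s2_id = str(uuid.uuid4())

nodes = {
    s1_id: {"hostname": "S1"},
    s2_id: {"hostname": "S2"},
}

# Links refer to nodes by identifier, never by hostname.
links = [
    (
        {"node": s1_id, "interface": "GigabitEthernet0/1"},
        {"node": s2_id, "interface": "GigabitEthernet0/1"},
    ),
]


def link_hostnames(link):
    """Resolve the node identifiers of a link into current hostnames."""
    return [nodes[endpoint["node"]]["hostname"] for endpoint in link]


# Renaming a node touches exactly one place; the link needs no update.
nodes[s1_id]["hostname"] = "core-1"
renamed = link_hostnames(links[0])
print(renamed)
```

The price is readability: a text file full of UUIDs is hard to edit by hand, which is why this approach fits better behind an API or a GUI.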
In the end, you have to decide where your balance between convenience and consistency is, and I would go for unique identifiers when working with databases fronted by business logic (API, GUI, CLI... or all of the above) and hostnames when working with text files.
Obviously you'd have to make sure your data model has referential integrity even when you work with text files, and that's why I keep telling engineers attending my Network Automation course that they should always validate input data.
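As an illustration of what such validation could look like, here is a small sketch that checks a text-based data model for dangling references. The model layout (a nodes dictionary keyed by hostname and a list of links) and the function name find_dangling_references are assumptions made for this example; JSON is used only to keep the sketch free of external libraries, and the same check would work on data loaded from YAML.

```python
import json

# Hypothetical data model stored as text. The second link deliberately
# refers to an interface and a node that are not defined.
MODEL_TEXT = """
{
  "nodes": {
    "S1": {"interfaces": ["GigabitEthernet0/1"]},
    "S2": {"interfaces": ["GigabitEthernet0/1"]}
  },
  "links": [
    [{"node": "S1", "interface": "GigabitEthernet0/1"},
     {"node": "S2", "interface": "GigabitEthernet0/1"}],
    [{"node": "S1", "interface": "GigabitEthernet0/2"},
     {"node": "S3", "interface": "GigabitEthernet0/1"}]
  ]
}
"""


def find_dangling_references(model: dict) -> list[str]:
    """Return one message per link endpoint that points at a missing node or interface."""
    errors = []
    nodes = model.get("nodes", {})
    for index, link in enumerate(model.get("links", [])):
        for endpoint in link:
            node = endpoint["node"]
            ifname = endpoint["interface"]
            if node not in nodes:
                errors.append(f"link {index}: unknown node {node}")
            elif ifname not in nodes[node].get("interfaces", []):
                errors.append(f"link {index}: unknown interface {ifname} on {node}")
    return errors


errors = find_dangling_references(json.loads(MODEL_TEXT))
for error in errors:
    print(error)
```

Running a check like this before generating any device configuration catches a renamed hostname that was updated in one place but not the other.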
Hope this helps,
Ivan