Breaking APIs or Data Models Is a Cardinal Sin

Tuesday, April 29, 2025 08:15 +0200

Imagine you decide to believe the marketing story of your preferred networking vendor and start using the REST API to configure their devices. That probably involves some investment in automation or orchestration tools, as nobody in their right mind wants to use curl or Postman to configure network devices.

A few months later, after your toolchain has been thoroughly tested, you decide to upgrade the operating system on the network devices, and everything breaks. The root cause: the vendor changed their API or the data model between software releases.

Why Does It Hurt So Much?

There’s nothing more annoying than having to change your working code just because an API (or a data model) your code is using has changed. You waste a lot of time just to keep things running while gaining absolutely no extra functionality. In the worst case (in networking), you are forced to keep working with two (or more) versions of the data model in parallel (and keeping track of which version to use with which device) because you can’t upgrade all devices in one shot.

Most providers of API-based services are well aware of that and do their best to keep their APIs stable¹. Even Google (the notorious deprecator) changed their analytics API only once in the more than a decade I've been using their product, and when they did, they provided a grace period of about a year during which both versions of the API worked.

But Networking Is Different, Right?

As always, some people think that networking has to be different. You might be lucky and work with a vendor that learned the hard way that customers prefer stability, or you might be using a vendor that changes their data model every year, like Nokia does with SR Linux.

You might claim that I’m picking on Nokia, and I might be. While I love their work on containerlab and appreciate that they make SR Linux available in a fast-starting, free-to-download container format, there are certain things no vendor should be doing, and breaking promises they made to their customers (a published API or data model is somewhere between a promise and a contract) is pretty high on my list of things to avoid.

Of the two dozen platforms netlab supports, only Nokia SR Linux requires configuration template changes with every other software release (netlab configuration templates were impacted by software releases 25.3, 24.7, and 23.3). While FRRouting loves changing the data structures of JSON printouts (which totally breaks our validation scripts), I don’t remember ever having to change a configuration template to accommodate a newer software release from a major networking vendor (VyOS is another unfortunate exception). On the other side of the spectrum, Cisco IOS introduced address families in BGP decades ago, but still recognizes the old IPv4-only configuration syntax.

What’s Changing?

One would understand that you have to break some eggs to make a better omelet, but some SR Linux changes we had to deal with are so banal that they’re driving me crazy. For example, in SR Linux release 25.3:

They changed the way you set BGP MED in a routing policy. Previously, the attribute was set with /routing-policy/policy/entry/action/bgp/med/set; now you have to set /routing-policy/policy/entry/action/bgp/med/operation to set. Why couldn't they retain both options?
Prefix matching in a routing policy used to be specified with /routing-policy/policy/entry/match/prefix-set; now it's /routing-policy/policy/entry/match/prefix/prefix-set. WTA*?
BGP community propagation used to be specified per BGP neighbor; now it's set per address family. Many vendors realized belatedly (or never) that some BGP attribute handling should be configurable at the AF level, but most of them retained backward-compatible syntax when they changed their mind. For example, configuring a parameter per neighbor would still be allowed and would apply to all address families.
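The backward-compatible approach described above (keep accepting the old per-neighbor setting and apply it to every address family) doesn't take much code. Here's a minimal Python sketch of such a translation layer, assuming a simplified dictionary-based neighbor model; the attribute names (send_community, afi_safi) and the function name are hypothetical and do not match the actual SR Linux data model.

```python
from copy import deepcopy
from typing import Iterable


def expand_neighbor_settings(neighbor: dict, inheritable: Iterable[str]) -> dict:
    """Push old-style neighbor-level settings down into every address family.

    Hypothetical data model: {"send_community": ..., "afi_safi": {af_name: {...}}}.
    An explicit per-AF value always wins over the inherited neighbor-level value.
    """
    result = deepcopy(neighbor)                # never modify the caller's data
    afs = result.get("afi_safi", {})
    for attr in inheritable:
        if attr not in result:
            continue
        value = result.pop(attr)               # remove the old-style attribute...
        for af_config in afs.values():         # ...and copy it into every AF that
            af_config.setdefault(attr, value)  # doesn't set it explicitly
        # (with no AFs configured, the value has nowhere to go and is dropped)
    return result


old_style = {
    "peer_address": "10.0.0.2",
    "send_community": "all",
    "afi_safi": {"ipv4-unicast": {}, "ipv6-unicast": {"send_community": "none"}},
}
print(expand_neighbor_settings(old_style, ["send_community"])["afi_safi"])
# {'ipv4-unicast': {'send_community': 'all'}, 'ipv6-unicast': {'send_community': 'none'}}
```

Old-style payloads keep working, new-style payloads can be more specific, and the device only ever has to deal with the per-AF representation internally.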

I would have hoped that the customers' well-being would be valued more highly than a purist view of the data model, but I'm clearly misguided. On the other hand, even Ansible, with its “we cut it three times and it’s still too short” stunts, has mastered the art of deprecating functionality. I really can’t grasp why someone would feel the urge to abruptly break the data model instead of slowly deprecating old attributes.

Then there are more fundamental changes to the data model. What was a single value could become a list. For example, SR Linux 24.7 introduced multiple routing policies per BGP neighbor, changing the import-policy and export-policy parameters from strings to lists. Even that’s trivial to solve – netlab silently converts scalar values to lists because we don’t want to force the customers to enclose stuff in square brackets for no good reason. Typical CLI configuration commands that previously accepted a single value could start accepting a list of values. XML also had no problems with the one-or-many dilemma², which might explain why Junos happily takes a single value or a list of values for many configuration parameters.
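The scalar-to-list conversion mentioned above takes just a few lines of code. This is a hypothetical Python helper illustrating the idea, not the actual netlab implementation; the import_policy/export_policy key names are assumptions.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Accept either a single value or a list; always return a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    if isinstance(value, tuple):
        return list(value)
    return [value]  # a scalar (including a string) becomes a one-element list


def normalize_policies(neighbor: dict) -> dict:
    """Return a copy of a (hypothetical) BGP neighbor definition in which
    import/export policies are always lists, regardless of how the user wrote them."""
    result = dict(neighbor)
    for attr in ("import_policy", "export_policy"):
        if attr in result:
            result[attr] = as_list(result[attr])
    return result


print(normalize_policies({"import_policy": "accept-all", "export_policy": ["p1", "p2"]}))
# {'import_policy': ['accept-all'], 'export_policy': ['p1', 'p2']}
```

Note the explicit isinstance checks: calling list() on a string would split the policy name into individual characters.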

However, you can’t see those solutions if you drank too much YANG Kool-Aid. YANG is a strongly typed schema language and does not allow multiple data types for a single parameter. But even the YANG world has a solution: you can specify mutually exclusive parameters with the choice construct. In the above example, we could have import-policy taking a single value and import-policy-list taking a list of values.
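On the consumer side, the choice-based model could be handled along the lines of this Python sketch, which mimics YANG choice semantics by rejecting payloads that set both leaves and returning a single canonical list. The import-policy-list leaf is the hypothetical name suggested above, not an existing SR Linux attribute.

```python
from typing import List


def resolve_import_policy(neighbor: dict) -> List[str]:
    """Mimic a YANG 'choice' with two mutually exclusive cases:
    'import-policy' (a single policy name) or 'import-policy-list' (a list of names)."""
    single = neighbor.get("import-policy")
    many = neighbor.get("import-policy-list")
    if single is not None and many is not None:
        # The data tree can contain nodes from only one case of a choice,
        # so a payload that sets both leaves is invalid
        raise ValueError("import-policy and import-policy-list are mutually exclusive")
    if single is not None:
        return [single]
    return list(many or [])


print(resolve_import_policy({"import-policy": "accept-all"}))       # ['accept-all']
print(resolve_import_policy({"import-policy-list": ["p1", "p2"]}))  # ['p1', 'p2']
```

Old clients keep sending import-policy, new clients use import-policy-list, and both end up with the same internal representation.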

But like breaking changes to the configuration data model wouldn’t be enough, there’s more:

SR Linux uses the same data model for CLI configuration, which means that the breaking changes in the API/YANG data model also break the CLI configuration process.
If you save the configuration of a device running SR Linux release 24.10 (using get / from the running datastore), and that configuration uses one of the changed data model parameters, you cannot load it on a device running SR Linux release 25.3. I’ve never seen a network device without that baseline level of backward compatibility³.

Even assuming one cannot avoid breaking changes, there’s a clean solution to all of the above: API or data model versioning. The simplest solution would be to ask the customers to include the target software version in every configuration request and do the necessary translations behind the scenes. Why do you think Cisco IOS configuration starts with the version command?
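Here's a minimal Python sketch of what such request versioning could look like: the client states which release's data model its payload uses, and the device applies a chain of upgrade steps. It covers the 24.7 policy-list change and the 25.3 prefix-set move described above; the flat, key-less paths, the neighbor policy paths, and all function names are simplifications made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, int]
Config = Dict[str, object]  # flat {path: value} view of a configuration (simplified)


def _policies_to_list(cfg: Config) -> Config:
    # 24.7: import/export policies changed from a single string to a list
    out = dict(cfg)
    for path in ("bgp/neighbor/import-policy", "bgp/neighbor/export-policy"):
        if isinstance(out.get(path), str):
            out[path] = [out[path]]
    return out


def _move_prefix_set(cfg: Config) -> Config:
    # 25.3: match/prefix-set moved to match/prefix/prefix-set
    old = "routing-policy/policy/entry/match/prefix-set"
    new = "routing-policy/policy/entry/match/prefix/prefix-set"
    out = dict(cfg)
    if old in out:
        out[new] = out.pop(old)
    return out


# Ordered list of (release that introduced the change, upgrade step)
MIGRATIONS: List[Tuple[Version, Callable[[Config], Config]]] = [
    ((24, 7), _policies_to_list),
    ((25, 3), _move_prefix_set),
]


def upgrade(cfg: Config, client_version: Version, device_version: Version) -> Config:
    """Apply every migration introduced after client_version, up to device_version."""
    for introduced_in, step in MIGRATIONS:
        if client_version < introduced_in <= device_version:
            cfg = step(cfg)
    return cfg


old_payload = {
    "bgp/neighbor/import-policy": "accept-all",
    "routing-policy/policy/entry/match/prefix-set": "loopbacks",
}
print(upgrade(old_payload, client_version=(23, 10), device_version=(25, 3)))
```

Versions are tuples rather than strings because "24.10" sorts before "24.7" as a string, while (24, 10) correctly compares as newer than (24, 7).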

What Could We Do?

Is there something you can do about vendors breaking APIs and data models (it probably isn’t just Nokia – please leave a comment with your horror story)? Make yourself heard. Maybe you’ll hit a pain point somewhere (it worked for me at least once), or maybe there’s someone within the vendor organization who knows what needs to be done but cannot make the changes without external pressure.

Finally, I've always told people yammering about vendor behavior to vote with their wallets, but that doesn’t seem to work in this case. Nokia must have a particularly loyal set of customers⁴ to be able to break customer provisioning tools just to retain a clean configuration data model.

People killing their APIs because they find them inconvenient are an entirely different category. ↩︎
Resulting in hilarious XML-to-JSON translation SNAFUs. ↩︎
Trying to downgrade the software after saving the device configuration with the new software release is an entirely different ballgame. ↩︎
Or maybe they’re so big that the decisions are made solely at the golf course/PowerPoint level, and the developer complaints are completely ignored. ↩︎

5 comments:

Bob 29 April 2025 10:07

Hi Ivan. I agree with your point on the necessity of backward compatibility, and yes, Nokia (NB: I work with their routers) puts that burden on the client. Being able to get-config a full XML configuration from version N and edit-config it to version N+1 would be a blessing (and the router is well able to do it during the upgrade process...). You actually can include the YANG version in your request by embedding it in the xmlns namespace of your XML payload (never done that, though). I also think that YANG/NETCONF underestimated the importance of dealing with versioning. gRPC/protobufs do it better. Like, including changelogs inside the YANG file in the age of git?? Come on. Even back in 2010 (first YANG RFC, 6020), versioning tools were all over the place (maybe not at the IETF, though). PS: here is a freely available web tool developed by Nokia to help you list the YANG changes between versions: https://yang.labctl.net/yang/SROS/. Cheers.

Replies

Ivan Pepelnjak 29 April 2025 10:20

Thanks a million for the feedback!

> the router is well able to do it during the upgrade process

... which makes it even worse. They have the solution but don't feel like they should make it available to external API consumers.

> I also think that YANG/NETCONF underestimated the importance of dealing with versioning.

Yes. However, seasoned vendors like Juniper never experienced problems on such a scale -- they always knew they had to be very careful when making changes to the configuration data model.

Finally, I have to work with SR Linux (not SR OS) through the Nokia SR Linux Ansible collection, which uses JSON-RPC, so there's no way to specify the version in the XML namespace :(

Tony P 29 April 2025 10:52

Good rant, and obviously to the point from the customer perspective. Noooooow, on the other hand, if customers didn't camp for literally decades on regex scripts scraping screens, lots of stuff could progress much faster with a little input from the vendor's side (and yes, I know that XML parsing takes a bit of learning) ;-)

Johannes 29 April 2025 12:04

This is why I always test the relevant API endpoints and the responses in automated test series (e.g. using pyATS engine).

There are other aspects, besides a breaking API change, that could make this necessary (and I ran into all of them):

Is the returned HTTP status code still the expected one? Applications might catch the codes for error handling or as a success indicator. Imagine your code expects HTTP 403 and the vendor changes it to 401 (or whatever).
Is the API authentication still working as expected? In my latest ISE testing, I discovered that Basic Auth no longer worked for external AD users on one specific API endpoint.
Are the API response times for a specific endpoint still as expected?
...

So regardless of whether the vendor introduced (undocumented) breaking changes to the API, the API should be tested very thoroughly before a software update, especially if the API is used for business-critical applications. I've heard rumors that a bug fix release sometimes introduces new bugs :).

Roman Dodin 29 April 2025 03:53

Thanks for the rant ;)

NBC (non-backward-compatible) changes suck, believe it or not, for both the vendor and the customer, especially the customer. But sometimes they are unavoidable or very costly to avoid. That does not give us an excuse to make them frequently, and we understand the cost, the pain, and everything you outlined.

This is why we have a strict rule of introducing NBC changes only once a year, in our first release of the calendar year. A semver analogy would be a yearly major-version bump, where NBC changes can be made and expected.

Having said that, I wanted to point out that SR Linux maintains a config transformation infrastructure that translates the config from vA.B.C to vA+N.x.z when either

a) the node boots up and detects that the config it has on disk comes from an older version, or
b) the tools system configuration update file <file> command is executed.

This "version" is part of the config file in JSON format and can be found in the ._preamble object.

These transformation scripts take care of all the changes we introduce config-wise, so the previously singular import policies would be translated to a list of one element, and all other changes as well. So, this backward compatibility "at boot" is guaranteed, and users who upgrade between releases won't have issues with it. But as you can see, it is not currently available to the north-bound management interfaces when you send payloads in an older format to the newer one.

Having said that, I am inquiring internally if the transformation infrastructure can be made available to the north-bound interfaces, and if it can we will make sure to provide it. Thanks for flagging this.

TLDR: Contrary to what you have said about the golf courses, I am pretty confident no other big vendor is as receptive to feedback on the management interfaces as we are.

Replies

Ivan Pepelnjak 29 April 2025 05:56

> Thanks for the rant ;)

You're welcome ;))

> Having said that, I am inquiring internally if the transformation infrastructure can be made available to the north-bound interfaces, and if it can we will make sure to provide it. Thanks for flagging this.

Thanks a million. However, as I pointed out in the blog post, all the changes that tripped the netlab development team could be easily avoided with a slightly more gradual approach to data model changes.

Phil Shafer 29 April 2025 08:30

Completely agree with your premise that contracts must be firm to be useful and that the effects of breaking one can be expensive. Part of the motivation for automation is to facilitate easy upgrades (by avoiding screen scraping), so changing a contract means hurting exactly the folks we're trying to help.

The one-to-many change has been the most common change we've seen in JUNOS releases, which is why it's a tragedy that we didn't put this into YANG's "Updating a Module" list. Adding this situation to the list would be a backwards-compatible change, so perhaps we can repair this in a future version of YANG.

Thanks, Phil
