Why It's Hard to Deploy SDN-Like Functionality Today

Whenever I talk about the various definitions of SDN (ending with “SDN provides an abstraction layer”), old-timers sitting in the audience quickly realize that the SDN products you can deploy in real life aren’t that different from what we did in the past – an SDN controller is often just an overhyped, glorified network services orchestration system.

OK, so why haven’t we had the same functionality for the last 20 years?

Diverting into Anecdata

Some large networks have been fully provisioned by orchestration systems for years – be it with home-grown tools or ridiculously expensive multi-vendor solutions (Cisco alone had at least two or three products in the past claiming to do network services provisioning).

A while ago I was talking with someone who actually used one of those multi-vendor tools, and he had two major complaints: (A) any change was ridiculously expensive, because (B) the solution provider insisted on full-blown regression tests for every single hardware variant and software version they planned to deploy in the production network.

One has to wonder what led someone to such a level of paranoia.

The Clumsy Configuration Interface

SDN evangelists are quick to point out how CLI is the root of all evil (guess what: it’s not), but there is something fundamentally wrong with most CLIs used on networking gear today: they were never designed with scripting, automation or machine-to-machine communication in mind. They were always targeting a lone network operator furiously typing on the keyboard; even cut-and-paste didn’t always work, as some devices didn’t do proper input buffering.

While it might be possible to fix the configuration part of the CLI mess (Cisco used to have CLI police that would stop the programmers from messing up too badly, but they must have disbanded it by the time MQC was coded), the instrumentation/monitoring part might be beyond hopeless.

Some vendors (example: Juniper) did the right thing: every printout starts as a data structure, which you can retrieve in XML format or have rendered through a printout template.

The XML format is extremely easy to parse in a network automation environment (XML libraries are available for every operating system and every major programming language) and quite resilient to adds and changes – as long as the old tag and attribute names don’t change, the code processing the XML output doesn’t have to care about new fields, tags, or reordered data.
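Here’s a minimal Python sketch of that resilience. The sample document below merely mimics Junos-style interface output; the tag names are illustrative, not taken from any real schema:

import xml.etree.ElementTree as ET

# Illustrative, Junos-flavored sample; not actual device output
SAMPLE = """
<interface-information>
  <physical-interface>
    <name>ge-0/0/0</name>
    <oper-status>up</oper-status>
    <mtu>1514</mtu>  <!-- a field added in a newer release -->
  </physical-interface>
  <physical-interface>
    <name>ge-0/0/1</name>
    <oper-status>down</oper-status>
  </physical-interface>
</interface-information>
"""

root = ET.fromstring(SAMPLE)
for intf in root.iter("physical-interface"):
    # Fields are looked up by tag name, so a new tag (like <mtu>) or
    # reordered siblings don't break the processing code.
    print(intf.findtext("name"), intf.findtext("oper-status"))

The code keeps working when a newer release adds the <mtu> tag, which is exactly what you want from a machine-to-machine interface.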

It seems that most other vendors still believe in producing printouts with sprintf calls liberally sprinkled throughout the code. Cisco is definitely one of them (but do keep in mind that they have a 30-year-old code base), and Arista might be another one – I can see no other reason why they’d be so slow in rolling out the eAPI support for individual show commands.

The printouts produced by these vendors also tend to change across software releases (and even maintenance releases). After all, the network operator (or the TAC engineer) doesn’t care if the printout is slightly different from the one produced the previous day. Automation scripts do.

According to some senior instructors I chatted with, students in certain geographies exhibit the same behavior – if the printout seen on the console doesn’t match the one in the student guide character-by-character, there must be something wrong.

Processing such output in an automation script always involves some amount of screen scraping (Perl regexp, anyone?) and heavy regression testing to ensure a software upgrade doesn’t break the screen-scraping functionality – resulting in the very high costs of multi-vendor orchestration systems.
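To see why, consider this hedged Python sketch of the screen-scraping approach (the sample printout and the regex are purely illustrative, not captured from any real device):

import re

# Illustrative printout; real devices vary across platforms and releases
SHOW_OUTPUT = """\
GigabitEthernet0/1 is up, line protocol is up
GigabitEthernet0/2 is administratively down, line protocol is down
"""

LINE_RE = re.compile(
    r"^(?P<name>\S+) is (?P<admin>.+?), line protocol is (?P<proto>\S+)$")

for line in SHOW_OUTPUT.splitlines():
    match = LINE_RE.match(line)
    if match:
        print(match.group("name"), "->", match.group("proto"))
    # Any cosmetic change to the printout (an extra column, reworded
    # status text) silently breaks the pattern -- hence the heavy
    # regression testing mentioned above.

Multiply that fragile pattern by every show command, every platform, and every software release an orchestration system supports, and the price tag starts to make sense.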

Don’t even try to tell me NETCONF is the answer. A show command executed through NETCONF on Cisco IOS returns the traditional printout in an XML envelope, and multi-vendor data models are still a bit of a pipe dream.
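If you’re wondering what that looks like in practice, here’s a deliberately simplified sketch (the envelope below is schematic, not a verbatim IOS reply): you parse the XML, and what falls out is the same printout you’d have scraped off the console.

import xml.etree.ElementTree as ET

# Schematic rpc-reply; a real IOS reply uses vendor namespaces, but the
# payload is still the raw CLI printout.
REPLY = """
<rpc-reply message-id="101">
  <data>GigabitEthernet0/1 is up, line protocol is up</data>
</rpc-reply>
"""

raw_printout = ET.fromstring(REPLY).findtext("data")
print(raw_printout)  # ...and now the regex fun starts all over again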

Unfortunately, I don’t expect this sad state of affairs to improve until the old software dies (probably after I’m old enough to retire) – it would be pretty hard to persuade anyone to rip out all the legacy stuff buried in millions of lines of code. Or (if you’re brave enough) you might go for vendor-supported screen scraping (aka Cisco EDI).


5 comments:

  1. Valid points. But unfortunately SDN, apart from providing a programmatic interface, is in the same mess. Controller REST APIs vary wildly. Even OpenFlow switches vary in what they support.
  2. So the industry standards in that area would be RFC 6241 (NETCONF) for the deployment engine and RFC 6020 (YANG) for modeling.
    Another (possibly big) attempt in that area is gRPC support on network devices...

    The networking community is going through the same evolution system engineers went through 5-6 years ago, when configuration-management FOSS tools got popular. In that world an engineer no longer sees a single box as something he needs to configure, but works with manifests/playbooks/cookbooks -- text files describing the desired state and variables of an abstract service.

    Yet, even assuming all that is a given, how many network engineers are ready to manage a network via definition files, code reviews and automated deployment?
  3. I've seen hundreds of projects where router configuration changes were listed as +/- diffs to be applied during the maintenance window. Network engineers are ready; they just need to migrate to a new methodology and mindset.
  4. Or you can take part in a community-developed network device output parser like this one: https://github.com/networktocode/ntc-ansible
    It's still largely screen scraping, but a lot more manageable and robust.
  5. Hah, screen scraping is the worst. Unfortunately, like everyone said, it will be a long time before we can rid all networks of these devices, and they will haunt and paralyze the network for years to come. Yes, REST/SOAP APIs will change and vary wildly, but they are orders of magnitude more sustainable than parsing / regexing / expecting through console output for every vendor's whim. Until then, the network will bear the brunt of developers' and CIOs' frustration.

    If you have to run a network/infrastructure and you can standardize on one vendor/API for the entire network, you're golden. However, modern stacks are usually too varied and complex to lend themselves to that. Like everyone has said, waiting for the network world to standardize on models/APIs is always painful. As always, those who can execute and stay on top of maintaining drivers for each of their selected vendors win (think SOA and swapping out drivers). Software engineers are used to ever-changing APIs and are obviously not as paralyzed as the network world. Your cloud service providers have simply excelled at stringing together many of the same pieces and wrapping it all up in a bow.

    Many vendors have had SOAP/REST APIs for automation for a long time, so the most interesting force of SDN has been introducing software discipline/benefits like a versioned lifecycle (infrastructure as code) and self-service. If we are to treat networks/infrastructure like a software system, it should be well architected, self-service enabled, resilient, and autonomous (i.e. closed-loop, with anomaly/exception handling, not needing a lot of interaction/orchestration). Heck, even Git + REST + well-architected systems (it works for software) could go a long way for heterogeneous networks/infrastructure in the meantime.

    As stacks collapse and tenancy also involves network/infrastructure changes (e.g. new client = new VPC/VXLAN/etc.), there will be less static infrastructure to maintain anyway, and network/infrastructure vendor CLIs will most likely be a distant memory. Then controllers in charge of full tenant topologies will eventually be judged on how many API drivers they integrate with and support (to maintain not just the software-defined tenant/application topologies but also the supporting static infrastructure).
