Single-Image Systems or Automated Fabrics

In the Network Automation 101 webinar and the Building Network Automation Solutions online course I described one of the biggest challenges networking engineers are facing today: moving from thinking about boxes and configuring individual devices to thinking about infrastructure and services, and changing data models that result in changed device configurations.

The $1B question is obviously: how do we get from here to there?

My ideas were heavily influenced by the Our Traditional Interop Breakfast discussions I had with Terry Slattery in the days when Interop still found it worthwhile to invest in good speakers, and by the fantastic blog post by Jeremy Stretch. Thank you both!

There are two fundamental ways to solve this challenge:

  • Build or buy a network automation solution that translates a high-level data model (marketers prefer the word Intent) into device configurations that are then pushed to traditional network devices (see the sketch after this list);

  • Buy a $vendor solution that pretends a whole network (for example, a data center fabric) is a single device. This trend started with shared-control-plane approaches like stackable switches, Cisco VSS and HP IRF, and was made way more scalable with reasonable architectures like Juniper QFabric, Cisco ACI, or Pluribus Networks Netvisor.
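
To make the first option more concrete, here's a minimal sketch of the data-model-to-configuration translation in Python with Jinja2. Everything in it (the service model, the template, the device names) is made up for illustration; a real solution adds validation, transactions, and a way to compare desired and actual state.

    # Minimal sketch: translate a high-level (intent) data model into
    # per-device configurations. All names and the template syntax are
    # illustrative, not taken from any specific product.
    from jinja2 import Template

    # High-level data model: describe the service, not the boxes
    service_model = {
        "vlans": [
            {"id": 100, "name": "web"},
            {"id": 200, "name": "db"},
        ],
        "devices": ["leaf1", "leaf2"],
    }

    # Device-level template (IOS-like syntax, purely illustrative)
    CONFIG_TEMPLATE = Template(
        "hostname {{ device }}\n"
        "{% for vlan in vlans %}"
        "vlan {{ vlan.id }}\n"
        " name {{ vlan.name }}\n"
        "{% endfor %}"
    )

    def render_configs(model):
        """Translate the service model into one configuration per device."""
        return {
            device: CONFIG_TEMPLATE.render(device=device, vlans=model["vlans"])
            for device in model["devices"]
        }

    for device, config in render_configs(service_model).items():
        print("=== %s ===\n%s" % (device, config))

The interesting part is not the template rendering but everything around it: you can inspect the generated configurations, diff them against the running devices, and fix things by hand when needed.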

The major difference between the two: how much you can do when the thing breaks down (here’s what Douglas Adams had to say on this topic).

If you’re using a network automation solution (doesn’t matter whether it’s a bunch of Perl scripts, Ansible playbooks, or Cisco NSO), you can still log into individual devices, check their configuration, figure out how to fix stuff, and get the network patched together till a proper fix is implemented. Obviously, this approach only works if:

  • You thought about a big red button that can stop the automation from overwriting your attempts to get the network up and running (see the sketch after this list);

  • You haven’t fired all your network engineers due to a misplaced unwavering belief in the Easter Bunny and $vendor promises.
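
Such a big red button can be as trivial as a lock file that every automation run checks before touching the network. The sketch below assumes a hypothetical file path and uses a print statement as a stand-in for the actual config push:

    # One trivial way to implement a "big red button": refuse to push
    # anything while an operator-created lock file exists. The file path
    # is an assumption; use whatever flag your tooling can check reliably.
    import os
    import sys

    KILL_SWITCH = "/etc/netautomation/STOP"  # created with "touch", removed when done

    def push_configs(configs):
        """Push per-device configs unless the kill switch is set."""
        if os.path.exists(KILL_SWITCH):
            sys.exit("kill switch %s is set -- not pushing configs" % KILL_SWITCH)
        for device, config in configs.items():
            # replace this print with your real config-push mechanism
            print("pushing %d bytes of config to %s" % (len(config), device))

The point of this design is that the check is independent of the automation logic: an operator can stop all pushes with a single touch command without knowing anything about the tooling.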

If you decided to go with a single-image solution from a $vendor, you’re stuck with a broken network till they get their act together and figure out how to fix it, because in most cases you can’t troubleshoot what’s going on behind the scenes, and have no way of fixing it even if you were able to figure out what’s wrong.

At this moment pundits with more opinions than operational experience should start telling me how totally outdated I am and how every industry eventually moves from tinkerers to integrated solutions, citing modern cars as the prime example.

I totally agree with that sentiment, and would be perfectly fine with that approach… as soon as those same pundits could tell me where I can get a rental network when mine sits at a clueless mechanic for a month because the $vendor-approved testing equipment cannot identify what’s wrong with my network. No takers? I thought so.

Of course, you can make a counter-argument that the almighty $Cloud provides all the answers, and that might be a good-enough approach for many smaller organizations, but do keep in mind that you still need a network to get the data in and out of the cloud.

Everyone else has to decide whether they want to be able to mend the network should $vendor introduce a feature that doesn’t work as advertised, or drink awful free coffee and wait patiently whilst the business is on fire. Just remember that there is no silver bullet, no unicorn tears, and rarely some silver lining.

Disagree? Please let me know!

8 comments:

  1. In our company we need more than just some data models on a Linux host that get translated by Jinja2 into config files and finally pushed down to the networking devices. We already have a version control system in place. What we need is some orchestration tool that integrates our business logic. How do you deal with that? Please don't answer with "find out in one of my courses".
    Replies
    1. Based on experience with both types of deployment, the answer, as Ivan always says, is "it depends". If you need orchestration, don't want to do it yourself, and are willing to pay a premium for it, ACI or NSX or one of the other similar options may be the best route for you. If you want control, you can deploy your own EVPN fabric. There is no right answer. For some of the customers I support, simply due to the requirement of "one neck to choke", the only realistic options are the ones like ACI or NSX. In other environments that require more flexibility, interoperability, and less vendor lock-in, there are options to deploy EVPN using Ansible/Salt or even "manually" (i.e. classic CLI configuration). Also, as Ivan says, some lock-in is inevitable - everything really just depends on your requirements, budget, and resources.
    2. It's mostly what DixieWrecked said - either you buy a black box (QFabric, ACI, NSX, ...), or you buy or replicate something like Cisco NSO.

      In any case, you need a user interface that does transactions on a services data model, which, combined with an infrastructure data model, gets translated into device data model(s) that are then translated into device configurations.
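
      A minimal sketch of that translation pipeline (all data structures are made up for illustration; a template renderer like the Jinja2 example in the main post would handle the last step from device data model to configuration):

        # Services data model: what the users asked for (illustrative)
        services = {"tenant-a": {"vlan": 100, "members": ["leaf1", "leaf2"]}}

        # Infrastructure data model: what the network looks like (illustrative)
        infrastructure = {
            "leaf1": {"uplinks": ["spine1", "spine2"]},
            "leaf2": {"uplinks": ["spine1", "spine2"]},
        }

        def build_device_models(services, infrastructure):
            """Merge services and infrastructure into one data model per device."""
            # Start from per-device infrastructure facts, each with its own VLAN list
            device_models = {dev: dict(facts, vlans=[])
                             for dev, facts in infrastructure.items()}
            for name, svc in services.items():
                for member in svc["members"]:
                    device_models[member]["vlans"].append(
                        {"id": svc["vlan"], "name": name})
            return device_models

        print(build_device_models(services, infrastructure))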

      As for tools: some people build their own front-end, or use Ansible Tower (or AWX) as an approximation; others use all sorts of orchestration tools from vRealize to HP's orchestration tool to Cisco NSO (and there are, I'm guessing, a few using Apstra).

      The back-end depends on what you need to get done. Ansible, Salt, running Ansible playbooks from vRealize... tons of options.

      And yes, a lot of that is in my courses.
  2. There are two fundamental ways to solve this challenge:

    "if multivendor:"
    Build or buy a network automation solution that translates high-level data model (marketers prefer to use the word Intent) into device configurations that are then pushed to traditional network devices;

    "else:"
    Buy a $vendor solution that pretends a whole network (for example, a data center fabric) is a single device. This trend started with shared-control-plane approaches like stackable switches, Cisco VSS and HP IRF, and was made way more scalable with reasonable architectures like Juniper QFabric, Cisco ACI, or Pluribus Networks Netvisor.
    Replies
    1. Ouch, I hadn't realized you effectively quoted my article. Nice touch, made my day...
    2. Option #1 isn't really multivendor since the single automation vendor becomes critical.
    3. And of course you're absolutely right - we're entering the recursive world of lock-in (see http://blog.ipspace.net/2015/01/lock-in-is-inevitable-get-used-to-it.html)
  3. The answer to that in the Internet world was of course the middle ground: a bunch of open standards gets published, and the market bears multiple vendors that provide the pieces that stick together along standardized interfaces. The advantages were so overwhelming that this has proven to be the prevalent model of building networks for a long time now. Multiple vendor implementations shook bugs out much better in my experience (besides doing the hard work of scaling and supporting) than a free, best-effort "open source reference implementation", and the versioning work of standards prevents the "big, ugly forklift" upgrades and "any color as long as it's black" offerings in the market. Just my 2c ...