Ansible versus Puppet in Initial Device Provisioning

One of the attendees of my Building Next-Generation Data Center course asked this interesting question after listening to my description of the differences between Chef/Puppet and Ansible:

For Zero-Touch Provisioning to work, an agent gets installed on the box as a boot up process that would contact the master indicating the box is up and install necessary configuration. How does this work with agent-less approach such as Ansible?

Here’s the first glitch: many network devices don’t ship with a Puppet or Chef agent; you have to install it during the provisioning process.

Also, you can start using Puppet or Chef only after the box has received at least a minimal configuration. For example, even if you had a Puppet agent on the box, it wouldn’t know the IP address of the Puppet server.

For these (and other) reasons most vendors implement Zero-Touch Provisioning (ZTP) along these lines:

  • When a box boots without usable configuration, it sends out DHCP requests.

Whether the DHCP requests are sent only on management interfaces or all interfaces is a minor implementation detail in the scope of this blog post and a very important security consideration in real life.

  • The DHCP server replies with an IP address and whatever other parameters (standard or vendor-specific) have been configured.
  • The extra parameters passed in the DHCP reply could include a URL to download scripts or configurations from, or a boot file (initial configuration) to load.
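
To make the last step a bit more concrete, here is a minimal Python sketch of how a bootstrap process might decide what to do with the file named in the DHCP reply. Option codes 66 (TFTP server name) and 67 (bootfile name) are the standard ones; the function names, the way a bare path is combined with the TFTP server, and the rule "a payload starting with #! is a script, anything else is a configuration" are assumptions made for illustration, not a description of any particular vendor's ZTP.

```python
from typing import Mapping, Optional

# Standard DHCP option codes (RFC 2132).
OPT_TFTP_SERVER = 66
OPT_BOOTFILE = 67


def bootfile_url(options: Mapping[int, str]) -> Optional[str]:
    """Return the location of the boot file, or None if the reply has none.

    Hypothetical helper: assumes option 67 holds either a full URL or a
    path that is fetched over TFTP from the server named in option 66.
    """
    bootfile = options.get(OPT_BOOTFILE)
    if not bootfile:
        return None
    if "://" in bootfile:
        # Already a complete URL (for example http://...), use it as-is.
        return bootfile
    server = options.get(OPT_TFTP_SERVER)
    if not server:
        return None
    return f"tftp://{server}/{bootfile.lstrip('/')}"


def classify_payload(payload: bytes) -> str:
    """Assumed rule: a shebang marks a script, anything else is a config."""
    return "script" if payload.startswith(b"#!") else "config"
```

A reply carrying only an IP address yields None from bootfile_url, which corresponds to a box that gets an address but no further provisioning instructions.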

If the initial switch configuration includes a Puppet agent, then it would eventually connect to the Puppet server and pull down the desired device state.

If you’re using Ansible, the DHCP reply could trigger a script that runs an Ansible playbook, which eventually pushes the desired configuration to the device.
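
As a rough illustration of the Ansible case, the sketch below builds the command such a script might run once it knows the IP address the device received. It assumes the playbook runs on a control host where Ansible is installed; the function name and the provision.yml playbook name are hypothetical.

```python
from typing import List


def ansible_command(device_ip: str, playbook: str = "provision.yml") -> List[str]:
    """Build an ansible-playbook invocation that targets a single device.

    The trailing comma makes Ansible treat the -i value as an inline
    host list instead of a path to an inventory file.
    """
    return ["ansible-playbook", "-i", f"{device_ip},", playbook]
```

The function returns an argument list rather than running anything, so it can be handed to subprocess.run() by whatever hook reacts to the newly booted device.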

From Theory to Practice

I sent my reply to David Barroso (he has orders of magnitude more real-life experience in this area than I do) and this is what he told me:

The only problem with the puppet agent is that you need to install it and traditional ZTP implementations don't have a good way of installing extra software. However, what most vendors do nowadays is that rather than pushing a configuration file, they give you an IP and a hostname as always, and then they give you a script that will be executed on the machine.
That way you can "curl" your configuration or generate it on the fly with code, install all necessary software, trigger Ansible via some API, check cabling is correct, register to an inventory database... you get the idea, whatever you want to do as part of your provisioning workflow. Both Cumulus and EOS can do that, not sure about others though.
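
One of the steps David mentions, checking that the cabling is correct, could look roughly like the Python sketch below. It assumes the provisioning script already has the planned neighbor for each port and the neighbors the device actually sees (for example, collected via LLDP); the data shapes and the function name are made up for illustration, and collecting the data is not shown.

```python
from typing import Dict, List

# Hypothetical data shape: local interface name -> neighbor hostname.
Neighbors = Dict[str, str]


def cabling_errors(expected: Neighbors, observed: Neighbors) -> List[str]:
    """Compare the planned cabling with what the device actually sees."""
    errors = []
    for port, want in sorted(expected.items()):
        got = observed.get(port)
        if got is None:
            errors.append(f"{port}: expected {want}, no neighbor seen")
        elif got != want:
            errors.append(f"{port}: expected {want}, found {got}")
    return errors
```

An empty list means every planned link was found; a provisioning script could refuse to continue, or simply report the problem, when the list is not empty.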

Want to know more?

Finally, why don’t you go on a journey that will help you deploy your first network automation solution?

6 comments:

  1. We are using Ansible for network automation at Hostinger and we are happy with it. We tried to use Chef with Cumulus, but the effect was the same, because Cumulus's Chef version doesn't support chef-zero, only chef-solo, which does not work together with chef-server. I mean that Ansible vs. chef-solo comes out the same. We solved this by running Ansible from Jenkins + GitHub. We push changes (new BGP neighbors, changed firewall rules, changed bridge settings, etc.) to GitHub, and Jenkins runs Ansible on every change in GitHub.
  2. The issue of configuration that exists outside of ZTP is very irritating. In Juniper-world, for example, virtual chassis configuration exists outside of the configuration - in Cisco parlance it is an enable command to enable VC, rather than a configuration command. Irritating as it means it is difficult to fully automate the build (although we have worked around it).
  3. It is quite interesting seeing this discussion and knowing how my old self went through all the same when I started automating application server builds a long time ago.

    The real choice here, that you should spend some time thinking about, is whether you want your devices to be "as capable of self-healing as possible" or if you just don't care. No secret that I certainly DO care, I want all my parts of my infrastructure to be autonomous, if at all possible. Autonomy leads to greater certainty, and if there's one attribute you want inherent in your networks, it's certainty. http://shop.oreilly.com/product/0636920036289.do

    Backing up, if you are doing network automation because you need to solve problems you have _right now_, then something like Ansible using the push model is likely your only option. However, if you are designing future infrastructure networks choosing a model that builds on the autonomous actors pattern is a massive benefit. At smaller scales than you'd think, is my experience.

    Puppet masters and chef servers mostly confuse people as it again strongly couples your devices to a central point, which you don't really want. Fix it by 1) Ship (most, if not all) your configuration out to every device and let agents converge devices to their known state 2) Use service discovery mechanisms for any runtime dynamic data

    There, saved you 100s of consulting hours, go make greatness :D
    Replies
    1. The problem with "central points" can be solved by using distributed key/value stores, e.g. Consul.
    2. Which is OK if you do it correctly, re my 2) above :)
  4. Not *really* ZTP/router configuration, but.. Puppet doesn't need to have an agent installed on the node.

    Here's an example I dealt with just in the last few days that does exactly that - uses Puppet and a plug-in that uses REST to send config commands to apply configuration to a Brocade Virtual Traffic Manager with a bare-minimum configuration applied (has an IP, login/pass and REST enabled):

    https://github.com/dkalintsev/Brocade/tree/master/vADC/CloudFormation/Templates/Variants-and-experimental/Configured-by-Puppet

    In this example, I have a server that's external to the vTM being configured, where I inject parameters into a Puppet manifest, including the target node config (IP, login/pass), and then run "puppet apply " to push it, as opposed to more traditional "pull". There's no puppet installed on the vTM - it simply gets API calls.

    I'm guessing a similar approach should probably work just fine with some router with NETCONF over ssh or something.