Fast Arista cEOS Container Configuration

After the enormous speedup I achieved with the FRR containers, I tried to do something similar with the Arista cEOS ones. After all, Arista’s pretty open about running its software on standard Linux, so it should be possible to map host-side configuration files into container-side scripts and execute them, right?

There was just one tiny gotcha: all netlab-generated EOS configuration files are device configuration snippets that are intended to be submitted via EOS CLI, and I didn’t feel like cracking open the netmiko documentation (that’s another backburner project).

However, Arista cEOS includes this magic command called FastCli ;)

The FastCli command can execute any Arista EOS command from bash. It can also accept a filename as an argument and pretend there’s a user typing the commands in that file [1]. That looked like a perfect solution:

  • Take any Arista EOS configuration snippet
  • Prepend it with a shebang (#!/usr/bin/FastCli)
  • Profit

Alas, that failed miserably. Remember the “pretend there’s a user typing commands” part? It turns out you have to start from scratch and enter the configure mode first. The wrapper I needed to make this idea work [2] turned out to be:

#!/usr/bin/FastCli
configure terminal
{{ netlab_config_text }}
end
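
As a rough illustration of that wrapper (plus the executable bit mentioned in the footnote), here is a minimal Python sketch. It uses plain string operations instead of a Jinja2 template, and the function names, file name, and sample snippet are hypothetical, not netlab internals.

```python
import os
import stat
import tempfile
from pathlib import Path

FASTCLI_SHEBANG = "#!/usr/bin/FastCli"


def wrap_for_fastcli(snippet: str) -> str:
    """Wrap an EOS configuration snippet so FastCli can run it as a script."""
    body = snippet.strip("\n")
    # FastCli starts in exec mode, so enter configuration mode first
    return "\n".join([FASTCLI_SHEBANG, "configure terminal", body, "end"]) + "\n"


def write_executable(path: Path, text: str) -> None:
    """Write the script and set the executable bit for user, group, and others."""
    path.write_text(text)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Hypothetical configuration snippet, used only for this demo
    snippet = "interface Ethernet1\n description uplink\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "initial.sh"
        write_executable(script, wrap_for_fastcli(snippet))
        print(script.read_text(), end="")
        print("executable:", os.access(script, os.X_OK))
```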

After that initial hurdle, it worked. Sort of. Sometimes. The first glitch I uncovered when running the integration tests was the incredible sloppiness Ansible lets through. For example, I had a line in a Jinja2 template that had an extra closing curly bracket:

{% if something %}}

That line produced a curly bracket on a new line. While that made FastCli totally bonkers (no surprise there), it worked with the Ansible eos.eos_config module. I observed the same behavior with incorrect comments (starting with “#” instead of “!”) and unrecognized commands, such as ip virtual-router mac-address mlag-peer.
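
A quick pre-flight check on the rendered snippet could catch two of these quirks (stray brackets and “#” comments) before FastCli sees them. The following is only a sketch: the function name and the rules are my assumptions, it is meant to run on the snippet before the shebang wrapper is added, and it cannot detect unrecognized commands.

```python
import re
from typing import List, Tuple


def lint_eos_snippet(text: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for suspicious lines in a rendered snippet."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # EOS configuration comments start with "!"
            problems.append((lineno, "comment starts with '#' instead of '!'"))
        elif re.fullmatch(r"[{}]+", stripped):
            # Typical leftover of a Jinja2 typo such as "%}}"
            problems.append((lineno, "line contains only curly brackets"))
    return problems


if __name__ == "__main__":
    # Hypothetical rendered snippet containing both mistakes
    sample = "interface Ethernet1\n}\n# wrong comment\n! valid comment\n"
    for lineno, problem in lint_eos_snippet(sample):
        print(f"line {lineno}: {problem}")
```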

FWIW, that command generates the following error message, which is not recognized as an error by the Arista EOS Ansible module [3]:

% Unavailable command (not supported on this hardware platform)
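
Checking for that percent sign takes only a few lines. The sketch below assumes you have already captured the CLI output as a string (I am not claiming anything about where FastCli writes its messages), and the function name is hypothetical.

```python
from typing import List


def cli_errors(output: str) -> List[str]:
    """Return all output lines whose first non-blank character is a percent sign."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.lstrip().startswith("%")
    ]


if __name__ == "__main__":
    # Sample output built around the error message quoted above
    captured = (
        "ip virtual-router mac-address mlag-peer\n"
        "% Unavailable command (not supported on this hardware platform)\n"
    )
    errors = cli_errors(captured)
    if errors:
        print("configuration failed:")
        for message in errors:
            print("  " + message)
```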

Cleaning up the configuration templates removed all such quirks [4], but another mystery remained: the configuration process sometimes failed when executing perfectly valid commands.

That mystery turned out to be another timing issue. Try to configure Arista cEOS too soon after the container has started, and you’ll experience bizarre errors. Waiting for the SSH server to become available [5] solved that.
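
One simple way to wait for the SSH server is to retry a TCP connection to port 22 until it succeeds or a deadline passes. This is a generic sketch, not necessarily how netlab does it; the function name, timeout, and polling interval are assumptions, and an open TCP port is only an approximation of "the device is ready".

```python
import socket
import time


def wait_for_tcp_port(host: str, port: int = 22,
                      timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll a TCP port until a connection succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))
```

You would call it with the management address of the container, for example `wait_for_tcp_port("192.0.2.11")` (a documentation address, not a real lab node), and only start pushing configuration when it returns `True`.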

The solution is part of the netlab release 26.02, but has to be enabled with the netlab_config_mode device group variable or node parameter set to sh, for example:

netlab defaults devices.eos.clab.group_vars.netlab_config_mode=sh

Believing in the eat your own dogfood BS, I enabled it a few weeks ago, and happily used it ever after. I hope it will work equally well for you.

Was It Worth It?

TL;DR: You bet!

The best I could do on my server was a 20-node Arista cEOS fabric [6] (four spines, 16 leaves) running OSPF, BGP, and EVPN with VLAN/VXLAN configured on the leaves. This is the gist of the lab topology:

module: [ vlan, vxlan, ospf, bgp, evpn ]
bgp.as: 65000

vlans:
  V1:

fabric:
  spines: 4
  leafs: 16
  leaf:
    vlans:
      V1:
  spine:
    bgp.rr: True

The lab start times [7] were pretty long [8]:

Step                     Elapsed time  CPU time
netlab create            2 seconds     2 seconds
containerlab deploy [9]  ~2 minutes    1 second

Now for the fun part: configuring the lab with Linux scripts or Ansible:

Step                                     Elapsed time         CPU time
Linux scripts (netlab release 26.02)     11 seconds           3 seconds
Ansible playbook (netlab release 25.12)  1 minute 40 seconds  2 minutes 30 seconds
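
For a rough sense of scale, the ratios can be computed directly from the two rows above. Keep in mind that these are single-run measurements, so treat the results as orders of magnitude rather than benchmarks.

```python
# Measured configuration times from the table above, in seconds
linux_scripts = {"elapsed": 11, "cpu": 3}
ansible = {"elapsed": 1 * 60 + 40, "cpu": 2 * 60 + 30}

# How many times longer the Ansible run took, per metric
speedup = {key: ansible[key] / linux_scripts[key] for key in linux_scripts}

for key, value in speedup.items():
    print(f"{key}: Ansible took {value:.1f}x as long as the Linux scripts")
```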

The comparison is light-years away from being fair. While the Linux scripts hammer Arista EOS with configuration commands, the Ansible eos.eos_config module executes show running followed by a diff and minimal configuration every step of the way (normalize, initial config, VLAN, OSPF, BGP, VXLAN, EVPN).

Nonetheless, it’s nice to see how much time you can save when using the best tool for the job ;)


  1. I never checked whether it creates another PTY to do so. ↩︎

  2. Along with setting the eXecutable bit on the resulting configuration snippet ↩︎

  3. I’m always wondering why they don’t (also) check for the percent sign as the first character in the response, but that’s a campfire story for another day. ↩︎

  4. I’m ever so grateful I invested so much time into creating integration tests ↩︎

  5. Isn’t it weird that we have to wait for the SSH server when we want to use Linux scripts to configure a device? Oh, the wonderful world of networking devices 🤷‍♂️ ↩︎

  6. That fabric already resulted in a peak 5-second load average above 130 when the containers were starting. Not exactly a comfy place to be. ↩︎

  7. See a previous blog post for a detailed description of what individual commands do. ↩︎

  8. The measured times are not statistically significant. In less-baroque language: I only ran the tests once. The first digit and the order of magnitude are probably not too far off. ↩︎

  9. Part of netlab up --no-config ↩︎
