Automating 802.1x (Part One)

This is a guest blog post by Albert Siersema, senior network and cloud engineer at Mediacaster.nl. He’s always busy broadening his horizons and helping his customers in (re)designing and automating their infrastructure deployment and management.


We’d like to be able to automate our network deployment and management from a single source of truth, but before we get there from a running (enterprise, campus!) network, we’ll have to take some small steps first.

These posts are not focused on 802.1x, but it serves as a nice use case in which I’ll show you how automation can save time and bring some consistency and uniformity to the network (device) configuration. Having consistent device configurations might not be the sexiest side of automation, but it gets the job done and prepares your environment for more automation coolness later. Also, let’s face it: if you need to reconfigure hundreds of switches and tens of thousands of interfaces, automation literally saves (the) day(s).

Implementing 802.1x (after talking to all parties involved) enables us to convert a switch-, location- and interface-specific configuration into a more generic configuration where specific items get pushed to the switch from a central RADIUS server (e.g. Cisco ISE), which also serves as a single source of truth describing user-specific information.

You can find the Ansible playbooks I’m describing in this blog post on GitLab. The 802.1x playbooks were used with Catalyst switches running Cisco IOS, but they shouldn’t be too complicated to adapt to other vendors/models.

The playbooks follow a similar setup:

  • Gather information (from the source of truth or lacking that, the switch).
  • Apply changes.
  • Validate changes: does the current active state match the desired state (or intent, if you like)?

First off: configure VLAN group names

Unless you’re lucky and every switch is already a small L2 domain with L3 boundaries and the same VLAN numbers are reused everywhere, you’re going to need named VLAN groups that a RADIUS server can use to push information to the switch. After receiving information from the RADIUS server, the switch in turn decides which VLAN numbers to apply the information to based on the VLAN group name it received.

Chances are these VLAN groups are not yet configured on all your switches, so we’ll automate configuring the required VLAN groups everywhere as our first automation step.

Software running on most campus switches is still struggling with the idea of exchanging well-formed structured data, so a large part of any network automation solution still centers around building the right sequences of CLI commands and parsing command outputs pretty-printed for humans.

Using Ansible to create and verify VLAN groups is no exception. We have to:

  • Gather information: pull a list of VLANs from the switch by parsing the output of the show vlan command (lacking a single source of truth, we have to depend on the configurations of individual switches) to get VLAN IDs and names.
  • Build configuration commands and apply changes: build the VLAN group configuration commands and push them to the switch.
  • Validate: parse the output of the show vlan group command to see if our changes have been applied the way we intended.
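The first two steps can be sketched outside Ansible in a few lines of plain Python. This is only an illustration of the idea, not the code from the playbook: the sample show vlan output, the group name VOICE, the helper names, and the exact vlan group ... vlan-list command syntax are all assumptions that you should check against your own platform.

```python
import re

# Illustrative "show vlan" output (assumption); real output varies by
# platform and IOS version.
SHOW_VLAN = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/1, Gi1/0/2
22   b22-data-22                      active    Gi1/0/3
555  b22-voip-555                     active    Gi1/0/4, Gi1/0/5
556  b23a-VoIP-556                    active
"""

# A VLAN row starts with the VLAN ID, followed by the name and the status.
VLAN_LINE = re.compile(
    r"^(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<status>active|suspended|act/unsup)\b"
)


def parse_vlans(output):
    """Return a list of {'id': int, 'name': str} parsed from show vlan text."""
    vlans = []
    for line in output.splitlines():
        m = VLAN_LINE.match(line)
        if m:
            vlans.append({"id": int(m.group("id")), "name": m.group("name")})
    return vlans


def build_group_commands(group_name, vlans, pattern):
    """Build one configuration command per VLAN whose name matches pattern.

    The command syntax is an assumption; verify it on your switches.
    """
    wanted = sorted(v["id"] for v in vlans if re.search(pattern, v["name"]))
    return [f"vlan group {group_name} vlan-list {vid}" for vid in wanted]


commands = build_group_commands("VOICE", parse_vlans(SHOW_VLAN), r"(?i)-voip-\d+$")
print(commands)
```

In the playbook the parsing is done by parse_cli() and the commands are pushed with ios_config; the sketch only shows the data flow from human-readable output to a list of commands.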

Some points of possible interest:

  • I use a which_hosts variable with a sensible default in the hosts parameter of Ansible plays. This trick enables me to specify a different Ansible inventory group from the command line using, for example, -e "which_hosts='other_group'".
  • Playbook arguments that can be specified as extra variables (-e CLI parameter) are defined and documented in a vars file with the same filename as the playbook.
  • parse_cli() is used to hammer the show command output into structured data. It’s an extremely handy filter, and if you haven’t used it yet, be sure to give it a try.
  • The Cisco CLI command syntax isn’t always consistent. There are no add or remove commands for VLAN groups like there are for 802.1q trunk VLAN lists, so we have to remove all VLANs from a previously-defined VLAN group before we can start modifying it. To make matters worse, not all Cisco switches accept the 1-4095 VLAN range, so we need to try two versions of the no command.

Mind you, as a consequence of classic Cisco IOS not supporting transactions, we’re facing a potential race condition: if an interface is to be configured with the VLAN ID from a VLAN group at the same time as we’re in the middle of removing and re-adding a (previously present) VLAN group, the VLAN assignment for that interface might fail. A possible workaround is to check which switches are missing the VLAN group we’re configuring (use --check with this very same playbook), then only apply the changes to these switches.

  • I had to build a list of individual VLAN group commands, each listing a single VLAN ID, as it seems that using the Cisco number range syntax in Ansible confuses the Cisco IOS versions I used the playbook with.

I’ve not yet discovered exactly why, but it seems that a comma in the range command causes the configuration command sent by the Ansible ios_config module to fail.

  • To validate the show vlan group output, we need to convert the list of VLAN IDs back into a number range. I failed to find a suitable filter, wrote my own custom Jinja2 filter, and then (when looking for something unrelated) discovered a VLAN range filter called vlan_compress in the sources of Ansible network engine. Although the network engine GitHub page states they’d like you to use it as a foundation for building your own Ansible role, it’s not a problem to specify the core network engine as a role to be able to use vlan_compress().
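To illustrate what such a range filter does, here is a minimal Python sketch that collapses a list of VLAN IDs into the comma-and-dash range notation. It is not the vlan_compress implementation from network engine; the function name compress_vlans is hypothetical, and the exact output format of the real filter may differ.

```python
def compress_vlans(vlan_ids):
    """Collapse VLAN IDs into a range string, e.g. [10, 11, 12, 20] -> '10-12,20'.

    Hypothetical helper for illustration only; duplicates are ignored and
    the input does not have to be sorted.
    """
    ids = sorted({int(v) for v in vlan_ids})
    ranges = []
    start = prev = None
    for vid in ids:
        if start is None:
            # First ID opens the first range.
            start = prev = vid
        elif vid == prev + 1:
            # Consecutive ID extends the current range.
            prev = vid
        else:
            # Gap found: close the current range and open a new one.
            ranges.append((start, prev))
            start = prev = vid
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(compress_vlans([30, 10, 11, 12, 20, 31]))
```

The resulting string can then be compared with the range parsed from the show vlan group output to decide whether the switch matches the intended state.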

Using the playbook

The playbook expects these parameters that can be specified either in a group variable file or with the -e CLI parameter:

  • search_vlan_pattern: a regular expression specifying which VLANs we want to have in the VLAN group
  • vlan_group_name: name of a VLAN group we’re creating. The VLAN group will contain a list of IDs for all configured VLANs matching the search_vlan_pattern regular expression.

Example: assuming our VLAN names are case-insensitive names optionally starting with one or two letters, followed by one or more digits, followed by zero or more letters, a dash, the string ‘voip’, again a dash and one or more digits (like b22a-voip-555 or 22-VoIP-555 or b22-VOIP-555) we could use the regular expression (?i)^([a-z]{1,2})?\d+\w*-voip-\d+
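Before running the playbook, it can be worth trying the pattern against a handful of VLAN names. The following Python sketch does that with the re module; the non-matching sample names are made up, and whether the playbook applies the pattern with search or match semantics is an assumption (with the leading ^ the two behave the same here).

```python
import re

# The pattern from the example above.
PATTERN = re.compile(r"(?i)^([a-z]{1,2})?\d+\w*-voip-\d+")

# The first three names come from the text; the rest are made-up
# counterexamples.
names = [
    "b22a-voip-555",
    "22-VoIP-555",
    "b22-VOIP-555",
    "default",
    "voip-555",
    "b22-data-22",
]

matching = [name for name in names if PATTERN.search(name)]
print(matching)
```

Note that \w* also accepts digits and underscores, so the expression is slightly more permissive than "zero or more letters", and because it is not anchored at the end it also accepts names with trailing characters after the final digits.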

Compliance check or dry-run to discover which switches need changes: run the playbook in check mode, for example:

ansible-playbook vlan_group.yaml --check -v \
-e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
-e "vlan_group_name='DATA'"

Make configuration changes. You might decide to apply changes to a subset of switches. To do that, you could filter out the switches that need changes, create a new group in your Ansible inventory, and apply changes to switches in that particular group by setting the which_hosts extra variable:

ansible-playbook vlan_group.yaml -e "which_hosts='change_these'" \
-e "search_vlan_pattern='(?i)^([a-z]{1,2})?\d+\w*-voip-\d+'" \
-e "vlan_group_name='DATA'"

If you have to fix just a few switches, it’s easier to use the --limit CLI parameter.

Configuring a VLAN group with a fixed VLAN ID: if you don’t want to extract VLAN IDs from existing VLAN names but want to configure a VLAN group with a fixed VLAN ID everywhere, and if you don’t care whether the switches already have ports in the target VLAN, use the force_vlan_id extra variable to specify the target VLAN ID, and set the only_vlans_with_configured_ports variable to false.

ansible-playbook vlan_group.yaml \
-e "force_vlan_id=705" \
-e "vlan_group_name='VG_705'" \
-e "only_vlans_with_configured_ports=false"

Bonus trick

You might like the bash script filter_playbook_output.bash in the GitLab repo. It cleans up the output of the Ansible playbook.

The script assumes you’re using the YAML stdout callback. You can specify the stdout callback in your ansible.cfg file:

[defaults]
stdout_callback = yaml
bin_ansible_callbacks = True

Alternatively, you could use Ansible environment variables:

export ANSIBLE_STDOUT_CALLBACK=yaml
export ANSIBLE_LOAD_CALLBACK_PLUGINS=1

If you store the printout of your Ansible playbook in a file using tee, stdout redirection, or the ANSIBLE_LOG_PATH variable, you can use this script to show the relevant portions of the output.

Finishing notes

parse_cli() might miss some information because we’re forced to screen scrape command output formatted for human consumption.

The default line width on most switches is 80 columns. Ansible tries to set the width to 512 columns (assuming the Cisco IOS version running on the switch supports that), but parse_cli() might still miss items if a line exceeds 512 characters, which could easily happen if you have more than approximately 60 interfaces in one VLAN. I’ve submitted a fix, which the Ansible team merged into the upcoming Ansible 2.8 release.
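As a rough sanity check of the 60-interface estimate, the following Python sketch measures how long a comma-separated port list gets. The port names are hypothetical, and the calculation ignores the VLAN ID, name and status columns that precede the port list, so the real line would be even longer.

```python
# Hypothetical port names; real names depend on the platform and stack layout.
ports = [f"Gi1/0/{n}" for n in range(1, 61)]

# Ports are listed separated by a comma and a space.
port_list = ", ".join(ports)

# True when the port list alone no longer fits in a 512-column line.
exceeds_limit = len(port_list) > 512
print(len(port_list), exceeds_limit)
```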

Git repo

thefriendlynet/ansible_8021x

More Information

Want to implement something similar in your environment? You’ll learn all you need to know about Ansible in the Ansible for Networking Engineers webinar or online course, and get all the knowledge and skills you need to build your own network automation solution in our Building Network Automation Solutions online course.
