Automate Remote Site Hardware Refresh Process
Every time we finish the Building Network Automation Solutions online course I ask the attendees to share their success stories with me. Stan Strijakov was quick to reply:
I have yet to complete the rest of the course and assignments, but the whole package was a tremendous help for me to get our Ansible running. We now deploy whole WAN sites within an hour.
Of course I wanted to know more and he sent me a detailed description of what they’re doing:
We will have around 900 school sites to refresh and rebuild LAN switching and WiFi infrastructure. Another 1000 will come if funding is granted. Each school site could serve between a handful and 2000 students. A school could already be running VLANs from an earlier project, or could be just one flat VLAN.
The whole process takes in a mix of business and technical information (some of which is collected using Ansible) to generate what we believe will be the design of a school site. The design includes a set of school properties (sitename, sitecode, size category, site LAN IP address summary, VoIP subnet, p2p subnet to the router, some flags that describe the state of adjacent IT projects for a site) as well as what we believe the topology of the site LAN will be, including switch models, switch names, and inter-switch port numbers. The design is kept in an Excel file and, once finished, exported from Excel to a site definition YAML file.
We built a web interface to a set of playbooks (our version of Tower), where the user is led through the steps of the conversion process. The process starts by loading a site definition YAML file into the system; each step then uses that file as the initial seed of information for its playbook.
Shortly before the conversion, data for site devices with static IP addresses and DHCP reservations is collected (using one of the preexisting scripts).
On the migration day, access switches are replaced with blank switches or, if existing switches are to be retained, reset to a blank configuration. The core switch is replaced last. The switches are installed and interconnected as close as possible to the design, but we learnt that in the majority of sites switch models and interconnect ports get mixed around. To make things more interesting, the switch configuration depends heavily on, and directly references, the switch hardware model.
The site WAN router is the only device that retains its configuration, with some changes made in the process.
Pre-configuration step generates configurations for the site router, core switch and Aruba IAP cluster. The playbook takes the site design data, calculates all the VLAN IDs, IP subnets, IP addresses, site-specific DHCP/NTP server addresses…
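To make the calculation part of that step more concrete, here is a minimal Python sketch of deriving per-VLAN subnets from a site summary prefix. It is not the playbook described above: the function name `carve_site_subnets`, the VLAN names, the /24 subnet size, the VLAN numbering scheme and the "first host is the gateway" convention are all assumptions made for illustration.

```python
import ipaddress


def carve_site_subnets(site_summary, vlan_names, new_prefix=24,
                       first_vlan_id=10, vlan_step=10):
    """Split a site summary prefix into one subnet per VLAN.

    All defaults (prefix length, VLAN numbering) are hypothetical;
    a real site design would supply its own values.
    """
    summary = ipaddress.ip_network(site_summary)
    subnets = list(summary.subnets(new_prefix=new_prefix))
    if len(vlan_names) > len(subnets):
        raise ValueError(
            f"{site_summary} only holds {len(subnets)} /{new_prefix} subnets"
        )

    plan = {}
    for index, (name, subnet) in enumerate(zip(vlan_names, subnets)):
        plan[name] = {
            "vlan_id": first_vlan_id + index * vlan_step,
            "subnet": str(subnet),
            # Assumption: the first usable address is the default gateway.
            "gateway": str(subnet.network_address + 1),
        }
    return plan


plan = carve_site_subnets("10.20.0.0/22", ["staff", "students", "wifi"])
for name, details in plan.items():
    print(name, details)
```

The resulting dictionary could then be fed into whatever templates generate the router and core switch configurations.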
Router configuration step: Ansible uploads the configuration from the previous step to the router. The configuration includes a temporary DHCP scope. As they come up, all site switches acquire an IP address from the site DHCP server, or from the router if the site DHCP server is unavailable (because the core switch has been blanked and the DHCP server is on a VLAN that hasn’t been provisioned yet).
Core switch setup: Ansible finds the core (Campus Distributor, or CD) switch by accessing all switches and checking which one has the site WAN router as a CDP neighbor. Ansible checks the software version and upgrades the CD switch if required, then transfers the configuration file from Step 1 to the CD switch and restarts it. Ansible then updates the router configuration again to remove the temporary DHCP scope and provision a P2P link with the core switch.
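The core switch lookup boils down to a simple search over collected neighbor data. The sketch below assumes the CDP neighbors of every switch have already been gathered into a dictionary keyed by the switch's temporary management address; the function name `find_core_switch`, the data layout and the device names are hypothetical.

```python
def find_core_switch(cdp_neighbors, router_name):
    """Return the switch that lists the WAN router as a CDP neighbor.

    cdp_neighbors maps a switch identifier to the list of neighbor
    device IDs it reported (an assumed, simplified data layout).
    """
    candidates = [
        switch
        for switch, neighbors in cdp_neighbors.items()
        if router_name in neighbors
    ]
    # Exactly one switch should see the router; anything else means the
    # site is not cabled as expected and needs a human to look at it.
    if len(candidates) != 1:
        raise ValueError(
            f"expected one switch adjacent to {router_name}, "
            f"found {len(candidates)}: {candidates}"
        )
    return candidates[0]


neighbors_by_switch = {
    "10.20.0.11": ["switch-b"],
    "10.20.0.12": ["site-router", "switch-a", "switch-c"],
    "10.20.0.13": ["switch-b"],
}
core = find_core_switch(neighbors_by_switch, "site-router")
print(core)
```

Failing loudly when zero or several switches see the router is a design choice of this sketch, not something stated in the description above.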
Building distribution topology discovery: Ansible, starting from the already built CD switch, connects to each access (Building Distribution, or BD) switch, tier by tier, to get its CDP neighbor information and saves it in a YAML file.
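Tier-by-tier discovery starting from the core is essentially a breadth-first search. Here is a minimal sketch under that assumption: `get_neighbors` stands in for whatever actually queries a switch for its CDP neighbors, and the static topology and switch names are made up for the example. The WAN router is assumed to be filtered out of the neighbor lists already.

```python
from collections import deque


def discover_tiers(core, get_neighbors):
    """Breadth-first walk from the core switch.

    Returns a dict mapping each switch to its tier, where the core is
    tier 0 and every hop away from it adds one.
    """
    tiers = {core: 0}
    queue = deque([core])
    while queue:
        switch = queue.popleft()
        for neighbor in get_neighbors(switch):
            if neighbor not in tiers:  # skip switches already placed
                tiers[neighbor] = tiers[switch] + 1
                queue.append(neighbor)
    return tiers


# Hypothetical static topology replacing live CDP queries.
topology = {
    "cd": ["bd1", "bd2"],
    "bd1": ["cd", "bd3"],
    "bd2": ["cd"],
    "bd3": ["bd1"],
}
tiers = discover_tiers("cd", lambda switch: topology.get(switch, []))
print(tiers)
```

Because each switch is recorded the first time it is seen, a switch reachable over several paths ends up in the tier closest to the core.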
Topology check: A PHP script draws the topology information on the screen. At this step, the engineer executing the conversion looks at the topology, makes a face palm, and finalizes the topology by assigning switch names to the switches, which may differ from the design in model or in inter-connect port references. The result is saved in the YAML file. This same process also produces a DHCP reservation PowerShell script file and registers all the switch IP addresses in DNS.
The DHCP reservation script file is used (a manual step at this stage; the Windows team does not trust Ansible yet) to create DHCP reservations for all the site switches.
Switch software upgrade: Ansible scans the site for all switches and loads them into device groups based on their CDP tier (as per the tier topology learned in previous steps). It then upgrades the software on BD switches tier by tier in edge-to-core sequence, and then, in core-to-edge sequence, generates configuration files on the fly, transfers them to each BD switch, and restarts it.
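The two orderings in this step can be illustrated with a few lines of Python. This is only a sketch of the sequencing idea: it assumes a dictionary of switch tiers (core at tier 0) is already available, and the function names and switch names are hypothetical.

```python
from collections import defaultdict


def tier_batches(tiers, edge_first):
    """Group BD switches into per-tier batches.

    tiers maps switch name to tier number; tier 0 (the core switch) is
    skipped because it is handled in an earlier step.
    """
    groups = defaultdict(list)
    for switch, tier in tiers.items():
        if tier > 0:
            groups[tier].append(switch)
    ordered = sorted(groups, reverse=edge_first)
    return [sorted(groups[tier]) for tier in ordered]


tiers = {"cd": 0, "bd1": 1, "bd2": 1, "bd3": 2}

# Software upgrade: farthest tier first, working toward the core.
upgrade_batches = tier_batches(tiers, edge_first=True)
# Configuration push and restart: nearest tier first, working outward.
config_batches = tier_batches(tiers, edge_first=False)

print(upgrade_batches)
print(config_batches)
```

Each inner list is one batch of switches in the same tier, which maps naturally onto a per-tier device group.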
Final checks: Ansible discovers the resulting BD topology. The resulting topology is reviewed to verify that all switches had their configuration applied and came up with new IP addresses. Some of the previous steps can be re-run if any switches failed to get their initial configuration.
Security setup: Ansible executes a post-configuration script that finds all the MAC addresses that used to have a static or reserved DHCP address and applies DHCP/ARP trust settings and/or VLAN settings to place them into the appropriate VLAN.
Wireless configuration: The configuration file from Step 1 is loaded into the Aruba IAP cluster master WAP to finish the site install. All switches and WAPs register themselves (as part of their generated configuration file) with the Aruba AirWave server. WAPs are renamed based on their location.
Documentation: Ansible scans the site for all switches, collects neighbor details via CDP/LLDP, and updates switch port names with the connected switch or WAP name and peer port.
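As a rough illustration of the documentation step, the sketch below turns neighbor records into port name strings. The `Neighbor` record, the "peer name plus peer port" text format and the rule that the first record seen for a port wins (for ports reported by both CDP and LLDP) are assumptions; the actual naming convention is not described above.

```python
from typing import NamedTuple


class Neighbor(NamedTuple):
    """One CDP or LLDP adjacency seen on a local switch port."""
    local_port: str
    peer_name: str
    peer_port: str


def port_names(neighbors):
    """Map each local port to a name built from its neighbor.

    If a port shows up more than once (for example in both CDP and
    LLDP data), the first record is kept.
    """
    names = {}
    for neighbor in neighbors:
        names.setdefault(
            neighbor.local_port,
            f"{neighbor.peer_name} {neighbor.peer_port}",
        )
    return names


# Hypothetical neighbor data for a single switch.
seen = [
    Neighbor("Gi1/0/1", "bd1", "Gi1/0/24"),
    Neighbor("Gi1/0/2", "wap-library", "eth0"),
    Neighbor("Gi1/0/1", "bd1", "Gi1/0/24"),  # same link reported twice
]
names = port_names(seen)
print(names)
```

The resulting mapping is what a template or configuration module would need to set the name on each port.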
I’ve seen many successful network automation implementations, but the scope of this one literally blew my mind.
Interested in building something similarly awesome? Join the Building Network Automation Solutions online course.