Network Automation @ Spotify on Software Gone Wild

What can you do if you have a small team of networking engineers responsible for four ever-growing data centers (with several hundred network devices in each of them)? There’s only one answer: you try to survive by automating as much as you can.

In the fourth episode of the Software Gone Wild podcast, David Barroso from Spotify explains how they use network automation to cope with the ever-growing installed base without increasing the size of the networking team.

During our chat we also touched on other topics you might find interesting:

  • The rate of change is incredible: everything that is six months old is already legacy;
  • Using proper software development tools and processes: the networking team develops a pilot implementation (the hacking phase) and hands it over to software developers who productize it;
  • Don’t overautomate: the monitoring software detects broken tunnels, but the changes are still triggered by an operator;
  • Dealing with a multi-vendor environment: vendor-specific device configuration syntax is stored in Ansible templates;
  • Scaling out the WAN connectivity: they need a scale-out farm of VPN termination devices to cope with the inter-DC traffic;
  • Using Ansible for troubleshooting: device-dependent scripts extract information in JSON format, which is then used in Ansible playbooks to perform basic troubleshooting;
  • CLI on top of Ansible: use Ansible to create service deployment scripts that can be invoked with simple CLI commands by the operators.
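
To make the troubleshooting idea more concrete, here is a minimal Python sketch of the kind of check that could run on JSON collected from a device. Everything in it is an assumption made for illustration: the JSON layout, the field names (`bgp_neighbors`, `state`, `prefixes_received`) and the function `find_bgp_problems` are hypothetical and are not taken from Spotify's tooling.

```python
import json

# Hypothetical output of a device-dependent collection script.
# The field names are illustrative assumptions, not a real vendor schema.
SAMPLE_OUTPUT = """
{
  "device": "edge-router-1",
  "bgp_neighbors": [
    {"peer": "192.0.2.1", "state": "Established", "prefixes_received": 1200},
    {"peer": "192.0.2.2", "state": "Idle", "prefixes_received": 0},
    {"peer": "192.0.2.3", "state": "Established", "prefixes_received": 0}
  ]
}
"""


def find_bgp_problems(raw_json):
    """Return human-readable findings for neighbors that look unhealthy."""
    data = json.loads(raw_json)
    problems = []
    for neighbor in data.get("bgp_neighbors", []):
        if neighbor["state"] != "Established":
            # The session is down or still negotiating.
            problems.append(f"{neighbor['peer']}: session is {neighbor['state']}")
        elif neighbor["prefixes_received"] == 0:
            # The session is up, but the peer sends no routes.
            problems.append(
                f"{neighbor['peer']}: session is up but no prefixes received"
            )
    return problems


if __name__ == "__main__":
    for problem in find_bgp_problems(SAMPLE_OUTPUT):
        print(problem)
```

Because the vendor-specific part is confined to the script that produces the JSON, a check like this one stays the same regardless of which device the data came from.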

Of course we had to mention numerous technical details, including:

  • BGP-based SDN: they use BGP communities to take unreliable VPN tunnels out of the forwarding path;
  • Graceful shutdown: the solutions implemented by the vendors are not exactly graceful, so Spotify uses an Ansible playbook to implement a proper graceful shutdown, setting the IS-IS overload bit and using BGP AS-path prepending to redirect traffic before shutting down a network device;
  • Broken APIs: most APIs offered by networking vendors are just poor CLI wrappers;
  • Ansible’s language independence: apart from having to write the templates in Jinja2, you can write Ansible modules in any language as long as the language can be made to support JSON encoding.
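
The language-independence point rests on a simple contract: as far as I understand Ansible's non-native "want JSON" module convention, a module that contains the string `WANT_JSON` receives the path to a JSON file with its arguments as its only command-line parameter, and reports its result as a JSON object on standard output. The sketch below shows that contract in Python purely for readability; the same few steps could be written in any language that can read and write JSON. The module's purpose, its parameters (`tunnel`, `loss_percent`, `threshold`) and the `drain` result key are hypothetical and not something described in the podcast.

```python
#!/usr/bin/env python3
# WANT_JSON  (marker asking Ansible to pass the arguments file as JSON)
import json
import sys


def evaluate_tunnel(params):
    """Decide whether a tunnel should be drained (hypothetical logic)."""
    loss = float(params.get("loss_percent", 0))
    threshold = float(params.get("threshold", 5))
    return {
        "changed": False,  # this is a read-only check, nothing is modified
        "tunnel": params.get("tunnel"),
        "drain": loss > threshold,
    }


def main(argv):
    if len(argv) != 2:
        # Report failures as JSON too, so the caller can parse them.
        print(json.dumps({"failed": True, "msg": "expected one arguments file"}))
        return 1
    with open(argv[1]) as handle:
        params = json.load(handle)
    print(json.dumps(evaluate_tunnel(params)))
    return 0


# Run only when an arguments file was supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

A playbook task could then register the result and leave the decision to act on `drain` to an operator, in line with the "don't overautomate" approach mentioned earlier.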

Enjoy the podcast, explore the other Software Gone Wild episodes, and don’t forget to leave a review on iTunes.