Webinars in January 2020
January 2020 was one of the busiest months we've ever had:
- Elisa Jasinska and Nick Buraglio started the year with the first part of the Surviving in Internet Default-Free Zone webinar.
- Krzysztof Szarkowicz completed the EVPN-with-MPLS topic, describing the intricacies of layer-2 and layer-3 integration and extending the EVPN webinar to almost 11 hours (three of them covering EVPN in the SP/MPLS world).
- Matthias Luft continued the Cloud Security saga, covering logging, monitoring, testing and auditing.
- Preparing for our public cloud course, I described how you can automate AWS infrastructure deployments.
- Christopher Werny wrapped up the month with the second part of the IPv6 Enterprise Security talk, this time focusing on layer-2 security.
You can get immediate access to all these webinars with a Standard or Expert ipSpace.net subscription.
Be Careful When Using New Features
During a recent workshop I made a comment along the lines “be careful with feature X from vendor Y because it took vendor Z two years to fix all the bugs in a very similar feature”, and someone immediately asked “are you saying it doesn’t work?”
My answer: “I never said that, I just drew inferences from other people’s struggles.”
A Step Back
Networking operating systems are probably some of the most complex pieces of software out there. Distributed systems are hard. Real-time distributed systems are even harder. Real-time distributed systems running on top of eventually-consistent distributed databases are extra fun.
Video: The Network Is Not Reliable
After introducing the fallacies of distributed computing in the How Networks Really Work webinar, I focused on the first one: the network is (not) reliable.
While that might be understood by most networking professionals (and ignored by many developers), here’s an interesting shocker: even TCP is not always reliable (see also: Joel Spolsky’s take on Leaky Abstractions).
How to Start Your Network Automation Journey
A journey of a thousand miles begins with one step, they say… but what should that first step be if you want to start a network automation journey (and have no idea how to do it)?
Anne Baretta sent me a detailed description of his journey, which (as is often the case) started with standardized configuration templates.
The EVPN/EBGP Saga Continues
Aldrin wrote a well-thought-out comment to my EVPN Dilemma blog post explaining why he thinks it makes sense to use Juniper’s IBGP (EVPN) over EBGP (underlay) design. The only problem I have is that I forcefully disagree with many of his assumptions.
He started with an in-depth explanation of why EBGP over directly-connected interfaces makes little sense:
Upcoming Events and Webinars (February 2020)
If you’re an ipSpace.net subscriber, you might have noticed how busy the last month has been (more about that later). February won’t be much better:
- Later today we’ll have David Barroso talk about safely managing network automation secrets.
- On February 6th I’ll describe the tools you can use to automate Azure deployments, including simple CLI scripts, Ansible, Terraform, and Azure Resource Manager templates.
- We’re starting the Networking in Public Cloud Deployments online course on February 11th.
- David Peñaloza Seijas will talk about Cisco (Viptela) SD-WAN on February 13th.
Finally, I’ll run a day-long workshop in Zurich on March 10th describing containers and Docker.
Connecting Your Legacy WAN to Cloud is Harder than You Think
Unless you’re working for a cloud-only startup, you’ll always have to connect applications running in a public cloud with existing systems or databases running in a more traditional environment, or connect your users to public cloud workloads.
Public cloud providers love stable and robust solutions, and they took the same approach when implementing their legacy connectivity solutions: you could use routed Ethernet connections or IPsec VPN, and run BGP across them, turning the problem into a well-understood routing problem.
Worth Reading: SD-WAN Scalability Challenges
In January 2020 Doug Heckaman documented his experience with VeloCloud SD-WAN. He tried to be positive, but for whatever reason this particular bit caught my interest:
Edge Gateways have a limited number of tunnels they can support […]
WTF? Wasn’t x86-based software packet forwarding supposed to bring infinite resources and nirvana? How badly written must your solution be to have a limited number of IPsec tunnels on a decent x86 CPU?
You're Responsible for Resiliency of Your Public Cloud Deployment
Enterprise environments usually implement “mission-critical” applications by pushing high-availability requirements down the stack until they hit networking… and then blame the networking team when the whole house of cards collapses.
Most public cloud providers are not willing to play the same stupid blame-shifting game - they live or die by their reputation, and maintaining a stable service is their highest priority. They will do their best to implement a robust and resilient infrastructure, but will not do anything that could impact its stability or scalability… including the snake oil the virtualization and networking vendors love to sell to their gullible customers. When you deploy your application workloads into a public cloud, you become responsible for the resiliency of your own application, and there’s no magic button that could allow you to push the problems down the stack.
Transforming XML Data With Ansible
Some network devices return structured data in either text or XML format (but cannot spell JSON). Ansible prefers getting JSON-formatted data, and has a number of filters to process text printouts… but what can you do if you want to work with XML documents within Ansible? I described a few solutions in Transforming XML Data in Ansible.
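To illustrate the underlying idea (not necessarily any of the solutions from the linked article), here is a minimal Python sketch of the kind of conversion a custom filter would have to perform: turning an XML document into nested dictionaries and lists that can be dumped as JSON. The sample document, tag names, and function names are all hypothetical, and XML attributes are deliberately ignored to keep the sketch short.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Recursively convert an element into dicts, lists, and strings.

    Attributes are ignored; leaf elements become stripped text.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # A repeated tag turns the existing entry into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical device printout, not taken from any real platform
SAMPLE_XML = """
<interfaces>
  <interface><name>ge-0/0/0</name><status>up</status></interface>
  <interface><name>ge-0/0/1</name><status>down</status></interface>
</interfaces>
"""


def parse_device_xml(xml_text):
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(xml_text.strip())
    return {root.tag: xml_to_dict(root)}


if __name__ == "__main__":
    print(json.dumps(parse_device_xml(SAMPLE_XML), indent=2))
```

One caveat of this simple approach: a tag that appears only once becomes a dictionary instead of a one-element list, so downstream templates have to cope with both shapes.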
Master Infrastructure-as-Code and Immutable Infrastructure Principles
Doing the same thing and hoping for a different result is supposedly a definition of insanity… and managing public cloud deployments with an unrepeatable sequence of GUI clicks comes pretty close to it.
Engineers who mastered the art of public cloud deployments realized years ago that the only way forward is to treat infrastructure in the same way as any other source code:
Fast Failover in SD-WAN Networks
It’s amazing how quickly you get “must have feature Y or it should not be called X” comments coming from vendor engineers the moment you mention something vaguely-defined like SD-WAN.
Here are just two of the claims I got in response to the “BGP with IP-SLA is SD-WAN” trolling I started on LinkedIn based on this blog post:
Key missing features [of your solution]:
- real time circuit failover (100ms is not real-time)
- traffic steering (again, 100ms is not real-time)
Let’s get the facts straight: it seems Cisco IOS evaluates route-map statements using track objects in the periodic BGP table scan process, so the failover time is on the order of 30 seconds plus however long it takes IP SLA to detect the decreased link quality.
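The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a measurement: it assumes the route-map is re-evaluated only once per scan interval, that the scan interval is 60 seconds (which, to my knowledge, is the default BGP scan time on Cisco IOS), and that IP SLA needs a hypothetical 5 seconds to notice the degraded link.

```python
def failover_estimate(scan_interval_s, detection_s):
    """Return (average, worst-case) failover time in seconds.

    A change noticed by a periodic scan waits half an interval on average
    and a full interval in the worst case, on top of the detection time.
    """
    average = scan_interval_s / 2 + detection_s
    worst = scan_interval_s + detection_s
    return average, worst


if __name__ == "__main__":
    # Assumed values: 60-second scan interval, 5-second IP SLA detection
    avg, worst = failover_estimate(scan_interval_s=60, detection_s=5)
    print(f"average: {avg:.0f} s, worst case: {worst:.0f} s")
```

Under these assumptions the average wait for the scan alone is 30 seconds, which matches the ballpark figure above.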
Worth Reading: Machine Learning Explained
I hope you're familiar with Clarke's third law (and leave it to your imagination to explain how it relates to SDN ;). In case you want to look beyond the Machine Learning curtain, you might find the Machine Learning Explained article highly interesting. Spoiler: it all started in the 1960s with over 300 matchboxes.
Video: FRRouting Architecture
After a brief overview of the FRRouting suite, Donald Sharp continued with a deep dive into the FRR architecture, including the various routing daemons, the role of Zebra and ZAPI, the interface between RIB (Zebra) and FIB (Linux kernel), a sample data flow for route installation, and multi-threading in the Zebra and BGP daemons.
Automation Solution: Testing Data Models
If your automation solution relies on a back-end database with a strict database schema you can stop reading… but if you (like most others) still live in the land of text files encoded in your favorite presentation format (because it’s hip to hate YAML), you might appreciate the solution Donald Johnson uses to check his data models before committing them into a Git repository.
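What such a pre-commit check might look like is sketched below. This is not Donald Johnson's solution, just a minimal illustration of the idea: load the data model, verify a handful of rules, and refuse the commit if any of them fail. The schema (a `nodes` list with `name` and `mgmt_ip` keys) is hypothetical, and the sketch reads JSON to stay within the Python standard library; a YAML-based model would need a YAML parser instead.

```python
import ipaddress
import json
import sys

# Hypothetical schema: every node needs these keys
REQUIRED_NODE_KEYS = {"name", "mgmt_ip"}


def validate_model(model):
    """Return a list of human-readable errors (empty list means valid)."""
    if not isinstance(model, dict) or not isinstance(model.get("nodes"), list):
        return ["top-level 'nodes' list is missing"]
    errors = []
    seen_names = set()
    for idx, node in enumerate(model["nodes"]):
        if not isinstance(node, dict):
            errors.append(f"nodes[{idx}]: must be a mapping")
            continue
        missing = REQUIRED_NODE_KEYS - node.keys()
        if missing:
            errors.append(f"nodes[{idx}]: missing keys {sorted(missing)}")
            continue
        try:
            ipaddress.ip_address(node["mgmt_ip"])
        except ValueError:
            errors.append(f"nodes[{idx}]: invalid mgmt_ip {node['mgmt_ip']!r}")
        if node["name"] in seen_names:
            errors.append(f"nodes[{idx}]: duplicate name {node['name']!r}")
        seen_names.add(node["name"])
    return errors


if __name__ == "__main__":
    # Usage: python check_model.py model.json (file name is an example)
    with open(sys.argv[1]) as handle:
        problems = validate_model(json.load(handle))
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Because the script exits with a non-zero status on errors, it could be called from a Git pre-commit hook or a CI job to block a broken data model from being committed.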