Network validation is becoming another overhyped buzzword, with many opinionated pundits talking about it and few environments using it in practice (why am I not surprised?).
As always, there are exceptions. They don’t have to be members of the FAANG club, and some of them get the job done with open-source tools regardless of what vendor marketers would like you to believe. For example, Donatas Abraitis described how the Hostinger networking team gradually implemented network validation using Cumulus VX, Vagrant, SuzieQ, PyTest and Test Kitchen. Enjoy!
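To make the idea a bit more concrete, here's a minimal pytest-style sketch of the kind of check such a validation pipeline might run. It assumes the operational state has already been collected into plain Python objects. The `BgpSession` record and the `collect_bgp_sessions` helper are hypothetical stand-ins: they are not SuzieQ's API, and this is not Hostinger's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BgpSession:
    """Hypothetical record describing one BGP session on one device."""
    device: str
    peer: str
    state: str


def collect_bgp_sessions() -> list[BgpSession]:
    """Hypothetical collector: returns canned data instead of polling devices."""
    return [
        BgpSession("leaf1", "spine1", "Established"),
        BgpSession("leaf1", "spine2", "Established"),
        BgpSession("leaf2", "spine1", "Established"),
        BgpSession("leaf2", "spine2", "Established"),
    ]


def sessions_not_established(sessions: list[BgpSession]) -> list[BgpSession]:
    """Return every session whose state is anything other than Established."""
    return [s for s in sessions if s.state != "Established"]


def test_all_bgp_sessions_established():
    # pytest collects test_* functions; a plain assert is all a check needs.
    broken = sessions_not_established(collect_bgp_sessions())
    assert not broken, f"BGP sessions not established: {broken}"
```

Running `pytest` against a file containing these functions executes the check; swapping the canned data for real state collected from the lab turns it into an actual validation test.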
In May 2021, Javier Antich ran a great webinar explaining the principles of Artificial Intelligence and Machine Learning and how they apply (or not) to networking.
He started with a brief overview of AI/ML hype that should help you understand why there’s a bit of a difference between self-driving cars (not that we got there) and self-driving networks.
A while ago my friend Nicola Modena sent me another intriguing curveball:
Imagine a CTO who has invested millions in a super-secure data center and wants to consolidate all compute workloads. If you were asked to run a BGP Route Reflector as a VM in that environment, and would like to bring OSPF or ISIS to that box to enable BGP ORR, would you use a GRE tunnel to avoid a dedicated VLAN or boring other hosts with routing protocol hello messages?
While there might be good reasons for doing that, my first knee-jerk reaction was:
I was happily munching popcorn while watching the latest season of the Lack of DHCPv6 on Android soap opera on the v6ops mailing list when one of the lead actors, trying to justify the current state of affairs with a technical argument, quoted an RFC to prove his righteous indignation with DHCPv6 and the decision not to implement it in Android:
[…not having multiple IPv6 addresses per interface…] is also harmful for a variety of reasons, and for general purpose devices, it’s not recommended by the IETF. That’s exactly what RFC 7934 is about - explaining why it’s harmful.
The whole High Availability Switching series started with a question along the lines of “Does it make sense to run BFD together with Graceful Restart?” After Non-Stop Forwarding 101, Graceful Restart 101, and Graceful Restart and Convergence Speed, we finally have enough information to answer that question.
TL&DR: Most probably not.
A more nuanced answer depends (as always) on a gazillion implementation details.
In mid-October I finally found time to add the icing to the netsim-tools cake: the netlab up command takes a lab topology and does everything needed to get a running virtual lab:
- Create Vagrantfile or containerlab topology file
- Create Ansible inventory
- Start the lab with vagrant up or containerlab deploy
- Deploy device configurations, from LLDP and interface addressing to routing protocols and Segment Routing
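The four steps above can be sketched as a simple plan generator. This is only an illustration of the workflow, not netlab's implementation: the `netlab_up_plan` function, the `orchestrator` selector, and the `clab.yml` file name are assumptions made for this sketch, and nothing is actually executed.

```python
from typing import List, Tuple


def netlab_up_plan(orchestrator: str, topology: str = "topology.yml") -> List[Tuple[str, str]]:
    """Return (step, action) pairs approximating what 'netlab up' does.

    orchestrator: 'vagrant' or 'containerlab' (hypothetical selector used
    only by this sketch).
    """
    if orchestrator == "containerlab":
        lab_file = "clab.yml"  # assumed file name, for illustration only
        start = f"containerlab deploy -t {lab_file}"
    elif orchestrator == "vagrant":
        lab_file = "Vagrantfile"
        start = "vagrant up"
    else:
        raise ValueError(f"unknown orchestrator: {orchestrator}")
    return [
        ("create lab file", f"generate {lab_file} from {topology}"),
        ("create inventory", f"generate Ansible inventory from {topology}"),
        ("start lab", start),
        ("configure devices", "deploy LLDP, addressing, routing protocols, SR"),
    ]


# Print the plan for a containerlab-based lab.
for step, action in netlab_up_plan("containerlab"):
    print(f"{step:18} {action}")
```

The point of wrapping everything in a single command is that the only thing differing between the two lab orchestrators is the generated lab file and the start command; the inventory and configuration steps stay the same.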
Every other blue moon someone writes (yet another) article along the lines of “professional liability would solve so many broken things in the IT industry.” This time it’s Poul-Henning Kamp of FreeBSD and Varnish fame with The Software Industry IS STILL the Problem. Unfortunately, it’s just another case of tilting at windmills, considering how much money that industry pours into lobbying.
Decades ago there was a trick question on the CCIE exam exploring the intricate relationships between MAC and ARP tables. I always understood the explanation for about 10 minutes, and then I was back to “I knew why that’s true, but now I’ve lost it.”
Fast forward 20 years, and we’re still seeing the same challenges, this time in EVPN networks using in-subnet proxy ARP. For more details, read the excellent ARP problems in EVPN article by Dmytro Shypovalov (I understood the problem after reading the article, and now it’s all a blur 🤷♂️).
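One well-known interaction between the two tables (not necessarily the exact exam question, and not the EVPN proxy-ARP problem Dmytro describes) is unicast flooding caused by mismatched timers: a router caches an ARP entry for hours, while the switch ages out the MAC address after minutes. If the host's outbound traffic takes a different path (asymmetric routing), the switch never relearns its MAC address and floods every frame the router sends to it. Here's a toy simulation, assuming Cisco-like defaults (4-hour ARP timeout, 300-second MAC aging); the addresses and port name are made up.

```python
ARP_TIMEOUT = 4 * 3600  # seconds; assumed router ARP timeout (Cisco IOS default)
MAC_AGING = 300         # seconds; assumed switch MAC aging time


class Switch:
    def __init__(self, aging):
        self.aging = aging
        self.mac_table = {}  # mac -> (port, last_seen)

    def learn(self, mac, port, now):
        """Learn (or refresh) a source MAC address seen on a port."""
        self.mac_table[mac] = (port, now)

    def forward(self, dst_mac, now):
        """Return the egress port, or 'flood' if the MAC is unknown or aged out."""
        entry = self.mac_table.get(dst_mac)
        if entry is None or now - entry[1] > self.aging:
            self.mac_table.pop(dst_mac, None)  # expire the stale entry
            return "flood"
        return entry[0]


class Router:
    def __init__(self, timeout):
        self.timeout = timeout
        self.arp_cache = {}  # ip -> (mac, learned_at)

    def resolve(self, ip, now):
        """Return the cached MAC, or None when a new ARP request is needed."""
        entry = self.arp_cache.get(ip)
        if entry and now - entry[1] <= self.timeout:
            return entry[0]
        return None


sw = Switch(MAC_AGING)
rtr = Router(ARP_TIMEOUT)

# t=0: the host answers the router's ARP request; the switch sees the reply.
rtr.arp_cache["10.0.0.10"] = ("aa:bb:cc:00:00:10", 0)
sw.learn("aa:bb:cc:00:00:10", "Gi0/10", 0)

# The host then sends everything via another router, so the switch
# never sees its MAC address again while the router still uses its ARP entry.
for t in (60, 600, 3600):
    mac = rtr.resolve("10.0.0.10", t)
    print(t, mac, sw.forward(mac, t))
```

At t=60 the frame goes out Gi0/10; at t=600 and t=3600 the router still has a valid ARP entry, but the switch has forgotten the MAC address and floods. The usual remedy is to make the ARP timeout shorter than the MAC aging time, so the router re-ARPs (and the reply refreshes the switch) before the MAC entry expires.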
You wouldn’t believe the intricate network designs I created decades ago, before I learned that uninterrupted sleep is worth more than proving I can make the impossible work (see also: using EBGP instead of IGP in a 4-node data center fabric).
Once I started valuing my free time, I tried to design things to be as simple as possible. However, as my friend Nicola Modena once said, “Consultants must propose new technologies because they must be seen as bringing innovation,” and we all know complexity sells. Go figure.
Almost a decade ago I described a scenario in which a perfectly valid IBGP topology could result in a permanent routing loop. While one wouldn’t expect to see such a scenario in a well-designed network, it’s been known for ages¹ that using BGP route reflectors could result in suboptimal forwarding.
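A minimal sketch of the suboptimal-forwarding part, assuming made-up router names and IGP costs, and assuming both exit points advertise the prefix with identical BGP attributes, so the decision falls through to the lowest-IGP-cost-to-next-hop tiebreaker. A classic route reflector evaluates that cost from its own position in the topology and (without Add-Path) reflects only its single best path; Optimal Route Reflection exists precisely to evaluate it from the client's position instead.

```python
# Hypothetical IGP costs: IGP_COST[router][bgp_next_hop]
IGP_COST = {
    "RR": {"E1": 15, "E2": 75},
    "C1": {"E1": 20, "E2": 60},
    "C2": {"E1": 60, "E2": 20},
}

EXITS = ["E1", "E2"]  # egress routers advertising the same prefix


def best_path(viewpoint, next_hops):
    """BGP tiebreaker: lowest IGP cost to the next hop, as seen from viewpoint."""
    return min(next_hops, key=lambda nh: IGP_COST[viewpoint][nh])


reflected = best_path("RR", EXITS)  # the only path the RR sends to its clients
for client in ("C1", "C2"):
    optimal = best_path(client, EXITS)
    print(client, "gets", reflected, "cost", IGP_COST[client][reflected],
          "| optimal", optimal, "cost", IGP_COST[client][optimal])
```

C1 happens to be close to the route reflector and gets its optimal exit; C2 is forced toward E1 at cost 60 even though E2 is reachable at cost 20.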
Here’s a simple description of how that could happen:
In case you missed it, there’s a new season of the Lack of DHCPv6 on Android soap opera on the v6ops mailing list. Before going into the juicy details, I wanted to look at the big picture: why would anyone care about the lack of DHCPv6 on Android?
The requirements for DHCPv6-based address allocation come primarily from enterprise environments facing legal/compliance/other layer 8-10 reasons to implement policy (are you allowed to use the network), control (we want to decide who uses the network) and attribution (if something bad happens, we want to know who did it).
I’m always amazed when I encounter networking engineers who want to have a fast-converging network using Non-Stop Forwarding (which implies Graceful Restart). It’s even worse than asking for smooth-running heptagonal wheels.
As we discussed in the Fast Failover series, any decent router uses a variety of mechanisms to detect adjacent device failure:
- Physical link failure;
- Routing protocol timeouts;
- Next-hop liveness checks (BFD, CFM…).
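The detection-time arithmetic is what makes BFD attractive compared to routing protocol timeouts. Per RFC 5880, in asynchronous mode the local detection time is the remote system's Detect Mult multiplied by the agreed interval, which is the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. A small sketch of that calculation, using milliseconds instead of the RFC's microseconds; the timer values are illustrative, not recommendations.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode BFD detection time (RFC 5880, section 6.8.4)."""
    # The remote system can't send faster than we're willing to receive.
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


# Both ends ask for 300 ms intervals with a multiplier of 3.
print(bfd_detection_time_ms(3, 300, 300))   # 900
# A slow remote transmitter stretches the detection time.
print(bfd_detection_time_ms(3, 300, 1000))  # 3000
```

With 300 ms × 3, a failed neighbor is detected in under a second, compared to the 40-second default OSPF dead interval on broadcast links.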
Last week’s update session of the AWS Networking webinar covered two hours’ worth of new (or not-yet-covered) features, including:
- Transit Gateway Connect functionality (GRE tunnel+BGP between Transit Gateway and in-cloud SD-WAN appliances)
- AWS Private Link
- Intra-VPC static routes that you can use to send inter-subnet traffic to a BYOD security appliance
- IGMPv2 support
- Custom global accelerators
- Assigning whole IP prefixes to VM interfaces
The recordings have already been published, either as independent videos or integrated with the existing materials. Enjoy ;)
I totally understand that entities relying on sponsors have to get creative when promoting whatever their sponsors want to sell, but in my opinion this is a bridge too far:
[…] explore how Gluware aims to democratize automation; that is, get you quick wins around common tasks such as configuration changes and OS updates.
Democratizing automation? Because it’s authoritarian now? By providing capabilities like configuration changes and OS updates that have been available in network management tools like CiscoWorks or SolarWinds for ages?
You know what’s really hard when automating existing networks? Figuring out how to simplify them to the point where it makes sense to automate them. Will any shrink-wrapped GUI product solve that? Of course not.
We all know that you have to use an AS number between 64512 and 65535 for private BGP autonomous systems, right? Well, we’re all wrong – the high end of the range is 65534, and Chris Parker wrote a nice blog post explaining the reasons behind that change.
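The current ranges are easy to encode. RFC 6996 reserves 64512 to 65534 (16-bit) and 4200000000 to 4294967294 (32-bit) for private use, while RFC 7300 reserves the last AS numbers, 65535 and 4294967295. A minimal sketch of a validation helper (the function name is ours, for illustration):

```python
# Private-use ASN ranges (RFC 6996). Python ranges exclude the upper bound,
# so 65535 and 4294967295 (reserved by RFC 7300) are not private.
PRIVATE_ASN_RANGES = (
    range(64512, 65535),            # 64512 to 65534
    range(4200000000, 4294967295),  # 4200000000 to 4294967294
)


def is_private_asn(asn: int) -> bool:
    """Return True if asn falls into a private-use AS number range."""
    if not 0 <= asn <= 4294967295:
        raise ValueError(f"{asn} is not a valid 32-bit AS number")
    return any(asn in r for r in PRIVATE_ASN_RANGES)


for asn in (64512, 65534, 65535, 4200000000, 4294967295):
    print(asn, is_private_asn(asn))
```

Only 65535 and 4294967295 print False in this loop; the other three are private.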