Blog Posts in May 2024
Worth Reading: ChatGPT Does Not Summarize
I mostly gave up on LLMs being any help (apart from generating copious amounts of bullshit), but I still thought that generating summaries might be an interesting use case. I was wrong.
As Gerben Wierda explains in his recent “When ChatGPT summarises, it actually does nothing of the kind” blog post, you have to understand a text if you want to generate a useful summary, and that’s not what LLMs do. They can generate a shorter version of the text, which might not focus on the significant bits.
BGP Labs: Graceful Shutdown
Using the typical default router configurations, it can take minutes between a failure of an inter-AS link and the convergence of BGP routes. You can fine-tune that behavior with BGP timers and BFD (and still get pwned by Graceful Restart). While you can’t influence link failures, you could drain the traffic from a link before starting maintenance operations on it, and it would be a shame not to do that considering there’s a standard way to do that – the GRACEFUL_SHUTDOWN BGP community defined in RFC 8326. That’s what you’ll practice in the next BGP lab exercise.
Must Read: Make Two Trips
Tom Limoncelli wrote another must-read masterpiece: sometimes you’ll save time if you make two trips instead of one.
The same lesson applies to network design: cramming too many features into a single device will inevitably result in complex, hard-to-understand configurations and weird bugs. Sometimes, it’s cheaper to split the required functionality across multiple devices.
BGP Route Reflectors Considered Harmful
The recent IBGP Full Mesh Between EVPN Leaf Switches blog post generated an interesting discussion on LinkedIn focused on whether we need route reflectors (in small fabrics) and whether they do more harm than good. Here are some of the highlights of that discussion, together with a running commentary.
Testing Device Configuration Templates
Many network automation solutions generate device configurations from a data model and deploy those configurations. Last week, we focused on “how do we know the device data model is correct?” This time, we’ll take a step further and ask ourselves, “how do we know the device configurations work as expected?”
There are four (increasingly complex) questions our tests should answer:
Worth Reading: Using AWS Services via IPv6
AWS started charging for public IPv4 addresses a few months ago, supposedly to encourage users to move to IPv6. As it turns out, you need public IPv4 addresses (or a private link) to access many AWS services, clearly demonstrating that it’s just another way of fleecing the sheep Hotel California tax. I’m so glad I moved my videos to Cloudflare ;)
For more details, read AWS: Egress Traffic and Using AWS Services via IPv6 (rendered in beautiful, easy-to-read teletype font).
EVPN Designs: IBGP Full Mesh Between Leaf Switches
In the previous blog post in the EVPN Designs series, we explored the simplest possible VXLAN-based fabric design: static ingress replication without any L2VPN control plane. This time, we’ll add the simplest possible EVPN control plane: a full mesh of IBGP sessions between the leaf switches.
EVPN Rerouting After LAG Member Failures
In the previous two blog posts (Dealing with LAG Member Failures, LAG Member Failures in VXLAN Fabrics) we discovered that it’s almost trivial to deal with a LAG member failure in an MLAG cluster if we have a peer link between MLAG members. What about the holy grail of EVPN pundits: ESI-based MLAG with no peer link between MLAG members?
BGP Labs: Load Balancing across EBGP Paths
Let’s open another juicy can of BGP worms: load balancing. In the first lab exercise, you’ll configure equal-cost load balancing across EBGP paths and tweak the “What is equal cost?” algorithm to consider just the AS path length, not the contents of the AS path.
Testing Network Automation Data Transformation
Every complex enough network automation solution has to introduce a high-level (user-manageable) data model that is eventually transformed into a low-level (device) data model.
The transformation code (business logic) is one of the most complex pieces of a network automation solution, and there’s only one way to ensure it works properly: you test the heck out of it ;) Let me show you how we solved that challenge in netlab.
Must Read: OSPF Protocol Analysis (RFC 1245)
Daniel Dib found the ancient OSPF Protocol Analysis (RFC 1245) that includes the Router CPU section. Please keep in mind the RFC was published in 1991 (35 years ago):
Steve Deering presented results for the Dijkstra calculation in the “MOSPF meeting report” in [3]. Steve’s calculation was done on a DEC 5000 (10 mips processor), using the Stanford internet as a model. His graphs are based on numbers of networks, not number of routers. However, if we extrapolate that the ratio of routers to networks remains the same, the time to run Dijkstra for 200 routers in Steve’s implementation was around 15 milliseconds.
MLAG Deep Dive: LAG Member Failures in VXLAN Fabrics
In the Dealing with LAG Member Failures blog post, we figured out how easy it is to deal with a LAG member failure in a traditional MLAG cluster. The failover could happen in hardware, and even if it’s software-driven, it does not depend on the control plane.
Let’s add a bit of complexity and replace a traditional layer-2 fabric with a VXLAN fabric. The MLAG cluster members still use an MLAG peer link and an anycast VTEP IP address (more details).
netlab 1.8.2: Bug Fixes, Usability Improvements
netlab release 1.8.2 contains dozens of bug fixes and minor tweaks to device configuration templates. We also added a few safeguards including:
- Check for Vagrant boxes or Docker containers before starting the lab and display pointers to build recipes.
- Check installed Ansible collections before trying to configure the lab devices.
- Display a warning if the lab topology was modified after the lab was created
netlab: VRF Instantiation on Lab Devices
In the previous blog post on this topic, I described how node and global VRFs work in netlab.
TL&DR: If you use the same VRF on multiple devices, it’s better to define it globally.
However, you might not need every VRF on every lab device in a more complex lab topology. Considering that, netlab tries to minimize the number of VRFs configured on lab devices using a simple rule: a VRF is configured on a lab device only if the device has at least one interface in that VRF.
BGP Labs: Reduce FIB Size on Access Routers
Here’s another BGP lab challenge to start your weekend: use RIB-to-FIB filters to reduce the forwarding table size on access routers in a large Service Provider network.
MLAG Deep Dive: Dealing with LAG Member Failures
Craig Weinhold pointed me to a complex topic I managed to ignore in my MLAG Deep Dive series: how does an MLAG cluster reroute around a failure of a LAG member link?
In this blog post, we’ll focus on traditional MLAG cluster implementations using a peer link; another blog post will explore the implications of using VXLAN and EVPN to implement MLAG clusters.
We’ll also ignore the interesting question of “how is the LAG member link failure detected?”1 and focus on “what happens next?” using the sample MLAG topology:
Worth Exploring: LibreQoS
Erik Auerswald pointed me to an interesting open-source project. LibreQoS implements decent QoS using software switching on many-core x86 platforms. It’s implemented as a bump-in-the-wire software solution, so you should be able to plug it into your network just before a major congestion point and let it handle the packet dropping and prioritization.
Obviously, the concept is nothing new. I wrote about a similar problem in xDSL networks in 2009.
Repost: State of Lisp Implementations (2024)
You might remember Béla Várkonyi’s use of LISP to build resilient ground-to-airplane networks from last week’s repost. It seems he’s not exactly happy with the current level of LISP support, at least based on what he wrote as a response to Jeff McLaughlin’s claim that “I can tell you that our support for EVPN does not, in any way, indicate the retirement of LISP for SD-Access.”:
Nice to hear the Cisco intends to support LISP. However, it is removed from IOS XR already. So it is not that clear…
If Cisco will stop supporting LISP, then we will be forced to create our own LISP routers, since we need it for extreme mobility environments.
… updated on Monday, September 9, 2024 16:26 +0200
Famous Last Words: I'm Too Stupid for That
Some networking vendors realized that one way to gain mindshare is to make their network operating systems available as free-to-download containers or virtual machines. That’s the right way to go; I love their efforts and point out who went down that path whenever possible1 (as well as others like Cisco who try to make our lives miserable).
However, those virtual machines better work out of the box, or you’ll get frustrated engineers who will give up and never touch your warez again, or as someone said in a LinkedIn comment to my blog post describing how Junos vPTX consistently rejects its DHCP-assigned IP address: “If I had encountered an issue like this before seeing Ivan’s post, I would have definitely concluded that I am doing it wrong.”2
Worth Reading: Cisco vPC in VXLAN/EVPN Networks
Daniel Dib started writing a series of blog posts describing Cisco vPC in VXLAN/EVPN Networks. The first one covers the anycast VTEP, the second one the vPC configuration.
Let’s hope he will keep them coming and link them together so it will be easy to find the whole series after stumbling on one of the posts ;)
BGP Labs: EBGP Sessions over IPv6 LLA Interfaces
If you insist on building your network with EBGP as a better IGP, make sure your implementation supports running IPv4 and IPv6 address families over EBGP sessions established between IPv6 link-local addresses (the functionality lovingly called unnumbered EBGP sessions).
Want to practice that neat trick? Check out the EBGP Sessions over IPv6 LLA Interfaces lab exercise.