Blog Posts in March 2023
It’s time for another Kubernetes video. After Stuart Charlton explained the Kubernetes SDN architecture, he described architectural approaches of Kubernetes SDN implementations, using Flannel as a sample implementation.
I wanted to include a few examples of BGP bugs causing widespread disruption in the Network Security Fallacies presentation. I tried to find what happened when someone announced beacon prefixes with unknown optional transitive attributes (which should have been passed without complaints but weren’t) without knowing when it happened or who did it.
Trying to find the answer on Google proved to be a Mission Impossible – regardless of how I structured my query, I got tons of results that seemed relevant to a subset of the search words but nowhere near what I was looking for. Maybe I would get luckier with a tool that’s supposed to have ingested all the world’s knowledge and seems to (according to overexcited claims) understand what it’s talking about.
An ipSpace.net subscriber sent me this question:
I am on job hunting. I have secured an interview and they will probably ask me about VxLAN BGP EVPN fabrics. If you have some time, it would be a great help for me if you could tell me 1 or 2 questions that you would ask in such interviews.
TL&DR: He got the job. Congratulations!
The last time I spent days poring over vendor datasheets collecting information for the overview part of Data Center Fabrics webinar a lot of 1RU data center leaf switches came in two form factors:
- 48 low-speed server-facing ports and 4 high-speed uplinks
- 32 high-speed ports that you could break out into four times as many low-speed ports (but not all of them)
I expected the ratios to stay the same when the industry moved from 10/40 GE to 25/100 GE switches. I was wrong – most 1RU leaf data center switches based on recent Broadcom silicon (Trident-3 or Trident-4) have between eight and twelve uplinks.
TL&DR: It works exactly as expected. Even though I had anycast gateway configured on the VLAN, the Arista vEOS switches used their unicast IP addresses in the DHCP relaying process. The DHCP server had absolutely no problem dealing with multiple copies of the same DHCP broadcast relayed by different switches attached to the same VLAN. One could only wish things were always as easy in the networking land.
I have blog post ideas sitting in my to-write queue for over a decade. One of them is why would you need a VRF (and associated router) between virtual servers and a firewall?
Andrea Dainese answered at least part of that question in his Off-Path firewall with Traffic Engineering blog post. Enjoy!
Another interesting take on ChatGPT in networking, this time by Tom Hollingsworth in The Dangers of Knowing Everything:
In a way, ChatGPT is like a salesperson. No matter what you ask it the answer is always yes, even if it has to make something up to answer the question.
To paraphrase an old joke: It’s not that ChatGPT is lying. It’s just that what it knows isn’t necessarily true. See also: the difference between bullshit and lies.
Did you know most chassis switches look like leaf-and-spine fabrics1 from the inside? If you didn’t, you might want to watch the short Chassis Architectures video by Pete Lumbis (author of ASICs for Networking Engineers part of the Data Center Fabric Architectures webinar).
TL&DR: No. You can move on.
Like most using ChatGPT for something articles we’re seeing these days, the presentation is a bit too positive for my taste. After all, it’s all fine and dandy to claim ChatGPT generates working router configurations and related Jinja2 templates if you know what the correct configurations should look like and can confidently say “and this is where it made a mistake” afterwards.
Over the years I wrote a dozen blog posts describing various aspects of using CI/CD in network automation. These blog posts are now collected in the new CI/CD in Networking page that also includes links to related podcasts, webinars, and sample network automation solutions.
A networking engineer attending the Building Next-Generation Data Center online course asked this question:
What is the best practice to connect DC fabric to outside world assuming there are 2 spine switches in the fabric and EVPN VXLAN is used as overlay? Is it a good idea to introduce edge (border) switches, or it is better to connect outside world directly to the spine?
As always, the answer is “it depends,” this time based on:
I had to make just a few changes to the DHCP relaying lab topology:
- DHCP server is running on CSR 1000v. IOSv DHCP server does not support subnet selection DHCP option and thus doesn’t work with relays that do inter-VRF DHCP relaying.
- I put the link between the DHCP client and DHCP relay into a VRF.
Just in case you wondered why we have eight bits per byte: after Julia Evans investigated this mystery, Steven Bellovin published an excellent overview of the early years of bytes and words.
Vadim Semenov created an interesting solution out of open-source tools (and some glue): a system that tracks, logs, and displays OSPF changes in your network.
It might not be exactly what you’re looking for (and purists would argue it should use BGP-LS), but that’s the beauty of open-source solutions: go and adapt it to your needs, generalizes your fixes, and submit a pull request.
Not surprisingly, most implementations of virtual MLAG peer link remain proprietary. Lukas Krattiger described the details of Cisco’s vPC Fabric Peering implementation in the EVPN Deep Dive webinar.
A few weeks ago I described why EBGP TCP packets have TTL set to one (unless you configured EBGP multihop). Although some people claim that (like NAT) it could be a security feature, it’s not a good one. Generalized TTL Security Mechanism (GTSM, described in RFC 5082) is much better.
Most BGP implementations set TTL field in outgoing EBGP packets to one. That prevents a remote intruder that manages to hijack a host route to an adjacent EBGP peer from forming a BGP session as the TCP replies get lost the moment they hit the first router in the path.
Even though IPv6 could buy its own beer (in US, let alone rest of the world), networking engineers still struggle with its deployment – one of the first questions I got in the ipSpace.net Design Clinic was:
We have been tasked to start IPv6 planning. Can we discuss (for enterprises like us who all of the sudden want IPv6) which design paths to take?
I did my best to answer this question and describe the basics of creating an IPv6 addressing plan. For even more details, watch the IPv6 webinars (most of them at least a few years old, but nothing changed in the IPv6 world in the meantime apart from the SRv6 madness).
I’m always envious of how easy networking challenges seem when you’re solving them in PowerPoint, for example, when an innovation specialist explains how scalability works in leaf-and-spine fabrics in a LinkedIn comment:
One of the main benefits of a CLOS folded spine topology is the scale out spine where you can scale out the number of spine nodes increasing your leaf-spine n-way ECMP as well as minimizing the blast radius with the more spine nodes the more redundancy and resiliency.
Isn’t that wonderful? If you need more bandwidth, sprinkle the magic spine powder on your fabric, add water, and voila! Problem solved. Also, it looks like adding spine switches reduces the blast radius. Who would have known?
After figuring out how DHCP relaying works, I decided to test it out in a lab. netlab has no DHCP configuration module (at the moment); the easiest way forward seemed to be custom configuration templates combined with a few extra attributes.
This is how I set up the lab:
Another take on “what are large language models and what can we expect from them,” this time by Bruce Davie: Putting Large Language Models in Context:
My approach, at least for now, is to treat these LLM-based systems as very large, efficient collections of matchboxes–and keep working in my chosen field of networking.
Jeff McLaughlin published an excellent blog post perfectly describing what we’ve been experiencing for decades: the war on expertise.
On one hand, the “business owners” force us to build complex stuff because they think they know better, on the other they blame people who know how to do it for the complex stuff that happens as the result of their requirements:
I am saying that we need to stop blaming complexity on those who manage to understand it.
Chinar Trivedi asked an interesting question about DHCP relaying in VXLAN/EVPN world on Twitter and my first thought was “that shouldn’t be hard” but when I read the first answer that turned into “wait a minute, how exactly does DHCP relaying works?”
I’m positive there’s a tutorial out there somewhere, but I decided to go back to the sources of wisdom: the RFCs. It turned out to be a long walk down the IETF history lane.
I wrote two dozen blog posts describing IP anycast concepts, from first-hop anycast gateways to anycast between DNS servers and global anycast (as used by large web properties), but never organized them in any usable form.
That’s fixed: everything I ever wrote about anycast is nicely structured on the new Anycast Resources page.
An ipSpace.net subscriber sent me a question along the lines of “does it matter that EVPN uses BGP to implement dynamic MAC learning whereas in traditional switching that’s done in hardware?” Before going into those details, I wanted to establish the baseline: is dynamic MAC learning really implemented in hardware?
Hardware-based switching solutions usually use a hash table to implement MAC address lookups. The above question should thus be rephrased as is it possible to update the MAC hash table in hardware without punting the packet to the CPU? One would expect high-end (expensive) hardware to be able do it, while low-cost hardware would depend on the CPU. It turns out the reality is way more complex than that.
One of the least-documented limitations of virtual networking labs is the number of network interfaces a virtual machine could have. vSphere supports up to 10 interfaces per VM, the default setting for vagrant-libvirt is eight, and I couldn’t find the exact numbers for KVM. Many vendors claim their KVM limit is around 25; I was able to bring up a Nexus 9300v device with 40 adapters.
Anyway, a dozen interfaces should be good enough if you’re building a proof-of-concept fabric, but it might get a bit tight if you want to emulate plenty of edge subnets.
After explaining how netlab fits into the virtual lab orchestration picture and what exactly it can do, let’s focus on what’s the easiest way to get started.
Chris Parker wrote a wonderful blog post going deep into the weeds on how EBGP sessions use IP TTL and why we need multihop EBGP sessions between adjacent devices. However, he couldn’t find a source explaining why early BGP implementations decided to use IP TTL set to one on EBGP sessions:
If there’s a source on the internet that explains when it was decided that EBGP should use a TTL of 1, I can’t find it. I can’t even find it in any RFC. I looked in the RFC for BGP v4, and went all the way back to BGP v1. None of these documents contain the text “TTL or “time to live” or “time-to-live.” It’s not even in the RFC for EGP, back in 1984.
We are beginning to migrate some of our offerings to Microsoft Azure and I need to get up to speed with Azure products. I found this webinar very informative, and Ivan explained the concepts in a clear manner and easy to follow along. I would recommend watching these webinars and then read Microsoft documentation to get a thorough understanding.
Want to have some hands-on work sprinkled on top of that? You’ll find deployment examples in the Networking in Public Clouds GitHub repository.