Blog Posts in May 2022
ipSpace.net subscribers are probably already familiar with the Design Clinic: a monthly Zoom call in which we discuss real-life design- and technology challenges. I started it in September 2021 and it quickly became reasonably successful; we’ve covered almost two dozen topics so far.
- Can we implement Data Center Interconnect (DCI) with VXLAN? (Yes, but…)
- Can we run VXLAN over SD-WAN (and does it make sense)? (Yes/No)
- What happened to traditional MPLS/VPN Enterprise core and can we use VXLAN/EVPN instead? (Still there/Maybe)
- Should we use routers or switches as data center WAN edge devices, and how do we integrate them with VXLAN/EVPN data center fabric? (Yes 😊)
For more details, join us on June 6th. There’s just a minor gotcha: you have to be an active ipSpace.net subscriber to do it.
I had no idea how convoluted VLANs could get until I tried to implement them in netlab.
We’ll start with the simplest option: a single VLAN stretched across two switches with two Linux hosts connected to it. netlab can configure VLANs on Arista EOS, Cisco IOSv, Cisco Nexus OS, VyOS, Dell OS10, and Nokia SR Linux. We’ll use the quickest (deployment-wise) option: Arista EOS on containerlab.
Using Terraform to deploy networking elements with an SDN controller that cannot replace the current state of a tenant with the desired state specified in a text file (because nobody ever wants to do that, right) sounds like a great idea… until you try to do it at scale.
Noël Boulene hit interesting scalability limits when trying to provision VLANs on Cisco ACI with Terraform. If you’re thinking about doing something similar, you REALLY SHOULD read his article.
Are you afraid that network automation will eat your job? You might have to worry if you’re a VLAN-provisioning CLI jockey, but then you’re not alone. Textile workers faced the same challenges in the 19th century, and according to an automation report from 1958, clerical workers were facing the same dilemma when the first computers were introduced.
Guess what: the unemployment rate has been going up and down in the meantime (US data), but mostly due to various crises. Automation had little impact.
Javier Antich concluded the AI/ML in Networking webinar with the ugly challenges of using AI/ML in networking. I won’t spoil the fun, you REALLY SHOULD watch the video (keeping in mind he was trying to stay polite and diplomatic).
Every network engineer should be familiar with the DNS basics – after all, all network failures are caused by DNS… unless it’s BGP.
The May 2022 ISP Column by Geoff Huston is an excellent place to brush up on your DNS basics and learn about new ideas, including a clever one to push DNS entries that will be needed in the future to a web client through a DNS-over-HTTPS session.
I migrated my blog to Hugo two years ago, and never regretted the decision. At the same time I implemented version control with Git, and started using GitHub (and GitLab for a convoluted set of reasons) to host the blog repository.
After hesitating for way too long, I decided to go one step further and made the blog repository public. The next time a blatant error of mine annoys you, fork it, fix my blunder(s), and submit a pull request (or write a comment and I’ll fix stuff like I did in the past).
I usually tell networking engineers who are seriously considering whether to automate their networks to clean up their design and simplify the network services first.
The only reasonable way forward is to simplify your processes – get rid of all corner cases, all special deals that are probably costing you more than you earned on them, all one-off kludges to support badly-designed applications – and once you get that done, you might realize you don’t need a magic platform anymore, because you can run your simpler network using traditional tools.
While seasoned automation practitioners agree with me, a lot of enterprise engineers face a different reality. Straight from a source that wished to remain anonymous…
I stumbled upon a blog post by Diptanshu Singh discussing whether IS-IS flooding in highly meshed fabric is as much of a problem as some people would like to make it. I won’t spoil the fun, read his blog post ;)
The really interesting part (for me) was the topology he built with netlab and containerlab: seven leaf-and-spine fabrics connected with WAN links and superspines for a total of 68 instances of Arista cEOS. I hope he automated building the topology file (I’m a bit sorry we haven’t implemented composite topologies yet); after that all he had to do was to execute netlab up to get a fully-configured lab running IS-IS.
Stuart Charlton did his best to explain the concept of pods in the Kubernetes Networking Deep Dive webinar, but we were still a bit confused. Next step: let’s talk about typical inter-pod traffic scenarios.
Continuing the “how real is the decade-old SDN hype” thread, let’s try to figure out if anyone still uses OpenFlow. OpenFlow was declared dead by the troubadour of the SDN movement in 2016, so it looks like the question is moot. However, nothing ever dies in networking (including hop-by-hop IPv6 extension headers), so here we go.
Why Would One Use OpenFlow?
Ignoring for the moment the embarrassing “we solved the global load balancing with per-flow forwarding” academic blunders, OpenFlow wasn’t the worst tool for programming forwarding exceptions (ACL/PBR) into TCAM.
Even though Gartner declared SDN obsolete before plateau in their 2021 Networking Hype Cycle, most vendor marketers never got the memo. Anything that interacts with network devices in any way is called an SDN controller. Let’s try to throw some minimal amount of taxonomy into that mess based on how these controllers interact with network elements (physical or virtual).
- Network standards and platforms
- Data plane encryption
- Control plane security
- Key- and system management
- Relevant approvals
- Vendors and products, including detailed feature support matrices.
I started preparing the materials for the SDN – 10 years later webinar, and plan to publish a series of blog posts documenting what I found on various aspects of what could be considered SDN. I’m pretty sure I missed quite a few things; your comments are most welcome.
Let’s start with an easy one: software/hardware disaggregation in network devices.
Open-Source Network Operating Systems
I found several widely-used open-source network operating systems:
Another interesting column by Geoff Huston: performance of TCP congestion control protocols when using Low-Earth Orbit or Geosynchronous Orbit satellites for Internet access.
Here’s another “do these things ever disappear?” question from Enrique Vallejo:
Regarding storage, is Fibre Channel still a thing in 2022, or most people employ SATA over Ethernet and NVMe over fabrics?
TL&DR: Yes. So is COBOL.
To understand why some people still use Fibre Channel, we have to start with an observation made by Howard Marks: “Storage is different.” It’s OK to drop a packet in transit. It’s NOT OK to lose data at rest.
Here’s a short list of major goodies included in netsim-tools release 1.2.2:
- Access VLANs, VLAN trunks and native VLANs implemented on Cisco IOS, Arista EOS, VyOS, and Dell OS10 (VyOS and OS10 support contributed by Stefano Sasso)
- Hardware labs implemented with external topology provider (contributed by Stefano Sasso)
- VRF loopback interfaces (contributed by Stefano Sasso)
More details in the release notes.
Recent news from the Department of Unintended Consequences: RFC 6724 changed the IPv4/IPv6 source/destination address selection rules a decade ago, and it seems that the common interpretation of those rules makes IPv6 Unique Local Addresses (ULA) less preferred than the IPv4 addresses, at least according to the recent Unintended Operational Issues With ULA draft by Nick Buraglio, Chris Cummings and Russ White.
End result: If you use only ULA addresses in your dual-stack network, IPv6 won’t be used at all. Even worse, if you use ULA addresses together with global IPv6 addresses (GUA) as a fallback mechanism, there might be hidden gotchas that you won’t discover until you turn off IPv4. Looks like someone did a Truly Great Job, and ULA stands for Useless Local Addresses.
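To see where the preference comes from, here is a minimal Python sketch of the precedence lookup (Rule 6 of destination address selection) using the default policy table from RFC 6724. It is an illustration only: the function names `precedence` and `preferred_destination` are made up for this example, the sketch ignores all the other selection rules (reachability, scope, label matching, and so on), and real hosts may run with a modified policy table.

```python
import ipaddress

# Default policy table from RFC 6724, section 2.1: (prefix, precedence, label)
POLICY_TABLE = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("::ffff:0:0/96", 35, 4),   # IPv4-mapped addresses, i.e. IPv4 destinations
    ("2002::/16", 30, 2),
    ("2001::/32", 5, 5),
    ("fc00::/7", 3, 13),        # Unique Local Addresses (ULA)
    ("::/96", 1, 3),
    ("fec0::/10", 1, 11),
    ("3ffe::/16", 1, 12),
]
POLICY = [(ipaddress.ip_network(p), prec) for p, prec, _label in POLICY_TABLE]


def precedence(address: str) -> int:
    """Return the policy-table precedence of an address (longest-prefix match)."""
    addr = ipaddress.ip_address(address)
    if addr.version == 4:
        # RFC 6724 represents IPv4 addresses as IPv4-mapped IPv6 addresses
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    matches = [(net.prefixlen, prec) for net, prec in POLICY if addr in net]
    # ::/0 always matches, so the list is never empty
    return max(matches)[1]


def preferred_destination(candidates: list[str]) -> str:
    """Hypothetical helper: pick the candidate with the highest precedence."""
    return max(candidates, key=precedence)


if __name__ == "__main__":
    for a in ("2001:db8::1", "192.0.2.1", "fd00::1"):
        print(a, precedence(a))
    print(preferred_destination(["fd00::1", "192.0.2.1"]))
```

With the default table, a ULA destination gets precedence 3 while an IPv4 destination gets 35, so a dual-stack host that looks only at this rule picks IPv4 over ULA, whereas a GUA destination (precedence 40) still beats IPv4.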
A friend of mine working for a mid-sized networking vendor sent me an intriguing question:
We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?
This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):
Hint: if you have no idea what Bufferbloat or fq_codel are, you REALLY SHOULD explore Dave’s web site.
Most large content providers use some sort of egress traffic engineering on edge web proxy/caching servers to optimize the end-user experience (avoid congested transit autonomous systems) and link utilization on egress links.
For ages, I’ve been planning to write a blog post about the tricks they use, and never found time to do it… but if you don’t mind watching a video, the Source Routing on the Edge presentation Oliver Herms gave at iNOG::14v does a pretty good job explaining the concepts and a particular implementation.
Christopher Werny has tons of hands-on experience with IPv6 security (or lack thereof), and described some of his findings in the Practical Aspects of IPv6 Security part of IPv6 security webinar, including:
- Impact of dual-stack networks
- Security implications of IPv6 address planning
- Isolation on routing layer and strict filtering
- IPv6-related requirements for Internet- or MPLS uplinks
netlab started as a simple tool to create virtual lab topologies (I hated creating Vagrantfiles describing complex topologies), but when it morphed into an ever-growing “configure all the boring stuff in your lab from a high-level description” thingie, it gave creative networking engineers an interesting idea: could we use this tool to do all the stuff we always hated doing in our physical labs?
My answer was always “of course, please feel free to submit a PR”, and Stefano Sasso did just that: he implemented an external orchestration provider that allows you to use netlab to configure IPv4, IPv6, VLANs, VRFs, VXLAN, LLDP, BFD, OSPFv2, OSPFv3, EIGRP, IS-IS, BGP, MPLS, BGP-LU, L3VPN (VPNv4 + VPNv6), EVPN, SR-MPLS, or SRv6 on supported hardware devices.
Hope you’ll enjoy the presentation as much as I did… and make sure you understand potential circular dependencies you might be introducing when running a route reflector as a virtual machine.
Are FabricPath, TRILL or SPB still alive, or has everyone moved to VXLAN? Are they worth studying?
TL&DR: Barely. Yes. No.
Layer-2 Fabric craziness exploded in 2010 with vendors playing the usual misinformation games that eventually resulted in a totally fragmented market full of partial- or proprietary solutions. At one point in time, some HP data center switches supported only TRILL, and other data center switches from the same company supported only SPB.
Now for individual technologies: