Blog Posts in May 2023
NTP in a Nutshell
Years ago I’ve been involved in an interesting discussion focusing on NTP authentication and whether you can actually implement it reliably on Cisco IOS. What I got out of it (apart from a working example) was the feeling that NTP and it’s implementation in Cisco IOS was under-understood and under-documented, so I wrote an article about it. Of course the web version got lost in the mists of time but I keep my archives handy.
Last weekend I migrated that article to blog.ipSpace.net. I hope you’ll still find it useful; while it’s pretty old, the fundamentals haven’t changed in the meantime.
Path Failure Detection on Multi-Homed Servers
TL&DR: Installing an Ethernet NIC with two uplinks in a server is easy1. Connecting those uplinks to two edge switches is common sense2. Detecting physical link failure is trivial in Gigabit Ethernet world. Deciding between two independent uplinks or a link aggregation group is interesting. Detecting path failure and disabling the useless uplink that causes traffic blackholing is a living hell (more details in this Design Clinic question).
Want to know more? Let’s dive into the gory details.
Goodbye Twitter. It Was Fun While It Lasted
I joined Twitter in October 2008 (after noticing everyone else was using it during a Networking Field Day event), and eventually figured out how to automate posting the links to my blog posts in case someone uses Twitter as their primary source of news – an IFTTT applet that read my RSS feed and posted links to new entries to Twitter.
This week, I got a nice email from IFTTT telling me they had to disable the post-to-Twitter applet. Twitter started charging for the API, and I was using their free service – obviously the math didn’t work out.
That left me with three options:
Worth Reading: Cargo Cult AI
Before we managed to recover from the automation cargo cults, a tsunami wave of cargo cult AI washed over us as Edlyn V. Levine explained in an ACM Queue article. Enjoy ;)
Also, a bit of a historical perspective is never a bad thing:
Impressive progress in AI, including the recent sensation of ChatGPT, has been dominated by the success of a single, decades-old machine-learning approach called a multilayer (or deep) neural network. This approach was invented in the 1940s, and essentially all of the foundational concepts of neural networks and associated methods—including convolutional neural networks and backpropagation—were in place by the 1980s.
Worth Reading: Building Trustworthy AI
Bruce Schneier wrote an excellent essay explaining why we need trustworthy AI and why we won’t get it as long the AI solutions are created by large tech companies with you are a product business model.
Network Security Vulnerabilities: the Root Causes
Sometime last autumn, I was asked to create a short “network security challenges” presentation. Eventually, I turned it into a webinar, resulting in almost four hours of content describing the interesting gotchas I encountered in the past (plus a few recent vulnerabilities like turning WiFi into a thick yellow cable).
Each webinar section started with a short “This is why we have to deal with these stupidities” introduction. You’ll find all of them collected in the Root Causes video starting the Network Security Fallacies part of the How Networks Really Work webinar.
Inter-VRF DHCP Relaying with Redundant DHCP Servers
Previous posts in this series covered numerous intricacies of DHCP relaying:
- DHCP relaying principles described the basics
- In Inter-VRFs relaying we figured out how a DHCP client reaches a DHCP server in another VRF without inter-VRF route leaking
- Relaying in VXLAN segments and relaying from EVPN VRF applied those lessons to VXLAN/EVPN environment.
- DHCP Relaying with Redundant DHCP Servers added relay- and server redundancy.
Now for the final bit of the puzzle: what if we want to do inter-VRF DHCP relaying with redundant DHCP servers?
Dealing with Cisco ACI Quirks
Sebastian described an interesting Cisco ACI quirk they had the privilege of chasing around:
We’ve encountered VM connectivity issues after VM movements from one vPC leaf pair to a different vPC leaf pair with ACI. The issue did not occur immediately (due to ACI’s bounce entries) and only sometimes, which made it very difficult to reproduce synthetically, but due to DRS and a large number of VMs it occurred frequently enough, that it was a serious problem for us.
Here’s what they figured out:
Simplify netlab Topologies with Link Groups
Last month I described how you can simplify your VLAN- or VRF lab topologies with VRF- and VLAN links, automatically setting vlan.access or vrf attribute on a set of links. Link groups allow you to do the same for any set of link attributes.
Sample Topology
Imagine you have a small network with three PE-routers connected to a central P-router:
Worth Reading: Trapped by Technology Fallacies
Michele Chubirka published a must-read article on technology fallacies including this gem:
Technologists often assume that all problems can be beaten into submission with a technology hammer.
As I’ve been saying for ages (not that anyone would listen): all the technology in the world won’t save you unless you change the mentality and rearchitect broken processes.
Why Is Source Address Validation Still a Problem?
I mentioned IP source address validation (SAV) as one of the MANRS-recommended actions in the Internet Routing Security webinar but did not go into any details (as the webinar deals with routing security, not data-plane security)… but I stumbled upon a wonderful companion article published by RIPE Labs: Why Is Source Address Validation Still a Problem?.
The article goes through the basics of SAV, best practices, and (most interesting) using free testing tools to detect non-compliant networks. Definitely worth reading!
Video: Types of Switching ASICs
Pete Lumbis concluded his ASICs for Networking Engineers presentation with a brief overview of types of switching ASICs and a wrap-up.
You can watch his entire 90-minute presentation (sliced into shorter videos) with Free ipSpace.net Subscription.
Find the Optimal Level of Automation Abstraction
Tom Ammon sent me his thoughts on choosing the right level of abstraction in your network automation solution as a response to my What Is Intent-Based Networking blog post, and allowed me to publish them on ipspace.net.
I totally agree with your what vs how example with OSPF. I work on a NOS team where if we wanted, we could say, instead of “run OSPF on these links”, do this:
New: Disaster Recovery Resources
I wrote dozens of blog posts debunking disaster recovery fairy tales (mostly of the long-distance vMotion and stretched clusters variety) over the years. They are collected and sorted (and polished a bit) in the new Disaster Recovery Resources page. Hope you’ll find them useful.
ITNOG 7 Wrap-up
I attended ITNOG 7 last week, and thoroughly enjoyed a full day of interesting presentations, including how do you run Internet services in a war zone by Elena Lutsenko and Milko Ilari.
The morning was focused primarily on BGP:
netlab Release 1.5.3: libvirt Public Networks
containerlab release 0.41.0 that came out a few days ago changed a few topology attributes with no backward compatibility, breaking netlab for anyone doing a new installation. The only way out of that conundrum was to push out a new netlab release that uses the new attributes and requires containerlab release 0.41.0 (more about that in a minute).
On a more positive note, netlab release 1.5.3 brings a few interesting features, including:
- Support for public libvirt networks that can be used to connect your labs to the outside world, and reuse of existing libvirt networks
- ‘unknown’ device type that can be used to deploy devices not yet supported by netlab
- MPLS/VPN support on Nokia SR-OS
Worth Reading: Official Ansible Collection for SR Linux
Roman Dodin wrote an article describing Nokia’s Ansible collection for SR Linux. Although I don’t use SR Linux (even though it was the first container supported by netlab ;), it was still very interesting to read about the design tradeoffs they had to make:
Service Insertion with BGP FlowSpec
Nicola Modena had an interesting presentation describing how you can use BGP FlowSpec for traffic steering and service insertion during the recent ITNOG 7 event (more about the event in a few days).
One of the slides explained how to use three different aspects of BGP (FlowSpec, MPLS/VPN and multipathing), prompting me to claim the presentation title should be “BGP is the answer, what was the question?” 😉 Hope you’ll enjoy the PDF version of the presentation as much as we did the live one.
Video: Kubernetes Container Networking Interface (CNI)
Ready for more Kubernetes details? How about Container Networking Interface (CNI) described by Stuart Charlton as part of Kubernetes Networking Deep Dive webinar?
Notes:
- You REALLY SHOULD watch Kubernetes SDN architecture and Sample Kubernetes SDN Implementations videos first
- The video (and a large portion of Kubernetes Networking Deep Dive webinar) is available with Free ipSpace.net Subscription.
MLAG Clusters without a Physical Peer Link
With the widespread deployment of Ethernet-over-something technologies, it became possible to build MLAG clusters without a physical peer link, replacing it with a virtual link across the core fabric. Avaya was one of the first vendors to implement virtual peer links with Provider Backbone Bridging (PBB) transport, and some data center switching vendors (example: Cisco) offer similar functionality with VXLAN transport.
Is ChatGPT an Efficiency Multiplier?
I got this comment on one of my ChatGPT-related posts:
It does save time for things like converting output to YAML (I do not feed it proprietary information), or have it write scripts in various languages, converting configs from one vendor to another, although often they are not complete or correct they save time so regardless of what we think of it, it is an efficiency multiplier.
I received similar feedback several times, but found that the real answer (as is too often the case) is It Depends.
Modifying BGP Behavior with xBGP API
When I reposted a link to xBGP: Faster Innovation in Routing Protocols paper, someone immediately replied
Quite interesting, but it feels like this could become the proverbial 15th standard.
xBGP is an API that allows BGP users to implement routing policies (route selection, filtering, or propagation) that use attributes or mechanisms defined in newer IETF RFCs or drafts, so the proverbial 15th standard is not that far off the mark. However, we must remember that what we call BGP is more than just a set of competing standards.
Building a DMVPN Test Lab with netlab
I always love to hear about real-life netlab use cases, and try to make them even easier to implement with new netlab features – that’s how netlab got custom Vagrant configuration templates and per-node configuration templates.
When Anne Baretta sent me his initial DMVPN solution, we quickly figured out we could make it even cleaner if netlab supported tunnel interfaces; you can enjoy the results in release 1.5.2, and explore Anne’s solution on GitHub.
MUST READ: End-to-End Arguments in System Design
In case you ever wondered how old the “keep network simple and do complex stuff at the endpoints” approach is, read the End-to-End Arguments in System Design article from 1981.
For whatever reason (hint: profits), networking vendors keep ignoring those arguments, turning the network into a kitchen sink of complexity.
Fun tidbit: the article describes a variant of relying on layer-2 checksums will corrupt your data. Some things never change.
Worth Reading: IPv6 Deployment Status
RFC 9386 documenting IPv6 deployment status in late 2022 has been published a few weeks ago1. It claims over a billion IPv6-capable users, and IPv6 deployment close to 50% in major countries.
Web content is a different story: while 40% of top-500 sites are IPv6-enabled, you can reach only ~20% of web sites over IPv6. Considering Cloudflare’s free proxying includes IPv6 that is enabled by default, that proves (once again) how slowly things change in IT.
Video: 400GbE Optics
When 400GbE was still an emerging technology, Mark Nowell explained its basics in an update session of the Data Center Fabric Architectures webinar, starting with 400GbE optics.
… updated on Friday, May 5, 2023 05:18 UTC
Silent Hosts in EVPN Fabrics
The Dynamic MAC Learning versus EVPN blog post triggered tons of interesting responses describing edge cases and vendor bugs implementation details, including an age-old case of silent hosts described by Nitzan:
Few years ago in EVPN network, I saw drops on the multicast queue (ingress replication goes to that queue). After analyzing it we found that the root cause is vMotion (the hosts in that VLAN are silent) which starts at a very high rate before the source leaf learns the destination MAC.
It turns out that the behavior they experienced was caused by a particularly slow EVPN implementation, so it’s not exactly the case of silent hosts, but let’s dig deeper into what could happen when you do have silent hosts attached to an EVPN fabric.
Small Site EBGP-Only Design
One of my subscribers found an unusual BGP specimen in the wild:
- It was a small site with two core switches and a WAN edge router
- The site had VPN concentrators running in virtual machines
- The WAN edge router was running BGP across WAN IPsec tunnels
- The VPN concentrators were running BGP with core switches.
So far so good, and kudos to whoever realized BGP is the only sane protocol to run between virtual machines and network core. However, the routing in the network core was implemented with EBGP sessions between the three core devices, and my subscriber thought the correct way to do it would be to use IBGP and OSPF.
netlab Release 1.5.2: Aruba CX, External Tools, Tunnel Interfaces
netlab release 1.5.2 brings another bunch of cool features, including:
- Aruba AOS-CX Support by Stefano Sasso
- External network management tools that you can start together with your lab
- Tunnel interfaces
- Reusable topology components
I’ll cover these features in separate blog posts; today I wanted to highlight a few minor additions: