Blog Posts in February 2022
Running a Ubuntu VM on a Mac M1
If you’re brand-new to Python and Ansible, you might be a bit reluctant to install a bunch of packages and Ansible collections on your production laptop to start building your automation skills. The usual recommendation I make to get past that hurdle is to create a Ubuntu virtual machine that can be destroyed every time to mess it up.
Creating a virtual machine is trivial on Linux and MacOS with Intel CPU (install VirtualBox and Vagrant). The same toolset no longer works on newer Macs with M1 CPU (VMware Fusion is in tech preview, so we’re getting there), but there’s an amazingly simple alternative: Multipass by Canonical.
Video: Use Cases for AI/ML in Networking
In the first half of the AI/ML in Networking webinar, Javier Antich walked us through the AI/ML hype, basics of machine learning, and machine learning techniques.
In the second part of the webinar, he described “The Good, The Bad and The Ugly”, starting with the good parts: where does AI/ML make sense in networking?
… updated on Monday, February 28, 2022 16:07 UTC
Cache-Based Packet Forwarding
In the previous blog post in this series I described how convoluted routing table lookups could become when you have to deal with numerous layers of indirection (BGP prefix ⇨ BGP next hop ⇨ IGP next hop ⇨ link bundle ⇨ outgoing interface). Modern high-end hardware can deal with the resulting complexity; decades ago we had to use router CPU to do multiple (potentially recursive) lookups in the IP routing table (there was no FIB at that time).
Network devices were always pushed to the bleeding edge of performance, and smart programmers always tried to optimize the CPU-intensive processes. One of the obvious packet forwarding optimizations relied on the fact that within a short timeframe most packets have to be forwarded to a small set of destinations. Welcome to the wonderful world of cache-based forwarding.
New netlab Installation Instructions
A long-time subscriber with a knack for telling me precisely why something I’m doing sucks big time sent me his opinion on netlab1 installation instructions:
I do not want to say it is impossible to follow your instruction but I wonder why the process is not clearly defined for someone not deeply involved in such tasks with full understanding of why to install from github, etc..
Many guys do not know if they want to use libvirt. They want to use the tool simple way without studying upfront what the libvirt is - but they see libvirt WARNING - should we install libvirt then or skip the installation?. But stop, this step of libvirt installation is obligatory in the 2nd Ubuntu section. So why the libvirt warning earlier?
I believe we should start really quickly to enjoy the tool before we reject it for “complexity”. Time To Play matters. Otherwise you are tired trying to understand the process before you check if this tool is right for you.
He was absolutely right – it was time to overhaul the “organically grown” installation instructions and make them goal-focused and structured. For those of you who want to see the big picture first, I also added numerous (hopefully helpful) diagrams. The new documentation is already online, and I’d love to hear your feedback. Thank you!
-
netlab was known as netsim-tools at that time. ↩︎
… updated on Thursday, February 3, 2022 07:33 UTC
ICMP Redirects Considered Harmful
One of my readers sent me an intriguing challenge based on the following design:
- He has a data center with two core switches (C1 and C2) and two Cisco Nexus edge switches (E1 and E2).
- He’s using static default routing from core to edge switches with HSRP on the edge switches.
- E1 is the active HSRP gateway connected to the primary WAN link.
The following picture shows the simplified network diagram:
Feedback: DMVPN Webinars
Some webinars on ipSpace.net are ancient (= more than a decade old). I’m refreshing some of them (the overhaul of Introduction to Virtualized Networking was completed earlier this month); others will stay as they are because the technology hasn’t changed in a long while, and it’s always nice to hear someone still finds them useful. This is a recent feedback I got on the DMVPN webinars:
As with any other webinar I have viewed on ipspace.net, this one provides the background as to why you may or may not want to do certain things and what impact that may have (positive or negative) on your network. Then it digs into the how of actually doing something. Brilliant content as always.
IPSpace.net is my go-to for deep dives on existing and emerging technologies in the networking industry. No unnecessary preamble. Gets straight to the point of why you are looking at a specific technology and explains the what and the why before getting into the how.
Using netlab with containerlab: Welcome to the World of Tomorrow
Julio Perez wrote a wonderful blog post describing how he combined netlab and containerlab1 to build Arista cEOS labs.
Hint: when you’re done with that blog post, keep reading and add his blog to your RSS feed – he wrote some great stuff in the past.
-
netlab was known as netsim-tools at the time he wrote the blog post ↩︎
Worth Reading: Performance Testing of Commercial BGP Stacks
For whatever reason, most IT vendors attach “you cannot use this for performance testing and/or publish any results” caveat to their licensing agreements, so it’s really hard to get any independent test results that are not vendor-sponsored and thus suitably biased.
Justin Pietsch managed to get a permission to publish test results of Junos container implementation (cRPD) – no surprise there, Junos outperformed all open-source implementations Justin tested in the past.
What about other commercial BGP stacks? Justin did the best he could: he published Testing Commercial BGP Stacks instructions, so you can do the measurements on your own.
netsim-tools (now netlab) on the Modem Podcast
A few weeks ago, Nick Buraglio and Chris Cummings invited me for an hour-long chat about netlab on the Modem Podcast1.
We talked about why one might want to use netlab instead of another lab orchestration solution and the high-level functionality offered by the tool. Nick particularly loved its IPAM features which got so extensive in the meantime that I had to write a full-blown addressing tutorial. But there’s so much more: you can also get a fully configured OSPFv2, OSPFv3, EIGRP, IS-IS, SRv6, or BGP lab built from more than a dozen different devices. In short (as Nick and Chris said): you can use netlab to make labbing less miserable.
-
netlab was known as netsim-tools when we were recording that podcast. ↩︎
… updated on Saturday, February 19, 2022 07:43 UTC
The Impact of Jumbo Maximum Frame Size on Data Center Switches
Sander Steffann sent me an intriguing question a long while ago:
I was wondering if there are any downsides to setting “system mtu jumbo 9198” by default on every switch? I mean, if all connected devices have MTU 1500 they won’t notice that the switch could support longer frames, right?
That’s absolutely correct, and unless the end hosts get into UDP fights things will always work out (aka TCP MSS saves the day)… but there must be a reason switching vendors don’t use maximum frame sizes larger than 1514 by default (Cumulus Linux seems to be an exception, and according to Sébastien Keller Arista’s default maximum frame size is between 9214 and 10178 depending on the platform).
Running BGP between Virtual Machines and Data Center Fabric
Got this question from one of my readers:
When adopting the BGP on the VM model (say, a Kubernetes worker node on top of vSphere or KVM or Openstack), how do you deal with VM migration to another host (same data center, of course) for maintenance purposes? Do you keep peering with the old ToR even after the migration, or do you use some BGP trickery to allow the VM to peer with whatever ToR it’s closest to?
Short answer: you don’t.
Kubernetes was designed in a way that made worker nodes expendable. The Kubernetes cluster (and all properly designed applications) should recover automatically after a worker node restart. From the purely academic perspective, there’s no reason to migrate VMs running Kubernetes.
Feedback: Cisco ACI Webinars
Antonio Boj enjoyed the Cisco ACI webinars by Mario Rosi and sent me this feedback:
I just wanted to pass you my feedback about the documentation and content of the above webinars. Excellent content, very well organized.
My expectation is always high about your content because I’ve become used to it with other webinars you published. I always look for non-marketing content to understand the technology.
I don’t want to criticize vendors based on assumptions or personal agendas from interested people but evaluate whether or not it is the right path forward for the problem I want to solve, knowing the pros and cons. So again, both webinars about Cisco ACI have given me excellent visibility of the solution. Thank you very much!
Packet Forwarding 101: Header Lookups
Whenever someone asks me about LISP, I answer, “it’s a nice idea, but cache-based forwarding never worked well.” Oldtimers familiar with the spectacular failures of fast switching and various incarnations of flow switching usually need no further explanation. Unfortunately, that lore is quickly dying out, so let’s start with the fundamentals: how does packet forwarding work?
Packet forwarding used by bridges and routers (or Layer-2/3 switches if you believe in marketing terminology) is just a particular case of statistical multiplexing – a mechanism where many communication streams share the network resources by slicing the data into packets that are sent across the network. The packets are usually forwarded independently; every one of them must contain enough information to be propagated by each intermediate device it encounters on its way across the network.
Worth Reading: End-to-end Congestion Control Cannot Avoid Latency Spikes
Found a pointer to another you cannot beat the laws of physics or networking result: you cannot avoid latency spikes with end-to-end congestion control regardless of the amount of unicorn dust or hype you’re throwing at the problem (original paper).
Worth Reading: Crazy about VMware SD-WAN
Have to work with VMware SD-WAN (the entity formerly known as VeloCloud)? You might find interesting tidbits in Crazy about VMware SD-WAN by Alexander Marhold.
Video: Network Layer Addressing
After a brief excursion into the ancient data link layer addressing ideas (that you can still find in numerous systems today) and LAN addressing it’s time to focus on network-layer addressing, starting with “can we design protocols without network-layer addresses” (unfortunately, YES) and “should a network-layer address be tied to a node or to an interface” (as always, it depends).
For more details, watch the Network Layer Addressing video (part of How Networks Really Work webinar).
… updated on Tuesday, February 15, 2022 14:37 UTC
Build Vagrant Boxes for Your Network Devices
One of the toughest hurdles to overcome when building your own virtual networking lab is the slog of downloading VM images for your favorite network devices and building Vagrant boxes1 in case you want to use them with Vagrant or netlab.
You can find box-building recipes on the Internet – codingpackets.com has a dozen of them – but they tend to be a bit convoluted and a smidge hard-to-follow the first time you’re trying to build the boxes (trust me, I’ve been there).
OMG: VTP Is Insecure
One of my readers sent me an interesting pointer:
I just watched a YouTube video by a security researcher showing how a five line python script can be used to unilaterally configure a Cisco switch port connected to a host computer into a trunk port. It does this by forging a single virtual trunk protocol (VTP) packet. The host can then eavesdrop on broadcast traffic on all VLANs on the network, as well as prosecute man-in-the-middle of attacks.
I’d say that’s a “startling revelation” along the lines of “OMG, VXLAN is insecure” – a wonderful way for a security researcher to gain instant visibility. From a more pragmatic perspective, if you enable an insecure protocol on a user-facing port, you get the results you deserve1.
While I could end this blog post with the above flippant remark, it’s more fun considering two fundamental questions.
Mixed Feelings about BGP Route Reflector Cluster ID
Here’s another BGP Route Reflector myth:
In a redundant design, you should use Route Reflector Cluster ID to avoid loops.
TL&DR: No.
While BGP route reflectors can cause permanent forwarding loops in sufficiently broken topologies, the Cluster ID was never needed to stop a routing update propagation loop:
Feedback: ipSpace.net Materials
Andy Lemin sent me such a wonderful review of ipSpace.net materials that I simply couldn’t resist publishing it ;)
ipSpace.net is probably my favorite networking resource out there. After spending years with other training content sites which are geared around certifications, ipspace.net provides a totally unique source of vendor neutral opinions, information, and anecdotes – the kind of information that is just not available anywhere else. And to top it off, is presented by a wonderful speaker who is passionate, smart and really knows his stuff!
The difference between an engineer who just has certs versus an engineer who has a rounded and wide view of the whole industry is massive. An engineer with certs can configure your network, but an engineer with all the knowledge this site provides, is someone who can question why and challenge how we can configure your network in a better way.
Worth Reading: We're a Decade Past Blade Server Market Peak
Stumbled upon a totally unexpected fun fact:
Every server vendor either peaked or hits the peak of maximum units sold per quarter in 2015. In the years that follow, the monthly averages drop.
Keep that in mind the next time Cisco sales team comes along with a UCS presentation.
Worth Reading: Non-Standard Standards, SRv6 Edition
Years ago, I compared EVPN to SIP – it has a gazillion options, and every vendor implements a different subset of them, making interoperability a nightmare.
According to Andrew Alston, SRv6 is no better (while being a security nightmare). No surprise there.
Lesson Learned: The Way Forward
I tried to wrap up my Lessons Learned presentation on a positive note: what are some of the things you can do to avoid all the traps and pitfalls I encountered in the almost four decades of working in networking industry:
- Get invited to architecture and design meetings when a new application project starts.
- Always try to figure out what the underlying actual business needs are.
- Just because you can doesn’t mean that you should.
- Keep it as simple as possible, but no simpler.
- Work with your peers and explain how networking works and why you face certain limitations.
- Humans are not perfect – automate as much as it makes sense, but no more.
Do a Cleanup Before Automating Your Network
Remington Loose sent me an interesting email describing his views on the right approach to network automation after reading my Network Reliability Engineering Should Be More than Software or Automation rant – he’s advocating standardizing network services and cleaning up your network before trying to deploy full-scale automation.
I think you are 100% right to start with a thorough cleanup before automation. Garbage in, garbage out. It is also the case that all that inconsistency and differentiation makes for complexity in automation (as well as general operations) that makes it harder to gain traction.
netsim-tools Release 1.1.2
Every time I’m writing netsim-tools release notes I’m amazed at the number of features we managed to put together in just a few weeks.
Here are the goodies from netsim-tools releases 1.1.1 and 1.1.2:
BGP Route Reflector Myths
New networking myths are continuously popping up. Here’s a BGP one I encountered a few days ago:
You don’t need IBGP sessions between BGP route reflectors
In general, that’s clearly wrong, as illustrated by this setup: