One of my readers sent me this interesting question:
I understand that an SDN controller needs network topology information to build traffic engineering paths with PCE/PCEP… but why would we use BGP-LS to extract the network topology information? Why can’t we run OSPF on the controller by simulating a software-based OSPF instance in every area to get the topology view?
There are several reasons to use BGP-LS:
A while ago Antti Leimio wrote a long twitter thread describing his frustrations with Cisco ACI object model. I asked him for permission to repost the whole thread as those things tend to get lost, and he graciously allowed me to do it, so here we go.
I took a 5-day Cisco DCACI course. This is all new to me. I’m confused. Who is ACI for? The capabilities and completeness of features are fantastic, but how do you manage this complex system?
Every now and then I get a question along the lines of “why can’t we have a distributed SDN controller (because resiliency) that would survive network partitioning?” This time, it’s not the incompetency of solution architects or programmers, but the fundamental limitations of what can be done when you want to have consistent state across a distributed system.
TL&DR: If your first thought was CAP Theorem you’re absolutely right. You can probably stop reading right now. If you have no idea what I’m talking about, maybe it’s time you get fluent in distributed systems concepts after you’re finished with this blog post and all the reference material linked in it. Don’t know where to start? I put together a list of resources I found useful.
When I first saw BGP-LS I immediately thought: “it would be cool to use this to fetch link-state topology data from the network and build a graph out of it”. In those days the only open-source way I could find to do it involved the OpenDaylight controller’s BGP-LS-to-REST-API converter, and that felt like deploying an aircraft carrier to fly a kite.
Things have improved dramatically since then. In Visualizing BGP-LS Tables, HB described how he solved the challenge with GoBGP, gRPC interface to GoBGP, and some Python code to parse the data and draw the topology graph with NetworkX. Enjoy!
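In case you’re wondering what the last step might look like, here’s a minimal sketch (not HB’s actual code) of the NetworkX part, assuming the BGP-LS link NLRIs fetched from GoBGP have already been parsed into simple (local node, remote node, IGP metric) tuples; the node names and metrics below are invented for illustration:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical parsed output: each tuple is (local node, remote node, IGP metric).
# In HB's setup this data would come from GoBGP's BGP-LS table via its gRPC API;
# the values below are made up for illustration only.
parsed_links = [
    ("PE1", "P1", 10),
    ("PE2", "P1", 10),
    ("P1",  "P2", 20),
    ("PE3", "P2", 10),
]

# Build an undirected graph with the IGP metric stored as the edge weight
graph = nx.Graph()
for local_node, remote_node, metric in parsed_links:
    graph.add_edge(local_node, remote_node, weight=metric)

# Draw the topology and label every link with its metric
pos = nx.spring_layout(graph, seed=42)
nx.draw(graph, pos, with_labels=True, node_color="lightblue", node_size=1200)
nx.draw_networkx_edge_labels(graph, pos,
                             edge_labels=nx.get_edge_attributes(graph, "weight"))
plt.show()
```

Once the topology is in a NetworkX graph you also get shortest-path computations (for example nx.shortest_path with weight="weight") for free, which comes in handy when sanity-checking traffic-engineered paths.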
A long-time reader sent me a series of questions about the impact of WAN partitioning on an SDN-based network spanning multiple locations after watching the Architectures part of the Data Center Fabrics webinar. He therefore focused on the specific case of a centralized control plane (read: an equivalent of a stackable switch) with a distributed controller cluster (read: a switch stack spread across multiple locations).
This podcast introduction was written by Nick Buraglio, the host of today’s podcast.
In the original days of this podcast, there were heavy, deep discussions about this new protocol called “OpenFlow”. Like many of our most creative innovations in the IT field, OpenFlow came from an academic research project that aimed to change the way that we as operators managed, configured, and even thought about networking fundamentals.
For the most part, this project did what it intended, but once the marketing machine realized the flexibility of the technology and its potential to completely change the way we think about vendors, networks, provisioning, and management of networking, they were off to the races.
We all know what happened next.
Michael Mullany analyzed 20 years of Gartner hype cycles and came to some (expected but still interesting) conclusions, including:
- Nobody noticed major technologies even when they were becoming mainstream
- Lots of technologies just die, others make progress when nobody is looking
- We might get the idea right and fail badly at implementation
- It takes a lot longer to solve some problems than anyone expected
Enjoy the reading, and keep these lessons in mind the next time you’re sitting in a software-defined, intent-based, or machine-learning $vendor presentation.
Loved the article from Philip Laplante about environmental antipatterns. I’ve seen plenty of founderitis and shoeless children in my life, but it was worshipping the golden calf that made me LOL:
In any environment where there is poor vision or leadership, it is often convenient to lay one’s hopes on a technology or a methodology about which little is known, thereby providing a hope for some miracle. Since no one really understands the technology, methodology, or practice, it is difficult to dismiss. This is an environmental antipattern because it is based on a collective suspension of disbelief and greed, which couldn’t be sustained by one or a few individuals embracing the ridiculous.
That paragraph totally describes the belief in the magical powers of long-distance vMotion, SDN (I published a whole book debunking its magical powers), building networks like Google does it, intent-based whatever, machine learning…
A while ago we discussed a software-focused view of Network Interface Cards (NICs) with Luke Gorrie, and a hardware-focused view of them with Or Gerlitz (Mellanox), Andy Gospodarek (Broadcom) and Jiri Pirko (Mellanox).
Why would anyone want to implement features in hardware and not in software, and what would be the best hardware implementation? We discussed these dilemmas with Silvano Gai in Episode 110 of Software Gone Wild podcast.
Over the last weekend I almost got pulled into yet-another CLI-or-automation Twitter spat. The really sad part: I thought we were past that point. After all, I’ve been ranting about that topic for almost seven years… and yet I’m still hearing the same arguments I did in those days.
Just for the giggles I collected a few old blog posts on the topic (not that anyone evangelizing their opinions on Twitter would ever take the time to read them ;).
AI is the new SDN, and we’re constantly bombarded with networking vendor announcements promising AI-induced nirvana, from reinventing Clippy to automatic anomaly and threat identification.
If you still think these claims are realistic, it’s time you start reading what people involved in AI/ML have to say about hype in their field. I posted a few links in the past, and the Packet Pushers Human Infrastructure magazine delivered another goodie into my Inbox.
The last Software Gone Wild podcast recorded in 2019 focused on advances in Linux networking - in particular on interesting stuff presented at NetDev 0x13 conference in Prague. The guests (in alphabetical first name order) Jamal Hadi Salim, Shrijeet Mukherjee, Sowmini Varadhan, and Tom Herbert shared their favorite topics, and commented on the future of Linux networking.
I stumbled upon a great MIT Technology Review article (warning: regwall ahead) with a checklist you SHOULD use whenever considering a machine-learning-based product.
While the article focuses on machine learning, at least some of the steps in that list apply to any new product that claims to use a brand-new technology in a particular problem domain, like overlay virtual networking with blockchain:
Someone sent me this observation after reading my You Cannot Have Public Cloud without Networking blog post:
As much as I sympathize with your view, scale matters. And if you build ATMs that can deal with the massive client population, the number of bank tellers needed will go down. A lot.
Based on what I read a while ago, a really interesting thing happened in the financial industry: while the number of tellers went down, the number of front-end bank employees did not drop nearly as dramatically; they just turned into “consultants”.
One of my subscribers was interested in trying out whitebox solutions. He wrote:
What open source/whitebox software/hardware should I look at if I wanted to build a leaf-and-spine VXLAN/EVPN/BGP data center?
I don’t think you can get a fully-open-source solution because the ASIC manufacturers hide their SDK behind a mountain of NDAs (that strategy must make perfect sense – after all, it generated such awesome PR for NVIDIA). Anyway, the closest you can get (AFAIK) if you're a mere mortal is Cumulus Linux, and you just choose any whitebox hardware off their Hardware Compatibility List.