Every now and then I get a question along the lines of “why can’t we have a distributed SDN controller (because resiliency) that would survive network partitioning?” This time, it’s not the incompetency of solution architects or programmers, but the fundamental limitations of what can be done when you want to have consistent state across a distributed system.
TL&DR: If your first thought was CAP Theorem you’re absolutely right. You can probably stop reading right now. If you have no idea what I’m talking about, maybe it’s time you get fluent in distributed systems concepts after you’re finished with this blog post and all the reference material linked in it. Don’t know where to start? I put together a list of resources I found useful.
More than a year ago I was enjoying a cool beer with my friend Nicola Modena who started explaining how he solved the “you don’t need IP address renumbering for disaster recovery” conundrum with production and standby VRFs. All it takes to flip the two is a few changes in import/export route targets.
I asked Nicola to write about his design, but he’s too busy doing useful stuff. Fortunately he’s not the only one using common sense approach to disaster recovery designs (as opposed to flat earth vendor marketectures). Adrian Giacometti used a very similar design with one of his customers and documented it in a blog post.
A while ago we had an interesting exchange of ideas around inserting high-availability network appliance into a public cloud environment (TL&DR: it was really hard until AWS introduced Gateway Load Balancing), and someone quickly pointed out we’re solving the wrong challenge because…
Azure Firewall […] is a fully stateful firewall-as-a-service with built-in high-availability.
Somehow he wasn’t too happy when I pointed out that there’s more to high availability than vendor marketing ;)
One of my readers sent me a question along these lines:
Imagine you have a router with four equal-cost paths to prefix X, two toward upstream-1 and two toward upstream-2. Now let’s suppose that one of those links goes down and you want to have link protection. Do I really need Loop-Free Alternate (LFA) or MPLS Fast Reroute (FRR) to get fast (= immediate) failover or could I rely on multiple equal-cost paths to get the job done? I’m getting different answers from different vendors…
Please note that we’re talking about a very specific question of whether in scenarios with equal-cost layer-3 paths the hardware forwarding data structures get adjusted automatically on link failure (without CPU reprogramming them), and whether LFA needs to be configured to make the adjustment happen.
My friend Marjan Bradeško wrote a great article describing how we tend to forget common sense and rely too much on technology. I would strongly recommend you read it and start thinking about the choices you make when building a network with magic software-intent-defined-intelligent technology from your preferred vendor.
The designers of Cumulus Linux CLI were always focused on simplifying network device configurations. One of the first features along these lines was BGP across unnumbered interfaces, then they introduced simplified EVPN configurations, and recently auto-MLAG and auto-BGP.
You can watch a short description of these features by Dinesh Dutt and Pete Lumbis in Simplify Network Configuration with Cumulus Linux and Smart Datacenter Defaults videos (part of Cumulus Linux section of Data Center Fabrics webinar).
Before we start: if you’re new to my blog (or stumbled upon this blog post by incident) you might want to read the Considerations for Host-Based Firewalls for a brief overview of the challenge, and my explanation why flow-tracking tools cannot be used to auto-generate firewall policies.
As expected, the “you cannot do it” post on LinkedIn generated numerous comments, ranging from good ideas to borderline ridiculous attempts to fix a problem that has been proven to be unfixable (see also: perpetual motion).
While I keep telling you that Google-sized solutions aren’t necessarily the best fit for your environment, some of the hyperscaler presentations contain nuggets that apply to any environment no matter how small it is.
One of those must-watch presentations is Fault Tolerance through Optimal Workload Placement together with a wonderful TL&DR summary by the one-and-only Todd Hoff of the High Scalability fame.
Every now and then I call someone’s baby ugly (or maybe it was their third cousin’s baby and they nonetheless feel offended). In such cases a common resort is to cite business or market needs to prove how ignorant and clueless I am. Here’s a sample LinkedIn comment talking about my ignorance about the need for smart NICs:
The rise of custom silicon by Presando [sic], Mellanox, Amazon, Intel and others confirms there is a real market need.
Now let’s get something straight: while there are good reasons to use tons of different things that might look inappropriate, irrelevant or plain stupid to an outsider, I don’t believe in real market need argument being used to justify anything without supporting technical facts (tell me why you need that stuff and prove to me that using it is the best way of solving a problem).
A friend of mine involved in multiple Cisco ACI installations sent me this comment on their tenant connectivity model:
I’m a bit allergic to ACI. The abstraction is mis-aligned with familiar configurations, in particular contracts being independent of and over-riding routing, tenants, etc. You can really make a mess with that, and I’ve seen some! One needs to impose some structure, naming conventions…, and most people don’t seem to get that done.
As I noticed in the NSX-or-ACI webinar, it’s interesting how NSX decided to stay with the familiar VLAN/routing/filtering paradigm (more details), whereas the designers of Cisco ACI decided to go down a totally different path.
You might remember my occasional rants about lack of engineering in networking. A long while ago David Barroso nicely summarized the situation in a tweet responding to my BGP and Car Safety blog post:
If we were in a proper engineering we’d be discussing how to regulate and add safeties to an important tech that is unsafe and hard to operate. Instead, we blog about how to do crazy shit to it or how it’s a hot mess. Let’s be honest, if BGP was a car it’d be one pulled by horses.
In June 2020 I published the first part of Redundant Server Connectivity in Layer-3-Only Fabrics article describing the target design and application-layer requirements.
Justin Pietsch published a fantastic recap of his experience running OSPF in AWS infrastructure. You MUST read what he wrote, here’s the TL&DR summary:
- Contrary to popular myths, OSPF works well on very large leaf-and-spine networks.
- OSPF nuances are really hard to grasp intuitively, and the only way to know what will happen is to run tests with the same codebase you plan to use in production environment.
Dinesh Dutt made similar claims on one of our podcasts, and I wrote numerous blog posts on the same topic. Not that anyone would care or listen, it’s so much better to watch vendor slide decks full of latest unicorn dust… but in the end, it’s usually not the protocol that’s broken, but the network design.
When someone sent me a presentation on seamless MPLS a long while ago my head (almost) exploded just by looking at the diagrams… or in the immortal words of @amyengineer:
“If it requires a very solid CCIE on an obscure protocol mix at 4am, it is a bad design” - Peter Welcher, genius crafter of networks, granter of sage advice.
This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.
Adam left a thoughtful comment addressing numerous interesting aspects of network design in the era of booming automation hype on my How Should Network Architects Deal with Network Automation blog post. He started with:
A question I keep tasking myself with addressing but never finding the best answer, is how appropriate is it to reform a network environment into a flattened design such as spine-and-leaf, if that reform is with the sole intent and purpose to enable automation?
A few basic facts first: