Category: design
Do We Need LFA or FRR for Fast Failover in ECMP Designs?
One of my readers sent me a question along these lines:
Imagine you have a router with four equal-cost paths to prefix X, two toward upstream-1 and two toward upstream-2. Now let’s suppose that one of those links goes down and you want to have link protection. Do I really need Loop-Free Alternate (LFA) or MPLS Fast Reroute (FRR) to get fast (= immediate) failover or could I rely on multiple equal-cost paths to get the job done? I’m getting different answers from different vendors…
Please note that we’re talking about a very specific question of whether in scenarios with equal-cost layer-3 paths the hardware forwarding data structures get adjusted automatically on link failure (without CPU reprogramming them), and whether LFA needs to be configured to make the adjustment happen.
Worth Reading: Does your hammer own you?
My friend Marjan Bradeško wrote a great article describing how we tend to forget common sense and rely too much on technology. I would strongly recommend you read it and start thinking about the choices you make when building a network with magic software-intent-defined-intelligent technology from your preferred vendor.
Video: Simplify Device Configurations with Cumulus Linux
The designers of Cumulus Linux CLI were always focused on simplifying network device configurations. One of the first features along these lines was BGP across unnumbered interfaces, then they introduced simplified EVPN configurations, and recently auto-MLAG and auto-BGP.
You can watch a short description of these features by Dinesh Dutt and Pete Lumbis in Simplify Network Configuration with Cumulus Linux and Smart Datacenter Defaults videos (part of Cumulus Linux section of Data Center Fabrics webinar).
Fixing Firewall Ruleset Problem For Good
Before we start: if you’re new to my blog (or stumbled upon this blog post by incident) you might want to read the Considerations for Host-Based Firewalls for a brief overview of the challenge, and my explanation why flow-tracking tools cannot be used to auto-generate firewall policies.
As expected, the “you cannot do it” post on LinkedIn generated numerous comments, ranging from good ideas to borderline ridiculous attempts to fix a problem that has been proven to be unfixable (see also: perpetual motion).
Must Watch: Fault Tolerance through Optimal Workload Placement
While I keep telling you that Google-sized solutions aren’t necessarily the best fit for your environment, some of the hyperscaler presentations contain nuggets that apply to any environment no matter how small it is.
One of those must-watch presentations is Fault Tolerance through Optimal Workload Placement together with a wonderful TL&DR summary by the one-and-only Todd Hoff of the High Scalability fame.
Are Business Needs Just Excuses for Vendor Shenanigans?
Every now and then I call someone’s baby ugly (or maybe it was their third cousin’s baby and they nonetheless feel offended). In such cases a common resort is to cite business or market needs to prove how ignorant and clueless I am. Here’s a sample LinkedIn comment talking about my ignorance about the need for smart NICs:
The rise of custom silicon by Presando [sic], Mellanox, Amazon, Intel and others confirms there is a real market need.
Now let’s get something straight: while there are good reasons to use tons of different things that might look inappropriate, irrelevant or plain stupid to an outsider, I don’t believe in real market need argument being used to justify anything without supporting technical facts (tell me why you need that stuff and prove to me that using it is the best way of solving a problem).
Is Cisco ACI Too Different?
A friend of mine involved in multiple Cisco ACI installations sent me this comment on their tenant connectivity model:
I’m a bit allergic to ACI. The abstraction is mis-aligned with familiar configurations, in particular contracts being independent of and over-riding routing, tenants, etc. You can really make a mess with that, and I’ve seen some! One needs to impose some structure, naming conventions…, and most people don’t seem to get that done.
As I noticed in the NSX-or-ACI webinar, it’s interesting how NSX decided to stay with the familiar VLAN/routing/filtering paradigm (more details), whereas the designers of Cisco ACI decided to go down a totally different path.
Networking, Engineering and Safety
You might remember my occasional rants about lack of engineering in networking. A long while ago David Barroso nicely summarized the situation in a tweet responding to my BGP and Car Safety blog post:
If we were in a proper engineering we’d be discussing how to regulate and add safeties to an important tech that is unsafe and hard to operate. Instead, we blog about how to do crazy shit to it or how it’s a hot mess. Let’s be honest, if BGP was a car it’d be one pulled by horses.
Redundant Server Connectivity in Layer-3-Only Fabrics (Part 2)
In June 2020 I published the first part of Redundant Server Connectivity in Layer-3-Only Fabrics article describing the target design and application-layer requirements.
During the summer I added the details of multi-subnet server and client connectivity and a few conclusions.
MUST READ: What I've learned about scaling OSPF in Datacenters
Justin Pietsch published a fantastic recap of his experience running OSPF in AWS infrastructure. You MUST read what he wrote, here’s the TL&DR summary:
- Contrary to popular myths, OSPF works well on very large leaf-and-spine networks.
- OSPF nuances are really hard to grasp intuitively, and the only way to know what will happen is to run tests with the same codebase you plan to use in a production environment.
Dinesh Dutt made similar claims on one of our podcasts, and I wrote numerous blog posts on the same topic. Not that anyone would care or listen; it’s so much better to watch vendor slide decks full of the latest unicorn dust… but in the end, it’s usually not the protocol that’s broken, but the network design.
Worth Reading: Seamless Suffering
When someone sent me a presentation on seamless MPLS a long while ago my head (almost) exploded just by looking at the diagrams… or in the immortal words of @amyengineer:
“If it requires a very solid CCIE on an obscure protocol mix at 4am, it is a bad design” - Peter Welcher, genius crafter of networks, granter of sage advice.
Turns out I was not that far off… Dmytro Shypovalov documented the underlying complexity and a few things that can go wrong in Seamless Suffering.
Adapting Network Design to Support Automation
This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.
Adam left a thoughtful comment addressing numerous interesting aspects of network design in the era of booming automation hype on my How Should Network Architects Deal with Network Automation blog post. He started with:
A question I keep tasking myself with addressing but never finding the best answer, is how appropriate is it to reform a network environment into a flattened design such as spine-and-leaf, if that reform is with the sole intent and purpose to enable automation?
A few basic facts first:
Bridging Loops in Disaster Recovery Designs
One of the readers commenting the ideas in my Disaster Recovery and Failure Domains blog post effectively said “In an active/passive DR scenario, having L3 DCI separation doesn’t protect you from STP loop/flood in your active DC, so why do you care?”
He’s absolutely right - if you have a cold disaster recovery site, it doesn’t matter if it’s bombarded by a gazillion flooded packets per second… but how often do you have a cold recovery site?
Redundant Server Connectivity in Layer-3-Only Fabrics
A long while ago I decided to write an article explaining how you could run VMware NSX on ESXi servers with redundant connections to two top-of-rack switches on top of a layer-3-only fabric (a fabric with IP subnets and VLANs limited to a single top-of-rack switch). Turns out that’s Mission Impossible, so I put the article on the back burner and slowly forgot about it.
Well, not exactly. Every now and then my subconsciousness would kick it up and I’d figure out yet-another reason why it’s REALLY hard to do it right. After a while, I decided to try again, and completely rewrote the article. The first part is already online, more details coming (hopefully) soon.
BGP AS Numbers on MLAG Members
I got this question about the use of AS numbers on data center leaf switches participating in an MLAG cluster:
In the Leaf-and-Spine Fabric Architectures you made the recommendation to have the same AS number on all members of an MLAG cluster and run iBGP between them. In the Autonomous Systems and AS Numbers article you discuss the option of having different AS number per leaf. Which one should I use… and do I still need the EBGP peering between the leaf pair?
As always, there’s a bit of a gap between theory and practice ;), but let’s start with a leaf-and-spine fabric diagram illustrating both concepts: