Category: worth reading
Worth Reading: Cloudflare Control Plane Outage
Cloudflare experienced a significant outage in early November 2023 and published a detailed post-mortem report. You should read the whole report; here are my CliffsNotes:
- Regardless of how much redundancy you have, sometimes all systems will fail at once. Having redundant systems decreases the probability of total failure but does not reduce it to zero.
- As your systems grow, they gather hidden- and circular dependencies.
- You won’t uncover those dependencies unless you run a full-blown disaster recovery test (not a fake one)
- If you don’t test your disaster recovery plan, it probably won’t work when needed.
Also (unrelated to Cloudflare outage):
Git Rebase: What Can Go Wrong?
Julia Evans wrote another must-read article (if you’re using Git): git rebase: what can go wrong?
I often use git rebase to clean up the commit history of a branch I want to merge into a main branch or to prepare a feature branch for a pull request. I don’t want to run it unattended – I’m always using the interactive option – but even then, I might get into tight spots where I can only hope the results will turn out to be what I expect them to be. Always have a backup – be it another branch or a copy of the branch you’re working on in a remote repository.
Worth Reading: Confusing Git Terminology
Julia Evans wrote another great article explaining confusing git terminology. Definitely worth reading if you want to move past simple recipes or reminiscing about old days.
Worth Reading: Taming the BGP Reconfiguration Transients
Almost exactly a decade ago I wrote about a paper describing how IBGP migrations can cause forwarding loops and how one could reorder BGP reconfiguration steps to avoid them.
One of the paper’s authors was Laurent Vanbever who moved to ETH Zurich in the meantime where his group keeps producing great work, including the Chameleon tool (code on GitHub) that can tame transient loops while reconfiguring BGP. Definitely something worth looking at if you’re running a large BGP network.
Worth Exploring: BGP from Theory to Practice
My good friend Tiziano Tofoni finally created an English version of his evergreen classic BGP from theory to practice with co-authors Antonio Prado and Flavio Luciani.
I had the Italian version of the book since the days I was running SDN workshops with Tiziano in Rome, and it’s really nice to see they finally decided to address a wider market.
Also, you know what would go well with that book? Free open-source BGP configuration labs of course 😉
How GitHub Saved My Day
I always tell networking engineers who aspire to be more than VLAN-munging CLI jockeys to get fluent with Git. I should also be telling them that while doing local version control is the right thing to do, you should always have backups (in this case, a remote repository).
I’m eating my own dog food1 – I’m using a half dozen Git repositories in ipSpace.net production2. If they break, my blog stops working, and I cannot publish new documents3.
Now for a fun fact: Git is not transactionally consistent.
Worth Reading: Where Are the Self-Driving Cars?
Gary Marcus wrote an interesting essay describing the failure of self-driving cars to face the unknown unknowns. The following gem from his conclusions applies to AI in general:
In a different world, less driven by money, and more by a desire to build AI that we could trust, we might pause and ask a very specific question: have we discovered the right technology to address edge cases that pervade our messy really world? And if we haven’t, shouldn’t we stop hammering a square peg into a round hole, and shift our focus towards developing new methodologies for coping with the endless array of edge cases?
Obviously that’s not going to happen, we’ll keep throwing more GPU power at the problem trying to solve it by brute force.
Case Study: BGP Routing Policy
Talking about BGP routing policy mechanisms is nice, but it’s even better to see how real Internet Service Providers use those tools to implement real-life BGP routing policy.
Getting that information is incredibly hard as everyone considers their setup a secret sauce. Fortunately, there are a few exceptions; Pim van Pelt described the BGP Routing Policy of IPng Networks in great details. The article is even more interesting as he’s using Bird2 configuration language that looks almost like a programming language (as compared to the ancient route-maps used by vendors focused on “industry-standard” CLI).
Have fun!
Lifetime ipSpace.net Subscription
More than thirteen years after I started creating vendor-neutral webinars, it’s time for another change1: the ipSpace.net subscriptions became perpetual. If you have an active ipSpace.net subscription, it will stay valid indefinitely2 (and I’ll stop nagging you with renewal notices).
Wow, Free Lunch?
Sadly, that’s not the case.
Worth Reading: Looking Inside Large Language Models
Bruce Davie published an interesting overview article about Large Language Models. It would be worth reading just for the copious links to in-depth article; I particularly like his conclusions:
We mistake performance (producing realistic text) for competence (understanding the world).
Having a model for language is different from having a model of the world.
And that’s a perfect explanation why it makes no sense to expect ChatGPT and friends to produce picture-perfect device configurations or always-working code.
How GitHub Learned How Hard Distributed Systems Are
Anne Baretta found a great video describing the October 2018 GitHub failure. Here’s the TL&DW:
- The failure was caused by a short (~ 1 minute) disconnect of the primary data center
- The database replicas failed over to the secondary data center, but that failover was never tested and of course some stuff didn’t work.
- In the meantime, batch jobs modified data in the primary data center, making the two replicas out-of-sync.
- It took them over 24 hours to clean up the mess.
Engagement Farming
One of my readers asked for my opinion about the following masterpiece posted on (where else) LinkedIn1:
Getting Comfortable with the Command Line
More than a dozen years after the SDN brouhaha erupted, some people still haven’t got the memo on the obsolescence of CLI. For example, Julia Evans tries to make people comfortable with the command line. Has nobody told her it’s like teaching COBOL?
On a more serious note: you OUGHT TO master Linux CLI and be comfortable using CLI commands on network devices and servers. Her article has tons of useful tips and is definitely worth reading.
Worth Reading: Networking for AI Workloads
Sharada Yeluri (Senior Director of Engineering at Juniper Networks) wrote a long article describing the connectivity requirements of AI workloads and new approaches to Ethernet fabrics. Definitely worth reading if you’re interested in these topics.
Worth Reading: Eyes Like Saucers
Gerben Wierda published a nice description of common reactions to new unicorn-dust-based technologies:
- Eyes that glaze over
- Eyes like saucers
- Eyes that narrow
He uses generative AI as an example to explain why it might be a bad idea that people in the first two categories make strategic decisions, but of course nothing ever stops people desperately believing in vendor fairy tales, including long-distance vMotion, SDN or intent-based networking.