I published a blog post describing how complex the underlay supporting VMware NSX still has to be (because someone keeps pretending a network is just a thick yellow cable), and the tweet announcing it admittedly looked like clickbait.
[Blog] Do We Need Complex Data Center Switches for VMware NSX Underlay
Martin Casado quickly replied NO (probably before reading the whole article), starting a whole barrage of overlay-focused neteng-versus-devs fun.
The best response to Martin’s claim was made by Mat Jovanovic:
Depends… Are we looking at a PPT, or a “I’ve tried it on a commodity underlay” version of the answer? Something tells me it’s quite different…
In the meantime, the debate veered into “my overlay is better than your overlay”, starting with Martin's claim that:
Good news for you – there are many fast growing overlay solutions that are adopted by apps and security teams and bypass the networking teams altogether.
However, being sick-and-tired of everyone claiming how great it is to build overlays on top of overlays (like we didn’t learn anything in the decades building GRE and IPsec tunnels), I decided to troll a bit more:
We had a fast-growing overlay solution in the 1970s. It was called TCP. I’ve heard it might still be used. Why do people insist on heaping layers upon layers instead of writing decent code?
Martin's response was almost as expected:
App developers : “I’ve created this amazing overlay solution that solves a bunch of our problems”
Networking : “TCP has been around since the 70’s, write better code”
… this is why you’re not being invited to the party ;)
Someone must have had some traumatic experiences... Anyhow, as you probably know, I’m well aware of the popularity of pointing out the state of the Emperor’s wardrobe (or lack thereof), and I’m way too old for FOMO, so I don’t care what parties I get invited to.
However, what makes me truly sad is watching highly intelligent people, ignorant of environmental limitations (see also: fallacies of distributed computing and RFC 1925 rule 4), reinvent the wheel, repeat the mistakes we made in the past, and after years of figuring things out end up with what we already had (in a different disguise, see also RFC 1925 rule 11).
- We won’t use DNS (for whatever made-up reason), because we believe in IP addresses. Years later: We don’t care about stinking IP addresses anymore, we have Consul (hint: ever heard of SRV records?)
- We tied everything to IP addresses, so you better move them across the globe and into public clouds, and you can’t change them when doing disaster recovery. Years later: containers are cool, and we use Consul anyway, so it’s perfectly fine to hide a dozen thingies behind the same IP address.
- We’ll implement our own overlay because your overlay sucks. Years later: OMG, VXLAN has no security, as security researchers tend to find out every other year or so.
- Centralized control plane is the way to go. Years later: Ouch, scalability and latency suck. Maybe we should focus on automation... or intent... or policies... or whatever.
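To illustrate the SRV-records hint above: DNS has been able to publish service name, protocol, port, priority, and weight since RFC 2782, which covers most of what newer service-discovery tools rebuild. A hypothetical zone snippet (all names and values made up for illustration):

```
; _service._proto.name    TTL  class  SRV  prio  weight  port  target
_api._tcp.example.com.    300  IN     SRV  10    60      8443  host1.example.com.
_api._tcp.example.com.    300  IN     SRV  10    40      8443  host2.example.com.
```

A client querying `_api._tcp.example.com` gets back both targets with their ports, plus weights for load distribution, without any extra discovery infrastructure.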
The networking engineers should know better, but even they can’t resist the lure of reinventing broken wheels, for example, overlays with cache-based forwarding like LISP. No surprise: such solutions quickly run into the endpoint liveness problem (and a few others).
- I’m guessing LISP is not yet widespread enough to encounter the severe cache thrashing behavior that still triggers PTSD in anyone remotely involved in the days when Fast Switching crashed the Internet. That rerun might be fun to watch…
- Of course I probably messed up at least some of these examples, so please feel free to correct me in the comments.
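To put the VXLAN security bullet in concrete terms: the VXLAN header defined in RFC 7348 is eight bytes — flags, a 24-bit VNI, and reserved fields — with no authentication or integrity protection whatsoever, so anyone who can land a UDP datagram on a VTEP can claim membership in any segment. A minimal sketch with Python’s standard library (packet layout per RFC 7348; the helper names are mine):

```python
import struct

VXLAN_PORT = 4789   # IANA-assigned UDP port for VXLAN
I_FLAG = 0x08       # "VNI present" flag; all other flag bits are reserved

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from RFC 7348:
    flags (1B) + reserved (3B) + VNI (3B) + reserved (1B).
    Note what is missing: no sender identity, no signature, no sequence
    number -- nothing a receiving VTEP could use to authenticate the frame."""
    return struct.pack("!B3s3sB", I_FLAG, b"\x00\x00\x00",
                       vni.to_bytes(3, "big"), 0)

def parse_vni(header: bytes) -> int:
    """Extract the VNI -- all the 'security context' a VTEP ever gets."""
    flags, _rsvd1, vni_bytes, _rsvd2 = struct.unpack("!B3s3sB", header)
    assert flags & I_FLAG, "VNI-present flag not set"
    return int.from_bytes(vni_bytes, "big")

hdr = build_vxlan_header(vni=5001)
print(len(hdr), parse_vni(hdr))   # 8 5001
```

Compare that with the decades we spent bolting security onto GRE and IPsec tunnels: the option of transport protection the earlier overlays at least offered is simply absent here, which is why the sane deployment advice treats the underlay as a trusted, isolated transport network.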
Now here’s a crazy idea: what if we started talking to people who understand how stuff works, learned from them, and implemented things in an optimal way? IT seems to be one of the few areas where we allow people to build sandcastles, ignore the tides, and then blame someone else when the water inevitably arrives.