Etienne-Victor Depasquale, a researcher at the University of Malta, is trying to figure out what technologies service providers use to build real-life metro-area networks, and what services they offer on top of that infrastructure.
If you happen to be involved with a metro area network, he’d love to hear from you – please fill in this survey – and he promised that he’ll share the results of the survey with the participants.
It’s so refreshing to find someone who understands the impact of latency on application performance, and develops a methodology that considers latency when migrating a workload into a public cloud: Adding latency: one step, two step, oops by Lawrence Jones.
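A back-of-the-envelope sketch of why latency matters so much in a migration: if an application performs its backend calls one after another, every added millisecond of round-trip time is multiplied by the number of calls. The function name, the 200-query workload, and the RTT values below are assumptions for illustration, not figures from the linked article.

```python
def response_time_ms(round_trips: int, rtt_ms: float, server_time_ms: float = 0.0) -> float:
    """Total time for a request whose round trips are executed sequentially."""
    return round_trips * rtt_ms + server_time_ms


if __name__ == "__main__":
    # Hypothetical page render that issues 200 sequential database queries.
    # RTT values are illustrative: same rack, same metro area, remote cloud region.
    for rtt in (0.5, 2.0, 20.0):
        total = response_time_ms(200, rtt)
        print(f"RTT {rtt:>4} ms -> {total:.0f} ms spent waiting on the network")
```

The model ignores parallelism and pipelining on purpose; chatty applications that were never written with latency in mind tend to behave like the sequential worst case.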
Ages ago when we were building networks using super-expensive 64kbps WAN links, a customer sent us a weird bug report:
Everything works fine, but we cannot transfer one particular file between two locations – the file transfer stalls and eventually times out. At the same time, we’re seeing an increased number of CRC errors on the WAN link.
My chat with the engineer handling the ticket went along these lines:
A long, long time ago, we built a multi-protocol WAN for a large organization. Everything worked great, until we got the weirdest bug report I’ve seen thus far:
When trying to transfer a particular file with DECnet to the central location, the WAN link drops. That does not happen with any other file, or when transferring the same file with TCP/IP. The only way to recover is to power cycle the modem.
Try to figure out what was going on before reading any further ;)
Almost exactly a decade ago I wrote that VXLAN isn’t a data center interconnect technology. That’s still true, but you can make it a bit better with EVPN – at the very minimum you’ll get an ARP proxy and anycast gateway. Even this combo does not address the other requirements I listed a decade ago, but maybe I’m too demanding and good enough works well enough.
However, there is one other bit that was missing from most VXLAN implementations: LAN-to-WAN VXLAN-to-VXLAN bridging. Sounds weird? Supposedly a picture is worth a thousand words, so here we go.
Enrique Vallejo asked an interesting question a while ago:
When was X.25 officially declared dead? Note that Wikipedia claims that it is still in use in parts of the world.
Wikipedia is probably right, and I had several encounters with X.25 that would corroborate that claim. If you happen to have more up-to-date information, please leave a comment.
A while ago someone pointed me to an interesting talk explaining why the 99th percentile is a pretty good approximation of user-experienced latency on a typical web page (way longer version: Understanding Latency and Application Responsiveness, also How I Learned to Stop Worrying and Love Misery).
If you prefer reading instead of watching videos, there’s also everything you know about latency is wrong.
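A quick way to convince yourself: a typical web page triggers many requests, and the user waits for the slowest one. Assuming (and this is my simplification, not a claim taken from the talks) that request latencies are independent, the chance that a page load hits at least one response slower than the 99th percentile grows quickly with the number of requests. The function name and the request counts are illustrative.

```python
def prob_page_hits_tail(requests_per_page: int, percentile: float = 0.99) -> float:
    """Probability that at least one request in a page load is slower than
    the given latency percentile, assuming independent requests."""
    if requests_per_page < 0:
        raise ValueError("requests_per_page must be non-negative")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    # All requests are faster than the percentile with probability percentile**n
    return 1.0 - percentile ** requests_per_page


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(f"{n:>3} requests -> {prob_page_hits_tail(n):.1%} of page loads see tail latency")
```

With a hundred requests per page, well over half of the page loads experience the "rare" tail latency, which is why the median tells you very little about what users actually see.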
One of the attendees of our Building Network Automation Solutions online course asked an interesting question in the course Slack team:
Has anyone written a playbook for putting a circuit into maintenance mode, i.e. adjusting metrics to drain traffic away from a circuit that is going to be taken down for maintenance?
As always, you have to figure out what you want to do before you can start automating stuff.
After decades of riding the Moore’s law curve, network bandwidth should be (almost) infinite and (almost) free, right? WRONG, as I explained in the Bandwidth Is (Not) Infinite and Free video (part of How Networks Really Work webinar).
There are still pockets of Internet desert where mobile or residential users have to deal with traffic caps, and if you decide to move your applications into any public cloud you’d better check how much bandwidth those applications consume or you’ll be the next victim of the Great Bandwidth Swindle. For more details, watch the video.
After the “shocking” revelation that a network can never be totally reliable, I addressed another widespread lack of common sense: due to laws of physics, the client-server latency is never zero (and never even close to what a developer gets from the laptop’s loopback interface).
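To put a number on the laws of physics: light in optical fiber travels at roughly two thirds of its speed in vacuum, or about 200,000 km/s. The sketch below computes the resulting lower bound on round-trip time; the constant is an approximation, the function name and distances are illustrative, and real paths are longer than the straight-line distance and add queuing, serialization, and processing delays on top.

```python
# Approximate propagation speed in optical fiber (about 2/3 of c).
# Assumption: refractive index of roughly 1.5; the exact value depends on the fiber.
SPEED_IN_FIBER_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, counting propagation delay only."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    one_way_s = distance_km / SPEED_IN_FIBER_KM_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances: same metro area, neighboring country, another continent
    for km in (50, 1_000, 6_000):
        print(f"{km:>5} km -> at least {min_rtt_ms(km):.1f} ms RTT")
```

Compare that with the sub-millisecond round trip a developer sees on a laptop loopback interface, and the gap between "works on my machine" and production becomes obvious.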
Unless you’re working for a cloud-only startup, you’ll always have to connect applications running in a public cloud with existing systems or databases running in a more traditional environment, or connect your users to public cloud workloads.
Public cloud providers love stable and robust solutions, and they took the same approach when implementing their legacy connectivity solutions: you could use routed Ethernet connections or IPsec VPN, and run BGP across them, turning the problem into a well-understood routing problem.
Listening to public cloud evangelists and marketing departments of vendors selling over-the-cloud networking solutions or multi-cloud orchestration systems, you could start to believe that migrating your workload to a public cloud would solve all your problems… and if you’re gullible enough to listen to them, you’ll get the results you deserve.
Unfortunately, nothing can change the fundamental laws of physics, networking, or application architectures:
A long while ago I got into a hilarious Tweetfest (note to self: don’t… not that I would ever listen) starting with:
Which feature and which Cisco router for layer2 extension over internet 100Mbps with 1500 Bytes MTU
The knee-jerk reaction was obvious: OMG, not again. The ugly ghost of BRouters (or is it RBridges or WAN Extenders?) has awoken. The best reply in this category was definitely:
I cannot fathom the conversation where this was a legitimate design option. May the odds forever be in your favor.
A dozen “this is a dumpster fire” tweets later the problem was rephrased as:
This is a common objection I get when trying to persuade network architects they don’t need stretched VLANs (and IP subnets) to implement data center disaster recovery:
Changing IP addresses when activating DR is hard. You’d have to weigh the manageability of stretching L2 and protecting it, with the added complexity of breaking the two sites into separate domains [and subnets]. We all have apps with hardcoded IPs, outdated IPAMs, firewall rules that need updating, etc.
Let’s get one thing straight: when you’re doing disaster recovery there are no live subnets, IP addresses or anything else along those lines. The disaster has struck, and your data center infrastructure is gone.
One of the responses to my Disaster Recovery Faking blog post focused on failure domains:
What is the difference between supporting L2 stretched between two pods in your DC (which everyone does for seamless vMotion), and having a 30ms link between these two pods because they happen to be in different buildings?
I hope you agree that a single broadcast domain is a single failure domain. If not, let’s agree to disagree and move on - my life is too short to argue about obvious stuff.