I planned to take my summer break seriously and stop blogging until late August, but then I shouldn’t have looked at my Twitter feed (my bad), where the AI algorithms selected just the right morsel to trigger the maximum rantiness. I would strongly recommend you read the original tweet and all the responses first – it looks like it was a serious suggestion, not a trolling exercise.
One of ipSpace.net subscribers sent me this interesting question:
I am the network administrator of a small data center network that spans 2 buildings. The main building has a pair of L2/L3 10G core switches. The second building has a stack of access switches connected to the main building with 10G uplinks. This secondary datacenter has got some ESX hosts and NAS for remote backup and some VM for development and testing, but all the Internet connection, firewall and server are in the main building.
There is no routing in the secondary building and most of the VLANs are stretched. Do you think I must change that (bringing routing to the secondary datacenter), or keep it simple like it is now?
As always, it depends, this time on what problem are you trying to solve?
Henk Smit left numerous questions in a comment referring to the Rethinking BGP in the Data Center presentation by Russ White:
In Russ White’s presentation, he listed a few requirements to compare BGP, IS-IS and OSPF. Prefix distribution, filtering, TE, tagging, vendor-support, autoconfig and topology visibility. The one thing I was missing was: scalability.
I noticed the same thing. We kept hearing how BGP scales better than link-state protocols (no doubt about that) and how you couldn’t possibly build a large data center fabric with a link-state protocol… and yet this aspect wasn’t even mentioned.
Namex, an Italian IXP, decided to replace their existing peering fabric with a fully automated leaf-and-spine fabric using VXLAN and EVPN running on Cumulus Linux.
They documented the design, deployment process, and automation scripts they developed in an extensive blog post that’s well worth reading. Enjoy ;)
Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.
Just in case you haven’t realized: Petr Lapukhov of the RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.
Ever since draft-lapukhov was first published almost a decade ago, we all knew BGP was the only routing protocol suitable for data center networking… or at least Thought Leaders and vendor marketers seem to be of that persuasion.
Johan Gustawsson wrote a lengthy blog post describing Telia’s approach to next-generation Internet backbone architecture… and it’s so refreshing seeing someone bringing to life what some of us have been preaching for ages:
- Simplify the network;
- Stop cramming ever-more-complex services into the network;
- Bloated major vendor NPUs implementing every magic ever envisioned are overpriced – platforms like Broadcom Jericho2 are good enough for most use cases.
- Return from large chassis-based stupidities to network-centric high availability.
I don’t know enough about optics to have an opinion on what they did there, but it looks as good as the routing part. It would be great to hear your opinion on the topic – write a comment.
In the Site Design part of Cisco SD-WAN webinar, David Penaloza described capabilities you can use when designing complex sites, like extending SD-WAN transport between SD-WAN edge nodes, or implementing high availability between them. He also explained how to track an Internet-facing interface and a service beyond its next hop.
Ethan Banks wrote the best one-line description of the crazy stuff we have to deal with in his When Stretching Layer Two, Separate Your Fate blog post:
No application should be tightly coupled to an IP address. This common issue should really be solved by application architects rebuilding the app properly instead of continuing like it’s 1999 while screaming YOLO.
Not that his (or my) take on indisputable facts would change anything… At least we can still enjoy a good rant ;)
I’ve been saying the same thing for years, but never as succinctly as Alastair Cooke did in his Understand Your Single Points of Failure (SPOF) blog post:
The problem is that each time we eliminated a SPOF, we at least doubled our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF; eliminating the SPOF may be more expensive than an outage cost due to the SPOF.
Obviously that assumes that you’re able to follow business objectives and not some artificial measure like uptime. Speaking of artificial measures, you might like the discussion about taxonomy of indecision.
Scott Berkun wrote another great article that’s equally applicable to the traditional notion of design (his specialty) and the network design. Read it, replace design with network design, and use its lessons. Here’s just a sample:
- Convincing people is a social process
- Aim for small wins, not conversions of belief systems
- Allies matter more than ideas
- Design maturity grows one step at a time.
Scott submitted an interesting the comment to my Does Unequal-Cost Multipath (UCMP) Make Sense blog post:
How about even Large CLOS networks with the same interface capacity, but accounting for things to fail; fabric cards, links or nodes in disaggregated units. You can either UCMP or drain large parts of your network to get the most out of ECMP.
Before I managed to write a reply (sometimes it takes months while an idea is simmering somewhere in my subconscious) Jeff Tantsura pointed me to an excellent article by Erico Vanini that describes the types of asymmetries you might encounter in a leaf-and-spine fabric: an ideal starting point for this discussion.
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? You mentioned that it’s difficult to create application swimlanes that way. But does subnet matter if your VMs are in different AZs?
It’s time I explain the concepts of application swimlanes and how they apply to availability zones in public clouds.
One of my readers was “blessed” with the stretched VLANs requirement combined with the need for inter-VLAN routing and sub-par equipment from a vendor not exactly known for their data center switching products. Before going on, you might want to read his description of the challenge he’s facing and what I had to say about the idea of building stackable switches across multiple locations.
Here’s an overview diagram of what my reader was facing. The core switches in each location work as a single device (virtual chassis), and there’s MLAG between core and edge switches. The early 2000s just called and they were proud of the design (but to be honest, sometimes one has to work with the tools his boss bought, so…).
A few weeks ago Adrian Giacometti described a no-stretched-VLANs disaster recovery design he used for one of his customers.
The blog post and related LinkedIn posts generated tons of comments (and objections from the usual suspects), prompting Adrian to write a sequel describing the design requirements he was facing, tradeoffs he made, and interactions between server and networking team needed to make it happen.