Category: Design
Know Thy Environment Before Redesigning It
A while ago, I had an interesting consulting engagement: a multinational organization wanted to migrate off global Carrier Ethernet VPN (with routers at the edges) to MPLS/VPN.
While that sounds like the right thing to do (after all, L3 must be better than L2, right?), in that particular case, they wanted to combine the provider VPN with an Internet-based IPsec VPN. Doing that in parallel with MPLS/VPN tends to become an interesting exercise in “how convoluted can I make my design before I give up and migrate to BGP.”
Don't Base Your Design on Vendor Marketing
Remember how Arista promoted VXLAN coupled with deep buffer switches as the perfect DCI solution a few years ago? Someone took Arista’s marketing too literally, ran with the idea and combined VXLAN-based DCI with traditional MLAG+STP data center fabric.
While I love that they wrote a blog post documenting their experience (if only more people would do that), it doesn’t change the fact that the design contains the worst of both worlds.
Here are just a few things that went wrong:
Bathwater and Hyperscalers
Russ White recently wrote an interesting blog post claiming how we should not ignore any particular technology just because it was invented by a hyperscaler illustrating his point with a half-dozen technologies that were first used by NASA.
However, there are “a few” details he glossed over:
Real-Life Data Center Meltdown
A good friend of mine who prefers to stay A. Nonymous for obvious reasons sent me his “how I lost my data center to a broadcast storm” story. Enjoy!
Small-ish data center with several hundred racks. Row of racks supported by an end-of-row stack. Each stack with 2 x L2 EtherChannels, one EC to each of 2 core switches. The inter-switch link details don’t matter other than to highlight “sprawling L2 domains."
VLAN pruning was used to limit L2 scope, but a few VLANs went everywhere, including the management VLAN.
Decide How Badly You Want to Fail
Every time I’m running a data center-related workshop I inevitably get pulled into stretched VLAN and stretched clusters discussion. While I always tell the attendees what the right way of doing this is, and explain the challenges of stretched VLANs from all perspectives (application, database, storage, routing, and broadcast domains) the sad truth is that sometimes there’s nothing you can do.
In those sad cases, I can give the workshop attendees only one advice: face the reality, and figure out how badly you might fail. It’s useless pretending that you won’t get into a split-brain scenario - redundant equipment just makes it less likely unless you over-complicated it in which case adding redundancy reduces availability. It’s also useless pretending you won’t be facing a forwarding loop.
Shifting Responsibility in Network Design and Operations
When I started working with Cisco routers in late 1980s all you could get were devices with a dozen or so ports, and CPU-based forwarding (marketers would call it software defined these days). Not surprisingly, many presentations in Cisco conferences (before they were called Networkers or Cisco Live) focused on good network design and split of functionality in core, aggregation (or distribution) and access layer.
What you got following those rules were stable and predictable networks. Not everyone would listen; some customers tried to be cheap and implement too many things on the same box… with predictable results (today they would be quick to blame vendor’s poor software quality).
Feedback: Data Center Interconnects Webinar
I got great feedback about the first part of Data Center Interconnects webinar from one of ipSpace.net subscribers:
I had no specific expectation when I started watching the material and I must have watched it 6 times by now.
Your webinar covered just the right level of detail to educate myself or refresh my knowledge on the technologies and relevant options for today’s market choices
The information provided is powerful and avoids useless discussions which vendors and PowerPoint pitches. Once you ask the right question it’s easy to get an idea of the vendor readiness
More Thoughts on Vendor Lock-In and Subscriptions
Albert Siersema sent me his thoughts on lock-in and the recent tendency to sell network device (or software) subscriptions instead of boxes. A few of my comments are inline.
Another trend in the industry is to convert support contracts into subscriptions. That is, the entrenched players seem to be focusing more on that business model (too). In the end, I feel the customer won't reap that many benefits, and you probably will end up paying more. But that's my old grumpy cynicism talking :)
While I agree with that, buying a subscription instead of owning a box (and deprecating it) also makes it easier to persuade the bean counters to switch the gear because there’s little residual value in existing boxes (and it’s easy to demonstrate total-cost-of-ownership). Like every decent sword this one has two blades ;)
Q-in-Q Support in Multi-Site EVPN
One of my subscribers sent me a question along these lines (heavily abridged):
My customer runs a colocation business and has to provide L2 connectivity between racks, sometimes even across multiple data centers. They were using Q-in-Q to deliver that in a traditional fabric and would like to replace that with multi-site EVPN fabric with ~100 ToR switches in each data center. However, Cisco doesn’t support Q-in-Q with multi-site EVPN. Any ideas?
As Lukas Krattiger explained in his part of Multi-Site Leaf-and-Spine Fabrics section of Leaf-and-Spine Fabric Architectures webinar, multi-site EVPN (VXLAN-to-VXLAN bridging) is hard. Don’t expect miracles like Q-in-Q over VNI any time soon ;)
BGP as High Availability Protocol
Every now and then someone tells me I should write more about the basic networking concepts like I did years ago when I started blogging. I’m probably too old (and too grumpy) for that, but fortunately I’m no longer on my own.
Over the years ipSpace.net slowly grew into a small community of networking experts, and we got to a point where you’ll see regular blog posts from other community members, starting with Using BGP as High-Availability protocol written by Nicola Modena, member of ExpertExpress team.