Category: design
Disaster Recovery Test Faking: Another Use Case for Stretched VLANs
The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:
Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?
The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer which would be: **** NO!
Let’s step back and try to understand what’s really going on:
Know Thy Environment Before Redesigning It
A while ago I had an interesting consulting engagement: a multinational organization wanted to migrate off global Carrier Ethernet VPN (with routers at the edges) to MPLS/VPN.
While that sounds like the right thing to do (after all, L3 must be better than L2, right?) in that particular case they wanted to combine the provider VPN with Internet-based IPsec VPN… and doing that in parallel with MPLS/VPN tends to become an interesting exercise in “how convoluted can I make my design before I give up and migrate to BGP”.
Don't Base Your Design on Vendor Marketing
Remember how Arista promoted VXLAN coupled with deep buffer switches as the perfect DCI solution a few years ago? Someone took Arista’s marketing too literally, ran with the idea and combined VXLAN-based DCI with a traditional MLAG+STP data center fabric.
While I love that they wrote a blog post documenting their experience (if only more people would do that), it doesn’t change the fact that the design contains the worst of both worlds.
Here are just a few things that went wrong:
Bathwater and Hyperscalers
Russ White recently wrote an interesting blog post claiming that we should not ignore any particular technology just because it was invented by a hyperscaler, illustrating his point with a half-dozen technologies that were first used by NASA.
However, there are “a few” details he glossed over:
Real-Life Data Center Meltdown
A good friend of mine who prefers to stay A. Nonymous for obvious reasons sent me his “how I lost my data center to a broadcast storm” story. Enjoy!
Small-ish data center with several hundred racks. Row of racks supported by an end-of-row stack. Each stack with 2 x L2 EtherChannels, one EC to each of 2 core switches. The inter-switch link details don’t matter other than to highlight “sprawling L2 domains.”
VLAN pruning was used to limit L2 scope, but a few VLANs went everywhere, including the management VLAN.
Decide How Badly You Want to Fail
Every time I’m running a data center-related workshop I inevitably get pulled into a discussion of stretched VLANs and stretched clusters. While I always tell the attendees what the right way of doing this is, and explain the challenges of stretched VLANs from all perspectives (application, database, storage, routing, and broadcast domains), the sad truth is that sometimes there’s nothing you can do.
In those sad cases, I can give the workshop attendees only one piece of advice: face the reality, and figure out how badly you might fail. It’s useless pretending that you won’t get into a split-brain scenario - redundant equipment just makes it less likely, unless you over-complicated it, in which case adding redundancy reduces availability. It’s also useless pretending you won’t be facing a forwarding loop.
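The point about over-complicated redundancy can be illustrated with a toy availability model. This is only a sketch under stated assumptions: two independent devices in parallel, with the failover mechanism (clustering software, MLAG, stretched-VLAN glue) treated as a single component in series with the pair. The function names and the sample numbers are hypothetical, not measurements.

```python
def parallel_availability(device: float, count: int) -> float:
    """Availability of `count` independent devices where one survivor is enough."""
    return 1 - (1 - device) ** count


def redundant_pair_availability(device: float, failover: float) -> float:
    """Redundant pair that works only while the failover mechanism works too.

    Assumption: the failover mechanism sits in series with the pair, so its
    availability multiplies the availability of the pair.
    """
    return failover * parallel_availability(device, 2)


single = 0.999  # hypothetical availability of one device

# A simple, reliable failover mechanism: the pair beats a single device.
print(redundant_pair_availability(single, failover=0.9999) > single)

# A fragile, over-complicated one: the pair is worse than a single device.
print(redundant_pair_availability(single, failover=0.998) > single)
```

In this model the redundant pair only helps while the failover mechanism is more available than the single device it protects. Once the mechanism itself becomes the weakest component, adding the second device lowers overall availability.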
Shifting Responsibility in Network Design and Operations
When I started working with Cisco routers in the late 1980s all you could get were devices with a dozen or so ports, and CPU-based forwarding (marketers would call it software defined these days). Not surprisingly, many presentations at Cisco conferences (before they were called Networkers or Cisco Live) focused on good network design and the split of functionality into core, aggregation (or distribution) and access layers.
What you got following those rules were stable and predictable networks. Not everyone would listen; some customers tried to be cheap and implement too many things on the same box… with predictable results (today they would be quick to blame vendor’s poor software quality).
Feedback: Data Center Interconnects Webinar
I got great feedback about the first part of the Data Center Interconnects webinar from one of the ipSpace.net subscribers:
I had no specific expectation when I started watching the material and I must have watched it 6 times by now.
Your webinar covered just the right level of detail to educate myself or refresh my knowledge on the technologies and relevant options for today’s market choices
The information provided is powerful and avoids useless discussions with vendors and PowerPoint pitches. Once you ask the right question it’s easy to get an idea of the vendor readiness.
In the first live session we covered the easy cases: design considerations, and layer-3 interconnect with path separation (multiple routing domains). The real fun will start in the second live session on March 19th when we’ll dive into stretched VLANs and long-distance vMotion ideas.
You can attend the live session with any paid ipSpace.net subscription – details here.
More Thoughts on Vendor Lock-In and Subscriptions
Albert Siersema sent me his thoughts on lock-in and the recent tendency to sell network device (or software) subscriptions instead of boxes. A few of my comments are inline.
Another trend in the industry is to convert support contracts into subscriptions. That is, the entrenched players seem to be focusing more on that business model (too). In the end, I feel the customer won't reap that many benefits, and you probably will end up paying more. But that's my old grumpy cynicism talking :)
While I agree with that, buying a subscription instead of owning a box (and depreciating it) also makes it easier to persuade the bean counters to switch the gear because there’s little residual value in existing boxes (and it’s easy to demonstrate total cost of ownership). Like every decent sword this one has two blades ;)
Q-in-Q Support in Multi-Site EVPN
One of my subscribers sent me a question along these lines (heavily abridged):
My customer runs a colocation business and has to provide L2 connectivity between racks, sometimes even across multiple data centers. They were using Q-in-Q to deliver that in a traditional fabric and would like to replace that with multi-site EVPN fabric with ~100 ToR switches in each data center. However, Cisco doesn’t support Q-in-Q with multi-site EVPN. Any ideas?
As Lukas Krattiger explained in his part of the Multi-Site Leaf-and-Spine Fabrics section of the Leaf-and-Spine Fabric Architectures webinar, multi-site EVPN (VXLAN-to-VXLAN bridging) is hard. Don’t expect miracles like Q-in-Q over VNI any time soon ;)
BGP as High Availability Protocol
Every now and then someone tells me I should write more about the basic networking concepts like I did years ago when I started blogging. I’m probably too old (and too grumpy) for that, but fortunately I’m no longer on my own.
Over the years ipSpace.net slowly grew into a small community of networking experts, and we got to a point where you’ll see regular blog posts from other community members, starting with Using BGP as High-Availability protocol, written by Nicola Modena, a member of the ExpertExpress team.
Zen of Routing Protocols
Inspired by The Zen of Python, Dinesh Dutt wrote The Zen of Routing Protocols:
Beautiful is better than ugly.
Simple is better than complex.
Complex is better than complicated.
So just because you can, don't.
Odd Number of Spines in Leaf-and-Spine Fabrics
In the market overview section of the introductory part of the data center fabric architectures webinar I made a recommendation to use a larger number of fixed-configuration spine switches instead of two chassis-based spines when building a medium-sized leaf-and-spine fabric, and explained the reasoning behind it (increased availability, reduced impact of spine failure).
One of the attendees wondered about the “right” number of spine switches – does it have to be four, or could you have three or five spines? In his words:
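The "reduced impact of spine failure" argument comes down to simple arithmetic. The sketch below assumes every leaf has one equal-speed uplink to every spine and that traffic is spread evenly across all uplinks (ECMP). The function name is made up for illustration.

```python
def capacity_after_spine_failures(num_spines: int, failed: int = 1) -> float:
    """Fraction of leaf-to-spine bandwidth left after `failed` spines go down.

    Assumes one equal-speed uplink from each leaf to each spine and even
    load distribution across all uplinks.
    """
    if num_spines <= 0:
        raise ValueError("num_spines must be positive")
    if not 0 <= failed <= num_spines:
        raise ValueError("failed must be between 0 and num_spines")
    return (num_spines - failed) / num_spines


for spines in (2, 3, 4, 5):
    remaining = capacity_after_spine_failures(spines)
    print(f"{spines} spines: {remaining:.0%} of uplink bandwidth left after one failure")
```

Under these assumptions nothing in the arithmetic requires an even number of spines: each additional spine, odd or even, shrinks the share of bandwidth lost when one of them fails.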
Video: SD-WAN Reference Design
After explaining the basics of SD-WAN, Pradosh Mohapatra, the author of the SD-WAN Overview webinar, focused on SD-WAN reference network design.
Architecture before Products
Yves Haemmerli, Consulting IT Architect at IBM Switzerland, sent me a thoughtful response to my “we need product documentation” rant. Hope you’ll enjoy it as much as I did.
Yes, whatever the project is, the real added value of an IT/network architect consultant is definitely his/her ability to create models (sometimes meta-models) of what exists, and what the customer is really looking for.