Stretched VLANs and Failing Firewall Clusters

After publishing the Disaster Recovery Faking, Take Two blog post (you might want to read that one before proceeding) I was severely reprimanded by several people with ties to virtualization vendors for blaming virtualization consultants when it was obvious the firewall clusters stretched across two data centers caused the total data center meltdown.

Let’s chase that elephant out of the room first. When you drive too fast on an icy road and crash into a tree, who do you blame?

  • The person who told you it’s perfectly OK to do so;
  • The tire manufacturer who advertised how safe their tires were;
  • The tires for failing to ignore the laws of physics;
  • Yourself for listening to bad advice?

For whatever reason some people love to blame the tires ;)


Saved: TCP Is the Most Expensive Part of Your Data Center

Years ago Dan Hughes wrote a great blog post explaining how expensive TCP is. His web site is long gone, but I managed to grab the blog post before it disappeared and he kindly allowed me to republish it.


If you ask a CIO which part of their infrastructure costs them the most, I’m sure they’ll mention power, cooling, server hardware, support costs, getting the right people, and all the usual answers. I’d argue one of the biggest costs is TCP, or more accurately, badly implemented TCP.


Disaster Recovery Faking, Take Two

An anonymous (for reasons that will be obvious pretty soon) commenter left a gem on my Disaster Recovery Test Faking blog post that is way too valuable to be left hidden and unannotated.

Here’s what he did:

Once I was tasked to do a DR test before handing over the solution to the customer. To simulate the loss of a data center, I suggested physically shutting down all core switches in the active data center.


Disaster Recovery Test Faking: Another Use Case for Stretched VLANs

The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:

Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?

The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer, which would be: **** NO!

Let’s step back and try to understand what’s really going on:


Switch Buffer Sizes and Fermi Estimates

In my quest to understand how much buffer space we really need in high-speed switches I encountered an interesting phenomenon: we no longer have a gut feeling for what makes sense, sometimes going as far as assuming that 16 MB (or 32 MB) of buffer space per 10GE/25GE data center ToR switch is just another $vendor shenanigan focused on cutting costs. Time for another set of Fermi estimates.

Let’s take a recent data center switch using the Trident II+ chipset and having 16 MB of buffer space (source: the awesome packet buffers page by Jim Warner). Most switches using this chipset have 48 10GE ports and 4-6 uplinks (40GE or 100GE).
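Here’s a quick Python sketch of the kind of Fermi estimate you can do on the back of an envelope. The even split of the shared buffer across all ports and the choice of six uplinks are my simplifying assumptions, not figures from the post; real ASICs share the buffer dynamically, so treat the result as an order-of-magnitude number only.

```python
# Fermi estimate: per-port share of a Trident II+ shared packet buffer.
# The 16 MB buffer and 48 x 10GE port count come from the text above;
# assuming 6 uplinks and an even per-port split is my simplification.

BUFFER_BYTES = 16 * 1024 * 1024        # 16 MB shared packet buffer
PORTS = 48 + 6                         # 48 x 10GE access ports + 6 uplinks

per_port_bytes = BUFFER_BYTES / PORTS  # naive equal split across all ports
print(f"Buffer per port: {per_port_bytes / 1024:.0f} KB")

# How long would it take a 10GE port to drain that much queued data?
LINE_RATE_BPS = 10e9                   # 10 Gbps
drain_time_us = per_port_bytes * 8 / LINE_RATE_BPS * 1e6
print(f"Drain time at 10 Gbps: {drain_time_us:.0f} microseconds")
```

The arithmetic works out to roughly 300 KB and about 250 microseconds per 10GE port, which is at least a concrete number to argue about instead of a gut feeling.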


Don't Base Your Design on Vendor Marketing

Remember how Arista promoted VXLAN coupled with deep buffer switches as the perfect DCI solution a few years ago? Someone took Arista’s marketing too literally, ran with the idea and combined VXLAN-based DCI with traditional MLAG+STP data center fabric.

While I love that they wrote a blog post documenting their experience (if only more people would do that), it doesn’t change the fact that the design contains the worst of both worlds.

Here are just a few things that went wrong:


Building Fabric Infrastructure for an OpenStack Private Cloud

An attendee in my Building Next-Generation Data Center online course was asked to deploy numerous relatively small OpenStack cloud instances and wanted to select the optimal virtual networking technology. Not surprisingly, every $vendor had just the right answer, including Arista:

We’re considering moving from hypervisor-based overlays to ToR-based overlays using Arista’s CVX for approximately 2000 VLANs.

As I explained in Overlay Virtual Networking, Networking in Private and Public Clouds and Designing Private Cloud Infrastructure (plus several presentations) you have three options to implement virtual networking in private clouds:


Real-Life Data Center Meltdown

A good friend of mine who prefers to stay A. Nonymous for obvious reasons sent me his “how I lost my data center to a broadcast storm” story. Enjoy!


Small-ish data center with several hundred racks. Row of racks supported by an end-of-row stack. Each stack with 2 x L2 EtherChannels, one EC to each of 2 core switches. The inter-switch link details don’t matter other than to highlight “sprawling L2 domains.”

VLAN pruning was used to limit L2 scope, but a few VLANs went everywhere, including the management VLAN.


How Common Are Data Center Meltdowns?

We all know about catastrophic headline-generating failures like AWS East-1 region falling apart or a major provider being down for a day or two. Then there are failures known only to those who care, like losing a major exchange point. However, I’m becoming more and more certain that the known failures are not even the tip of the iceberg - they seem to be the climber at the iceberg summit.


Decide How Badly You Want to Fail

Every time I’m running a data center-related workshop I inevitably get pulled into a stretched VLANs and stretched clusters discussion. While I always tell the attendees what the right way of doing this is, and explain the challenges of stretched VLANs from all perspectives (application, database, storage, routing, and broadcast domains), the sad truth is that sometimes there’s nothing you can do.

You’ll find a generic version of that explanation in Building Active-Active and Disaster Recovery Data Centers webinar. Every few months I might be available for an onsite version of that same discussion, or you could engage one of the other ExpertExpress consultants.

In those sad cases, I can give the workshop attendees only one piece of advice: face the reality, and figure out how badly you might fail. It’s useless pretending that you won’t get into a split-brain scenario: redundant equipment just makes it less likely, unless you over-complicate it, in which case adding redundancy reduces availability. It’s also useless pretending you won’t be facing a forwarding loop.


Automating Cisco ACI Environment with Python and Ansible

This is a guest blog post by Dave Crown, Lead Data Center Engineer at the State of Delaware. He can be found automating things when he's not in meetings or fighting technical debt.


Over the course of the last year or so, I’ve been working on building a solution to deploy and manage Cisco’s ACI using Ansible and Git, with Python to spackle in the cracks. The goal I started with was to take the plain-text description of our network from a Git server, pull in any requirements, use the solution to configure the fabric, and finally update our IPAM, Netbox. All this without using the GUI or CLI to make changes. Most importantly, I wanted to run it with a simple invocation so that others could run it and it could be moved into Ansible Tower when ready.
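To make the sequence concrete, here’s a minimal Python sketch of that Git-to-fabric-to-IPAM flow. The repository layout (tenants.yml), addresses, credentials, and object names are hypothetical, and the real solution drives the fabric through Ansible modules rather than raw REST calls; the APIC login and tenant posts and the NetBox prefix endpoint shown here are simply their documented REST interfaces.

```python
# Sketch of the pipeline: read desired state from a Git-managed YAML file,
# push tenants to the ACI fabric via the APIC REST API, then record the
# prefixes in NetBox so IPAM stays in sync. Names and layout are hypothetical.
import yaml      # PyYAML
import requests

APIC = "https://apic.example.net"        # hypothetical APIC address
NETBOX = "https://netbox.example.net"    # hypothetical NetBox address
NETBOX_TOKEN = "0123456789abcdef"        # hypothetical NetBox API token

def load_desired_state(path="tenants.yml"):
    """Plain-text description of the network, checked out from the Git repo."""
    with open(path) as f:
        return yaml.safe_load(f)

def apic_login(session, user, password):
    """Authenticate against the APIC; the session object keeps the auth cookie."""
    body = {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    session.post(f"{APIC}/api/aaaLogin.json", json=body, verify=False).raise_for_status()

def ensure_tenant(session, name):
    """Post a tenant object to the fabric (re-posting the same object is harmless)."""
    body = {"fvTenant": {"attributes": {"name": name}}}
    session.post(f"{APIC}/api/mo/uni.json", json=body, verify=False).raise_for_status()

def record_prefix(prefix, description):
    """Document the prefix in NetBox via its REST API."""
    headers = {"Authorization": f"Token {NETBOX_TOKEN}"}
    requests.post(f"{NETBOX}/api/ipam/prefixes/", headers=headers,
                  json={"prefix": prefix, "description": description}).raise_for_status()

if __name__ == "__main__":
    desired = load_desired_state()
    with requests.Session() as apic:
        apic_login(apic, desired["apic_user"], desired["apic_password"])
        for tenant in desired["tenants"]:
            ensure_tenant(apic, tenant["name"])
            for prefix in tenant.get("prefixes", []):
                record_prefix(prefix, f"Tenant {tenant['name']}")
```

In the real solution these steps map onto Ansible tasks (and eventually an Ansible Tower job) invoked with a single command, rather than a hand-rolled script.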


Feedback: Data Center Interconnects Webinar

I got great feedback about the first part of Data Center Interconnects webinar from one of ipSpace.net subscribers:

I had no specific expectation when I started watching the material and I must have watched it 6 times by now.

Your webinar covered just the right level of detail to educate myself or refresh my knowledge on the technologies and relevant options for today’s market choices.

The information provided is powerful and avoids useless discussions with vendors and PowerPoint pitches. Once you ask the right question, it’s easy to get an idea of the vendor readiness.

In the first live session we covered the easy cases: design considerations and layer-3 interconnect with path separation (multiple routing domains). The real fun will start in the second live session on March 19th when we’ll dive into stretched VLANs and long-distance vMotion ideas.

You can attend the live session with any paid ipSpace.net subscription (details here).


Private VLANs with VXLAN

Got this remark from a reader after he read the VXLAN and Q-in-Q blog post:

Another area where there is a feature gap with EVPN VXLAN is Private VLANs with VXLAN. They’re not supported on either Nexus or Juniper switches.

I have one word on using private VLANs in 2019: Don’t. They are messy and hard to maintain (not to mention it gets really interesting when you’re combining virtual and physical switches).
