This topic has been on my to-write list for over a year and its working title was phrased as a question, but all the horror stories you’ve shared with me over the last year or so (some of them published in my blog) have persuaded me that there’s no question – it’s a fact.
If you think I’m rephrasing the same topic ad nauseam, you’re right, but every month or so I get an external trigger that pushes me back to the same discussion, this time an interesting comment thread on Massimo Re Ferre’s blog.
There are numerous reasons why we’re experiencing problems with transparently bridged Ethernet networks, ranging from challenges inherent in the design of Spanning Tree Protocol (STP) and suboptimal implementations of STP, to flooding behavior inherent in transparent bridging.
You can solve some of these issues with novel technologies like SPB (802.1aq) or TRILL, but you can’t change the basic fact: once you get a loop in a bridged network, and a broadcast packet caught in that loop (and flooded all over the network every time it’s forwarded by a switch), you’re toast.
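To see why a single looped broadcast is so destructive, here's a toy Python model (my own illustration, not any vendor's forwarding code): every switch floods a broadcast frame out of all ports except the one it arrived on, which is exactly what transparent bridging does once STP is out of the picture. The four-switch full mesh and the `flood()` helper are made up for this sketch; in that topology the number of in-flight copies of one broadcast doubles on every forwarding round.

```python
# Toy model of transparent bridging without STP: every switch floods a
# broadcast frame out of all ports except the one it arrived on.
# Topology: a full mesh of four switches (plenty of loops).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}

def flood(rounds):
    """Inject one broadcast at switch A and return the number of
    in-flight copies after the given number of forwarding rounds."""
    in_flight = [("A", None)]  # (current switch, neighbor it came from)
    for _ in range(rounds):
        next_round = []
        for switch, came_from in in_flight:
            for neighbor in LINKS[switch]:
                if neighbor != came_from:  # don't flood back out the ingress port
                    next_round.append((neighbor, switch))
        in_flight = next_round
    return len(in_flight)

print(flood(1), flood(2), flood(10))  # 3, 6, then 3 * 2**9 = 1536
```

One frame becomes three, then six, then over fifteen hundred after ten rounds – and real switches forward frames millions of times per second, so "eventually" arrives in milliseconds.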
SPB aficionados will tell me loops cannot happen in SPB networks because of RPF checks. Just wait till you hit the first interesting software bug or an IS-IS race condition.
Yes, there’s storm control, and you can deploy it on every link in your network, but a single circulating broadcast packet (and its infinite copies) will trigger storm control on all switches and prevent other valid broadcasts (for example, ARP requests) from being propagated, effectively causing a DoS attack on the whole layer-2 domain. Furthermore, the never-ending copies of the same broadcast packet delivered to the CPU of every single switch in the layer-2 domain will eventually start interfering with the control-plane protocols, causing further problems.
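For illustration, per-port storm control on a Cisco IOS-style switch looks something like this (the exact syntax and available actions vary by platform and software release, so treat it as a sketch rather than a tested configuration):

```
interface GigabitEthernet1/0/1
 ! Drop broadcast traffic once it exceeds 1% of link bandwidth
 storm-control broadcast level 1.00
 ! Optionally err-disable the port instead of silently dropping frames
 storm-control action shutdown
```

Note that this only caps the damage per port; a looped broadcast will keep every port pinned at the threshold, dropping legitimate broadcasts along with the garbage.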
The obvious conclusion: a transparently bridged network (aka layer-2 network or VLAN) is a single failure domain.
Why am I telling you this (again and again)?
Some people think that you experience bridging-related problems only if you’re big enough, but everything is going to be fine if you have less than a thousand VMs, less than a hundred servers, less than ten switches … or whatever other number you come up with to pretend you’re safe. That’s simply not true – I’ve seen a total network meltdown in a (pretty small) data center with three (3) switches.
The only difference between a small(er) and big(ger) data center is that you might not care if your small data center goes offline for an hour or so, but if you do, then you simply have to split it up into multiple layer-2 domains connected through layer-3 switches (or load balancers or firewalls if you so desire).
If you’re serious about the claims that you have mission-critical applications that require high availability (and everyone claims they have them), then you simply have to create multiple availability zones in your network, and spread multiple copies of the same application across them. As Amazon proved, even multiple availability zones might not be good enough, but having them is infinitely better than having a single failure domain.
The usual counterarguments
This is what I usually hear after presenting the above sad facts to data center engineers: “there’s nothing we can do”, “but our users require unlimited VM mobility”, “our applications won’t work otherwise” and a few similar ones. These are all valid claims, but as always in life, you have to face the harsh reality: either you do it right (and everyone accepts the limitations of doing it right), or you’ll pay for it in the future.
As always in the IT world, there’s a third way: use MAC-over-IP network virtualization (in the form of VXLAN, NVGRE or STT). Once these technologies get widely adopted and implemented in firewalls and load balancers (or we decide to migrate from physical to virtual appliances), they’ll be an excellent option. In the meantime, you have to choose the lesser evil (whatever you decide it is).
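To give you a taste of what MAC-over-IP encapsulation looks like in practice, this is how you might build a VXLAN segment on a Linux host with iproute2 (the interface names, addresses, multicast group and VNI are all made up for this example, and the available options depend on your kernel version):

```
# Create a VXLAN interface with VNI 42; BUM traffic is flooded via multicast
ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
ip link set vxlan42 up
# Give this host an address in the overlay segment
ip addr add 10.0.42.1/24 dev vxlan42
```

The layer-2 segment now rides on top of a routed IP transport, so a loop or broadcast storm inside the overlay can no longer melt down the physical network.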
You probably know you’ll find a lot more information in my data center and virtualization webinars, but there’s also a book I would highly recommend to anyone considering more than just how to wire a bunch of switches together – Scalability Rules is an awesome collection of common-sense and field-tested scalability rules (including a fair amount of networking-related advice not very dissimilar from what I’m always telling you). Finally, if you’d like to have my opinion on your data center design, check out the ExpertExpress service.