Stop reinventing the wheel and look around

Building large-scale VLANs to support IaaS services is every data center designer’s nightmare, and the low number of VLANs supported by some data center gear is not helping anyone. However, as Anonymous Coward pointed out in a comment to my Building a Greenfield Data Center post, service providers have been building very large (and somewhat stable) layer-2 transport networks for years. It does seem like someone is trying to reinvent the wheel (and/or sell us more gear).

A few disclaimers and caveats first:

The service providers don’t care about the end-to-end stability of your network. They provide you with the (hopefully stable) L2 transport you’ve asked for and limit your flooding bandwidth (be it broadcasts, multicasts or unknown unicasts). If you’re smart and connect routers to the L2 transport network, you’ll have a working solution (or not – just ask Greg Ferro about VPLS services). If you bridge your sites across an L2 transport network, you’ll eventually get a total network meltdown. In the data center, we don’t have the luxury of ignoring how well the servers or applications work.

Stable large L2 networks are hard to engineer. I’ve been talking with a great engineer who actually designed and built a large L2 Service Provider network. It took them quite a while to get all the details right (remember: STP was the only game in town) and make the network rock solid.

Connectivity is Service Providers’ core business and gets the appropriate focus and design/implementation budget. Networking in a data center is usually considered to be a minor (and always way too expensive) component.

However, regardless of the differences between service provider transport networks and data center networks, what we’re trying to do in every data center that offers IaaS services relying on dumb layer-2 hypervisor switches has been done numerous times in another industry. I know that learning from others never equals the thrills of figuring it all out on your own and that networking vendors need a continuous stream of reasons to sell you more (and different) boxes ... but maybe we should stop every now and then, look around, figure out whether someone else has already solved the problem that’s bothering us, and benefit from their experience.

Kurt Bales did exactly that a few days ago – trying to solve multi-tenancy issues that exceeded the VLAN limitations of the Nexus 5000, he decided to use service provider gear in his data center network. I know he was cheating – he has a Service Provider background – but you should read his excellent post (several times) and, if you agree with his approach, start looking around: explore what the service providers are doing, find out what the SP networking gear is capable of doing, and start talking to the vendors that were traditionally strong in the L2 service provider market ... or you might decide to wait a few months for L3-encapsulating hypervisor switches (and as soon as Martin Casado is willing to admit what they’re doing I’ll be more than happy to blog about it).
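
To put some numbers on the VLAN limitation, here is a rough sketch of the identifier space offered by plain 802.1Q VLANs and by the two service provider encapsulations mentioned in the comments below (802.1ad Q-in-Q and 802.1ah MAC-in-MAC). The field widths come from the standards; the function names and the tenant figures are made up for illustration, and the results are theoretical upper bounds (real switches, including the Nexus 5000, typically support fewer active VLANs than the tag format allows).

```python
# Rough sketch: how many L2 segments each encapsulation can identify.
# Field widths are from the IEEE standards; everything else is illustrative.

VID_BITS = 12    # VLAN ID field in an 802.1Q tag (also used by 802.1ad S-tags)
ISID_BITS = 24   # service instance ID (I-SID) field in an 802.1ah I-tag


def usable_vlans() -> int:
    """802.1Q: VID values 0 and 4095 are reserved, leaving 4094 usable IDs."""
    return 2 ** VID_BITS - 2


def qinq_combinations() -> int:
    """802.1ad: an outer S-tag stacked on an inner C-tag (upper bound)."""
    return usable_vlans() ** 2


def pbb_isid_space() -> int:
    """802.1ah: raw size of the 24-bit I-SID space (some values are reserved)."""
    return 2 ** ISID_BITS


def fits_in_plain_vlans(tenants: int, segments_per_tenant: int) -> bool:
    """Hypothetical check: does an IaaS deployment fit into one 802.1Q VLAN space?"""
    return tenants * segments_per_tenant <= usable_vlans()


if __name__ == "__main__":
    print("802.1Q VLANs:        ", usable_vlans())
    print("802.1ad S/C-tag pairs:", qinq_combinations())
    print("802.1ah I-SID space:  ", pbb_isid_space())
    # Hypothetical cloud: 2000 tenants with 3 segments each
    print("2000 tenants x 3 segments fit in plain VLANs:",
          fits_in_plain_vlans(2000, 3))
```

With these (assumed) tenant numbers the deployment needs 6000 segments and no longer fits into a single 802.1Q VLAN space, which is the kind of problem the service provider encapsulations were designed to solve.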

More information

You’ll find in-depth discussions of data center architectures and virtual networking in my Data Center 3.0 for Networking Engineers and VMware Networking Deep Dive.


  1. Just curious: when one references a "large L2 network", does it mean a broadcast domain with a large number of switches (>50) and a not so large number of nodes on that domain (<200), or does it simply mean a large number of nodes (dense domains) and a large number of switches? I don't believe SPs were successful addressing the latter scenario w/o link-state protocols. Surely it's possible to engineer a stable network with a large number of switches and multiple sparsely populated broadcast domains using STP, but I'm not so sure about a large number of switches with a really large and dense population of nodes per broadcast domain.

    And to Ivan's point: if you have to use link-state protocols at L2 to scale broadcast domains across lots of switches and pack a lot more systems into each domain, it does look like re-inventing the wheel, though of course, as AC pointed out, developers are not going to change their habits, so we might as well re-invent it.
  2. The "layer"-based differentiation between the stereotypical concepts of "bridging" and "routing" is, in fact, misleading. The major advantage of the so-called "L2 Ethernet" network, the one that makes it so attractive, is its flat name space. It's just that, historically, packet routing in "flat" Ethernet networks was based on simple flooding models emulating the shared cable. This routing model could be properly redesigned without modifying the "access" method itself, eliminating any scaling differences between the so-called "L2" and "L3". In fact, if you think of it, the OSI/ARPA "layering" models were probably the biggest obstacles on the way to understanding protocol design, due to their rigid structure. Another huge problem, of course, is the tremendous inertia of the networking industry, which hinders innovation.
  3. How did you guess the topic of one of the upcoming blog posts? Scary ... ;)
  4. Will try to get more details, but there are a few generic tricks you can use:

    * Don't mesh the network too much (dual trees work best)
    * Use 802.1ah (MAC-in-MAC), not 802.1ad (Q-in-Q). With MAC-in-MAC the core switches don't need to know the customers' MAC addresses (and you can fine-tune the broadcast domains)

    Not sure what L2 link-state protocols you have in mind. The first SPB (802.1aq) products have just started to appear.
  5. Ulan,

    I had the privilege (luck?) to participate in building a fairly large L2 network (100+ nodes spread over a wide area, probably 150km+ between the furthest nodes) in the early 2000s. It was built using traditional enterprise switches and provided connectivity between two main hubs and the rest of the nodes, with bandwidths of around 10-100 Mbit/s per minor node. There was no communication between minor nodes at L2. Each minor node sat on one or more VLANs which terminated at both main nodes. At all sites the hand-off to the customer was L2, and it was up to them how to connect to it (router or switch). Memory is starting to fail me as I wasn't involved much with operating that network, but from what I remember there were definitely more than a couple of MACs visible per node in the CAM tables, but not hundreds.

    So to answer your question: in my case it was a large number of switches, a decent number of L2 domains, with not too many MACs in each domain.

    The network was controlled by MSTP.
  6. Ivan: Could not agree more with your sentiments in general. Service provider gear has ample network virtualization features! Read a book, deploy MPLS or Q-in-Q! It's not that hard!
  7. Or mac-in-mac! :)