Bridging « ipSpace.net blog

Tuesday, October 25, 2011 07:30 +0200

QFabric Part 4 – Spanning Tree Protocol

2021-01-03: Even though QFabric was an interesting architecture (and reverse-engineering it was a fun intellectual exercise), it withered a few years ago. Looks like Juniper tried to bite off too much.

The initial release of QFabric Junos can run STP only within the network node (see QFabric Control Plane post for more details), triggering an obvious question: “what happens if a server multihomed to a server node starts bridging between its ports and starts sending BPDUs?” Some fabric solutions try to ignore STP (the diplomats would say “they are transparent to STP”) but fortunately Juniper decided to do the right thing.

Large-Scale Bridging = Nuked Earth

If you’re not working for a data center fabric vendor, you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:

Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.

... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)


Thursday, September 1, 2011 06:06 +0200

VXLAN, OTV and LISP

Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculations and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.

VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.

Imagine the Ruckus When the Hypervisor Vendors Wake Up

It seems that most networking vendors consider the Flat Earth architectures the new bonanza. Everyone is running to join the gold rush, from Cisco’s FabricPath and Brocade’s VCS to HP’s IRF and Juniper’s upcoming QFabric. As always, the standardization bodies are following the industry with a large buffet of standards to choose from: TRILL, 802.1aq (SPB), 802.1Qbg (EVB) and 802.1Qbh (Port extenders).

EVB (802.1Qbg) – the S component

Update 2021-01-03: IBM implemented EVB in Linux bridge, and Juniper added EVB support to Junos, but I haven't seen (or heard of) a single EVB implementation since I wrote this blog post almost 9 years ago.

The Edge Virtual Bridging (EVB; 802.1Qbg) standard solves two important layer-2-based virtualization issues:

Automatic provisioning of access switches based on hypervisor-signaled information (discussed in the EVB eases VLAN configuration pains article)
Multiplexing of multiple logical 802.1Q links over a single physical link.

Logical link multiplexing might seem a solution in search of a problem until you discover that VMware-related design documents usually recommend using 6 to 10 NICs per server – an approach that either wastes switch ports or is hard to implement with blade servers’ mezzanine cards (due to the limited number of backplane connections).
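
To make the multiplexing idea concrete, here is a minimal sketch. It assumes each logical link is identified by an outer 802.1ad service tag (S-tag, TPID 0x88A8) placed in front of the usual 802.1Q customer tag (TPID 0x8100). The function names and the channel/VLAN numbers are made up for illustration; this is not a description of any particular EVB implementation.

```python
import struct
from typing import Tuple

S_TAG_TPID = 0x88A8  # 802.1ad service tag (assumed to identify the logical link)
C_TAG_TPID = 0x8100  # 802.1Q customer tag (the VLAN inside the logical link)


def vlan_tag(tpid: int, vid: int, pcp: int = 0) -> bytes:
    """Pack a 4-byte VLAN tag: 16-bit TPID, 3-bit PCP, 1-bit DEI (left at 0), 12-bit VID."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must fit in 3 bits")
    tci = (pcp << 13) | vid
    return struct.pack("!HH", tpid, tci)


def multiplex(dst: bytes, src: bytes, channel: int, vlan: int,
              ethertype: int, payload: bytes) -> bytes:
    """Build a double-tagged frame: the outer S-tag selects the logical link,
    the inner C-tag selects the VLAN carried over that link."""
    return (dst + src
            + vlan_tag(S_TAG_TPID, channel)
            + vlan_tag(C_TAG_TPID, vlan)
            + struct.pack("!H", ethertype)
            + payload)


def demultiplex(frame: bytes) -> Tuple[int, int]:
    """Return (channel, vlan) extracted from a double-tagged frame."""
    s_tpid, s_tci, c_tpid, c_tci = struct.unpack("!HHHH", frame[12:20])
    if s_tpid != S_TAG_TPID or c_tpid != C_TAG_TPID:
        raise ValueError("frame is not S-tag + C-tag encapsulated")
    return s_tci & 0xFFF, c_tci & 0xFFF
```

With this layout a single physical NIC can carry several independent 802.1Q trunks: the receiver looks at the outer tag to pick the logical link and only then interprets the inner VLAN tag.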

TRILL/Fabric Path – STP Integration

Every Data Center fabric technology has to integrate seamlessly with legacy equipment running the venerable Spanning Tree Protocol (STP) or one of its facelifted incarnations (for example, RSTP or MST). The alternative, called rip-and-replace when talking about other vendors’ boxes or synchronized upgrade when promoting your wares (no, I haven’t heard it yet, but I’m sure it’s coming), is simply indigestible to most data center architects.

TRILL and Cisco’s proprietary Fabric Path take a very definitive stance: the new fabric is the backbone of the network, routing TRILL-encapsulated layer-2 frames across bridged segments (TRILL) or a contiguous backbone (Fabric Path). Both architectures segment the original STP domain into small chunks at the edges of the network as shown in the following figure:

Don’t Try to Fake Multi-chassis Link Aggregation (MLAG)

Martin sent me an interesting challenge: he needs to connect an HP switch in a blade enclosure to a pair of Catalyst 3500G switches. His Catalysts are not stackable and he needs the full bandwidth between the switches, so he decided to fake the multi-chassis link aggregation functionality by configuring static LAG on the HP switch and disabling STP on it (the Catalysts have no idea they’re talking to the same switch):

Does Bridge Assurance Make UDLD Obsolete?

I got an interesting question from Andrew:

Would you say that bridge assurance makes UDLD unnecessary? It doesn't seem clear from any resource I've found so far (either on Cisco's docs or on Google).

It’s important to remember that UDLD works on physical links whereas bridge assurance works on top of STP (which also implies it works above link aggregation/port channel mechanisms). UDLD can detect an individual link failure (even when the link is part of a LAG); bridge assurance can detect unaggregated link failures, total LAG failure, a misconfigured remote port or a malfunctioning switch.
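
The division of labor in the paragraph above can be written down as a small lookup table. This is a rough sketch that only restates that paragraph; the failure labels are invented for illustration, and failure modes the paragraph does not mention are left out.

```python
# Failure labels are illustrative; the mapping only restates the paragraph above.
DETECTS = {
    "UDLD": {
        "unaggregated link failure",
        "LAG member link failure",
    },
    "bridge assurance": {
        "unaggregated link failure",
        "total LAG failure",
        "misconfigured remote port",
        "malfunctioning switch",
    },
}


def detectors(failure: str) -> list:
    """Return the mechanisms that the table lists as detecting this failure."""
    return sorted(name for name, failures in DETECTS.items() if failure in failures)


def only_detected_by(mechanism: str) -> set:
    """Return the failures that no other mechanism in the table detects."""
    others = set().union(
        *(failures for name, failures in DETECTS.items() if name != mechanism)
    )
    return DETECTS[mechanism] - others
```

In this table each mechanism has at least one failure that only it detects, which suggests that the two are complementary rather than one replacing the other.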

Traffic Trombone (what it is and how you get them)

Every so often I get a question “what exactly is a traffic trombone/tromboning”. Here’s my attempt at a semi-formal definition.

Traffic trombone is a term (probably invented by Greg Ferro) that colorfully describes inter-VLAN traffic flows in a network with stretched (usually overlapping) L2 domains.

In a traditional L2/L3 data center architecture with small L2 domains in the access layer and L3 forwarding across the core network, the inter-subnet traffic flows were close to optimal: a host would send a packet toward the first-hop (ingress) router (across a bridged L2 subnet), the ingress router would forward the packet across an optimal path toward the egress router, and the egress router would deliver the packet (yet again, across a bridged L2 subnet) to the destination host.
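
The contrast with stretched L2 domains can be illustrated by counting how often a single routed packet crosses the link between two sites. This is a deliberately simplified sketch: it assumes one router that is attached to both VLANs and does all the inter-VLAN routing, and the site names and the function name are hypothetical.

```python
def dci_crossings(src_site: str, gateway_site: str, dst_site: str) -> int:
    """Count inter-site link crossings for one routed (inter-VLAN) packet.

    Assumed path: the source host bridges the packet to its first-hop
    gateway, and the gateway routes it into the destination VLAN, where
    it is bridged to the destination host.
    """
    path = [src_site, gateway_site, dst_site]
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


# Small L2 domains: the gateway sits in the same site as the source host.
local_flow = dci_crossings("A", "A", "A")    # stays inside site A
remote_flow = dci_crossings("A", "A", "B")   # crosses the core once

# Stretched VLANs: both hosts are in site B, but the gateway is in site A.
trombone = dci_crossings("B", "A", "B")      # goes to site A and back
```

In the last case the packet leaves site B only to reach its gateway and immediately returns, so the inter-site link carries it twice even though source and destination sit next to each other.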

Local Area Mobility (LAM) – the true story

Every time I mention that Cisco IOS had Local Area Mobility (LAM) (the feature that would come in quite handy in today’s virtualized data centers) more than a decade ago, someone inevitably asks “why don’t we use it?” LAM looks like a forgotten step-child, abandoned almost as soon as it was created (supposedly it never got VRF support). The reason is simple (and has nothing to do with the size of L3 forwarding tables): LAM was always meant to be a short-term kludge and L3 gurus never appreciated its potential.

Category: Bridging