Michael sent me an interesting question:
I work in a rather large enterprise facing a campus network redesign. I am in favor of using routed access for the floor LANs and making the Ethernet segments rather small (L3 switching on the access devices). My colleagues seem to prefer L2 switching up to a VSS pair (distribution layer for the floor LANs). OSPF is currently the sole routing protocol in the backbone. So basically I need some additional pros and cons for VSS vs. routed access. :-)
The follow-up questions confirmed he has L3-capable switches in the access layer connected with redundant links to a pair of Cat6500s:
Like every other blogger, I get occasional e-mails from people fishing for free consulting or a second opinion (note: asking a serious technical question is a totally different story; as many people know, I always try to reply and help). Since I'm totally overloaded with the OpenFlow symposium and Net Field Day these days, I decided to share one of the better ones.
Chris Marget sent me the following interesting observation:
One of the things we learned back at the beginning of Ethernet is no longer true: hardware filtering of incoming Ethernet frames by the NICs in Ethernet hosts is gone. VMware runs its NICs in promiscuous mode. The fact that this Networking 101-level detail is no longer true kind of blows my mind.
So what exactly is going on and does it matter?
You probably know the old saying – if the mountain doesn’t want to come to you, you have to go out there and climb it. vCider, a brand-new startup launching their product at Gigaom Structure Launchpad, decided to do something similar in the server virtualization (Infrastructure-as-a-Service; IaaS) space – its software allows IaaS customers to build their own virtual layer-2 networks (let’s call them vSubnets) on top of the IaaS provider’s IP infrastructure; you can even build a vSubnet between VMs running within your enterprise network (private cloud in the cloudy lingo) and those running within Amazon EC2 or Rackspace.
Full disclosure: Chris Marino from vCider got in touch with me in early June. I found the idea interesting, he helped me understand their product (even offered a test run, but I chose to trust the technical information available on their web site and passed to me in e-mails and phone calls), and I decided to write about it. That’s it.
FullMesh added an excellent comment to my Multi-Chassis Link Aggregation (MLAG) and hot potato switching post. He wrote:
If there are two core routing switches and two access switches which are MLAGged together in both directions, and hosts that are dual-active LAGged to the pair of access switches, then the traffic would stay on whichever side the host places it.
He also opened another can of worms: load balancing in an MLAG environment is dictated by the end hosts. It doesn’t pay to have fancy switches that support L3 or L4 load balancing; a stupid host implementing destination-MAC-address-based load balancing can easily ruin your day.
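To illustrate why the host-side hash matters, here’s a minimal Python sketch (the hash function, MAC address and the pick_member helper are made up for illustration, not taken from any real NIC teaming driver): a host that selects the outgoing LAG member based solely on the destination MAC address will push all off-subnet traffic – which is all addressed to the default gateway’s MAC – onto a single link, no matter how smart the switches are.

```python
# Hypothetical sketch of destination-MAC-based LAG member selection.

def mac_to_int(mac: str) -> int:
    return int(mac.replace(":", ""), 16)

def pick_member(dst_mac: str, num_links: int = 2) -> int:
    # Destination-MAC-only hash: every frame toward the same MAC uses the same link
    return mac_to_int(dst_mac) % num_links

gateway_mac = "00:00:0c:07:ac:01"                      # assumed default-gateway MAC
flows = [("10.0.0.%d" % i, gateway_mac) for i in range(1, 101)]

links_used = {pick_member(dst) for _src, dst in flows}
print("links used for 100 off-subnet flows:", links_used)   # -> a single member link
```

The switches never get a chance to fix this; the host has already pinned the traffic to one uplink before the first frame hits the MLAG pair.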
I was listening to the Packet Pushers podcast on HP Virtual Connect (VC) recently and got the impression that HP VC is a weirdly convoluted product. I started wondering what exactly they were thinking when they were designing it ... and had the epiphany when Ken Henault took a step back and explained the history leading to the current complexity (listen to the podcast to get the whole story).
There are two reasons one would bundle parallel Ethernet links into a port channel (the official term is Link Aggregation Group):
- Transforming parallel links into a single logical link bypasses Spanning Tree Protocol loop avoidance logic; all links belonging to the port channel can be active at the same time (see also: Multi-Chassis Link Aggregation basics).
- Load sharing across parallel links in a port channel increases the total bandwidth available between adjacent L2 switches or between routers/hosts and switches.
Ethan Banks wrote an excellent explanation of traditional port channel caveats (proving that 1+1 sometimes does not equal 2); things get way worse when you start using Multi-Chassis Link Aggregation due to the hot potato switching (the switch tries to forward packets toward the destination MAC address as soon as possible) used by all MLAG implementations I’m familiar with.
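A rough illustration of the “1+1 is not 2” point – a Python sketch with a simplistic stand-in hash (real switches hash on MAC/IP/port fields in hardware, and the numbers below are purely illustrative): port channel load sharing is flow-based, so a single flow is always pinned to one member link and can never exceed that member’s bandwidth, and an unlucky hash distribution can overload one member while the other sits half-idle.

```python
# Toy model of flow-based port channel load sharing.
import random
from collections import Counter

def pick_member(flow: tuple, num_links: int = 2) -> int:
    # Stand-in for the switch's hardware hash over the flow's header fields
    return hash(flow) % num_links

random.seed(1)
# Ten flows with random bandwidth demands in Mbps (illustrative values only)
flows = {("10.0.0.1", f"10.0.1.{i}", 80): random.randint(50, 400) for i in range(10)}

load = Counter()
for flow, mbps in flows.items():
    load[pick_member(flow)] += mbps

print(dict(load))   # the member links rarely end up evenly loaded
# ... and a single 2 Gbps flow would still be stuck on one 1 Gbps member.
```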
A few months ago VMware decided to kick away one of the more stubborn obstacles in their way to Data Center domination: the networking team. Their vCloud architecture implements VLANs, NAT, firewalls and a bit of IP routing within the VMware hypervisor and add-on modules ... and just to make sure the networking team has no chance of interfering, they implemented MAC-in-MAC encapsulation, making their cloudy dreamworld totally invisible to the lowly net admins.
The Internet Exchange and Peering Points Packet Pushers Podcast is as good as the rest of them (listen to it first and then continue reading), but also strangely relevant to data center engineers. When you look beyond the peering policies, route servers and BGP tidbits, an internet exchange is a high-performance large-scale layer-2 network that some data center switching vendors are dreaming about ... the only difference being that the internet exchanges have to perform extremely well using existing products and technologies, not the shortest-path-bridging futures promised by the vendors.
This is my third MLAG post. You might want to read the Multi-chassis Link Aggregation Basics and Multi-chassis Link Aggregation: Stacking on Steroids posts before continuing.
Juniper has introduced an interesting twist to the Stacking on Steroids architecture: the brains of the box (the control plane) are outsourced. When you want to build a virtual chassis (Juniper’s marketing term for a stack of core switches) out of EX8200 switches, you offload all the control-plane functionality (Spanning Tree Protocol, Link Aggregation Control Protocol, first-hop redundancy protocol, routing protocols) to an external box (XRE200).
In the Multi-chassis Link Aggregation (MLAG) Basics post I’ve described how you can use (vendor-proprietary) technologies to bundle links connected to two upstream switches into a single logical channel, bypassing the Spanning Tree Protocol (STP) port blocking. While every vendor takes a different approach to MLAG, there are only a few architectures that you’ll see. Let’s start with the most obvious one: stacking on steroids.
It looks like everyone (and their dog) is writing about DCB and FCoE these days, but occasionally we get a fresh thought: Ron Fuller compares FC/Ethernet to the East Coast/West Coast mentality. Great analogy ;)
Recently I’ve stumbled across a year-old post by James Ventre describing the reasons the output rate on an Ethernet-type interface (as reported by the router) never reaches the actual interface speed. One of them: the inter-frame/packet gap (IPG).
I was stunned ... I remember very well the early days of thick/thin coax Ethernet when the IPG was needed for proper carrier sense/collision detection (the probability of a collision decreases drastically as you introduce the IPG), but on a high-speed point-to-point full-duplex link? You must be kidding.
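Back-of-the-envelope arithmetic shows how much the preamble and IPG shave off the reported rate – a quick Python sketch using the standard Ethernet framing constants, assuming the interface counters include the full frame with FCS but not the preamble or the gap:

```python
# Per-frame overhead that never shows up in interface counters:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap
OVERHEAD_BYTES = 7 + 1 + 12          # 20 bytes per frame
LINE_RATE_BPS = 1_000_000_000        # Gigabit Ethernet

def counted_rate(frame_bytes: int) -> float:
    """Maximum 'output rate' the router can report for a given frame size."""
    frames_per_sec = LINE_RATE_BPS / ((frame_bytes + OVERHEAD_BYTES) * 8)
    return frames_per_sec * frame_bytes * 8

for size in (64, 512, 1518):         # frame sizes including the FCS
    print(f"{size:>5}-byte frames: {counted_rate(size) / 1e6:6.1f} Mbps")
# 64-byte frames top out around 762 Mbps; even 1518-byte frames only reach ~987 Mbps
```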
If you ask any Data Center networking engineer about his worst pains, I’m positive Spanning Tree Protocol (STP) will be very high on the shortlist. In a well-designed fully redundant hierarchical network where every device connects to at least two devices higher in the hierarchy, you lose half the bandwidth to STP loop prevention whims.
Of course you can try to dance around the problem:
Ethan Banks has a great article @ PACKETattack: in Assembly Required – Interconnecting 2 Ethernet Chassis Switches he describes various options you have when you want to connect your redundant core switches. Using more than one physical link is the obvious choice; most people are careful enough to use at least two linecards, but the true magic begins when you start considering the bandwidth allocation to individual linecards and port groups within linecards.