Category: IP routing
Is Layer-3 Switch More than a Router?
Very short answer: no.
You might think that layer-3 switches perform bridging and routing, while routers do only routing. That hasn’t been the case at least since Cisco introduced Integrated Routing and Bridging in IOS release 11.2 more than 15 years ago. However, Simon Gordon raised an interesting point in a tweet: “I thought IP L3 switching includes switching within subnet based on IP address, routing is between subnets only.”
Layer-3 switches and routers definitely have to perform some intra-subnet layer-3 functions, but they’re usually not performing any intra-subnet L3 forwarding.
Mobile ARP in Enterprise Networks
Keith sent me a set of Mobile ARP questions, starting with “What’s your view on using Mobile ARP in a large enterprise?”
Short summary: Mobile ARP is an ancient technology that was designed to solve a problem that disappeared with the deployment of DHCP. Now, let’s look at the bigger picture.
IRS – just what the SDN Goldilocks is looking for?
Most current SDNish tools are too cumbersome for everyday use: OpenFlow is too granular (the controller interacts directly with the FIB or TCAM), and NETCONF is too coarse (it works on the device configuration level and thus cannot be used to implement anything the networking device can’t already do). In many cases, we’d like an external application to interact with the device’s routing table or routing protocols (similar to tracked static routes available in Cisco IOS, but without the configuration hassle).
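To put the “configuration hassle” in perspective, here’s roughly what a tracked (reliable) static route looks like in Cisco IOS – a minimal sketch with made-up addresses and timers, and exactly the kind of per-box plumbing you’d want an external application to handle for you:

! Probe the next hop every 5 seconds (addresses and object numbers are illustrative)
ip sla 10
 icmp-echo 192.0.2.1 source-interface GigabitEthernet0/1
 frequency 5
ip sla schedule 10 life forever start-time now
!
! Track object follows the probe's reachability state
track 10 ip sla 10 reachability
!
! Static route stays in the routing table only while the track object is up
ip route 0.0.0.0 0.0.0.0 192.0.2.1 track 10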
Does Optimal L3 Forwarding Matter in Data Centers?
Every data center network has a mixture of bridging (layer-2 or MAC-based forwarding, aka switching) and routing (layer-3 or IP-based forwarding); the exact mix, the size of the L2 domains, and the position of the L2/L3 boundary depend heavily on the workload ... and I would really like to understand what works for you in your data center, so please leave as much feedback as you can in the comments.
All MTUs are not the same
Matthew sent me the following remarkable fact (and he just might have saved some of you a few interesting troubleshooting moments):
I was bringing up an OSPF adjacency between a Catalyst 6500 and an ASR 9006 and kept getting an MTU mismatch error. The MTU was set exactly the same on both sides. So I reset them both back to default (1500 on the 6500 and 1514 on the ASR 9006) and the adjacency came back up, even though now the MTU is off by 14 bytes. So I attempted to bump the MTU up again, this time setting the MTU on 6500 to 1540 and the MTU on the ASR 9006 to 1554. Adjacency came right up. Is there something I am missing?
The 14-byte difference is the crucial point – that’s exactly the size of the L2 header (12 bytes for two 6-byte MAC addresses plus 2 bytes for the ethertype). When you specify the MTU in classic IOS (either with the ip mtu command or with the mtu command), you specify the maximum size of the layer-3 payload without the layer-2 header. IOS XR obviously works differently – there you specify the maximum size of the layer-2 frame, not of its layer-3 payload (comments describing how other platforms behave are most welcome!).
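To illustrate with the values from Matthew’s example (the interface names are made up, and your mileage may vary across software releases):

! Catalyst 6500 (classic IOS) -- mtu sets the maximum layer-3 payload size
interface TenGigabitEthernet1/1
 mtu 1540
!
! ASR 9006 (IOS XR) -- mtu sets the maximum layer-2 frame size (payload + 14-byte header)
interface TenGigE0/0/0/1
 mtu 1554

With this combination both devices agree on a 1540-byte IP MTU, so the OSPF adjacency comes up.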
Local Area Mobility (LAM) – the true story
Every time I mention that Cisco IOS had Local Area Mobility (LAM) – a feature that would come in quite handy in today’s virtualized data centers – more than a decade ago, someone inevitably asks “why don’t we use it?” LAM looks like a forgotten stepchild, abandoned almost as soon as it was created (supposedly it never got VRF support). The reason is simple (and has nothing to do with the size of L3 forwarding tables): LAM was always meant to be a short-term kludge, and the L3 gurus never appreciated its potential.
Layer-3 gurus: asleep at the wheel
I just read a great article by Kurt (the Network Janitor) Bales eloquently describing how a series of stupid decisions led to the current situation where everyone (but the people who actually work with the networking infrastructure) thinks stretched layer-2 domains are the mandatory stepping stone toward the cloudy nirvana.
It’s easy to shift the blame to everyone else, including storage vendors (for their love of FC and FCoE) and VMware (for the broken vSwitch design), but let’s face the reality: the rigid mindset of layer-3 gurus probably has as much to do with the whole mess as anything else.
How Did We Ever Get Into This Switching Mess?
If you’re confused about the numerous meanings of a switch, you’re not the only one. If you wonder how the whole mess started, here’s the full story (from the biased perspective of a grumpy GONER):
In the early 1980s, there were no bridges or routers. Hosts communicated directly with each other or used intermediate nodes (usually hosts, sometimes dedicated devices called gateways) to pass traffic. Networking engineers’ lives would have remained simple were it not for a few overly bright engineers at DEC who decided their application (LAT) would run directly on layer 2 to make it faster.
Time-Based Static Routes
Before someone accuses me of being totally FCoE/DCB-focused, here’s an interesting EEM trick. Damian wanted to have time-dependent static routes to ensure an expensive backup path is established only during working hours. I told him to use cron with EEM to modify the router configuration (and obviously lost him in the acronym forest) ... but there’s an even better solution: use reliable static routing and have EEM modify just the track object’s state, as shown in the sketch below.
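Here’s a minimal sketch of that approach, assuming EEM 3.0 or later (which supports stub track objects and the track set action); the prefixes, object numbers and cron schedules are made up:

! Stub track object whose state is controlled by EEM
track 20 stub-object
!
! Expensive backup path -- installed only while the track object is up
ip route 10.10.0.0 255.255.0.0 198.51.100.1 200 track 20
!
! Bring the backup path up at 8:00 and down at 18:00 on weekdays
event manager applet BACKUP-PATH-UP
 event timer cron cron-entry "0 8 * * 1-5"
 action 1.0 track set 20 state up
!
event manager applet BACKUP-PATH-DOWN
 event timer cron cron-entry "0 18 * * 1-5"
 action 1.0 track set 20 state down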
vMotion: an elephant in the Data Center room
A while ago I had a chat with a fellow CCIE (working in a large enterprise network with reasonably-sized Data Center) and briefly described vMotion to him. His response: “Interesting, I didn’t know that.” ... and “Ouch” a few seconds later as he realized what vMotion means from bandwidth consumption and routing perspectives. Before going into the painful details, let’s cover the basics.
Virtual aggregation: a quick fix for FIB/TCAM overflow
During the Big Hot and Heavy Switches podcast, Dan Hughes complained that the Nexus 7000 switch cannot take the full BGP table. The reason is simple: its TCAM (FIB) has only 56,000 entries, while the full BGP table has almost 350,000 routes.
Nexus 7000 is a Data Center switch, so the TCAM size is not really a limitation (it would usually have a default route toward the WAN core), but the same problem is experienced by Service Providers all over the world – the TCAM/FIB size of their high-speed routers is limited.
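If all you need from the WAN core is a default route, the fix on the data center edge is trivial – a minimal sketch in classic IOS BGP syntax (AS numbers and addresses are made up; NX-OS syntax differs slightly):

! Accept nothing but the default route from the WAN core
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
router bgp 64500
 neighbor 192.0.2.100 remote-as 64500
 neighbor 192.0.2.100 prefix-list DEFAULT-ONLY in

Service Providers facing the same FIB limits on transit routers don’t have that luxury, which is where ideas like virtual aggregation come in.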
RIBs and FIBs (aka IP Routing Table and CEF Table)
Every now and then, I’m asked about the difference between the Routing Information Base (RIB), also known as the IP routing table, and the Forwarding Information Base (FIB), also known as the CEF table (on Cisco devices) or the IP forwarding table.
Let’s start with an overview picture (which does tell you more than the next thousand words I’ll write):
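On a Cisco IOS device you can also look at both tables directly; a quick sketch (the prefix is just an example):

! RIB -- the routing table built from routing protocols, static and connected routes
show ip route 198.51.100.0 255.255.255.0
!
! FIB -- the CEF table derived from the RIB, with next hops resolved into
! adjacency (layer-2 rewrite) information
show ip cef 198.51.100.0 255.255.255.0 detail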
TCP/IP is like a mainframe ... you can’t change a thing
Almost 30 years ago, I was lucky enough to work on one of the best systems of those days, VAX/VMS (BTW, it was able to run 30 interactive users in 2 MB of main memory), which had everything we’d wished for – it was truly interactive, with a hierarchical file system and file versioning (not to mention remote file access and distributed clusters). I couldn’t possibly understand the woes of IBM mainframe programmers who had to deal with virtualized 132-column printers and 80-column card readers (ironically running in virtual machines that the rest of the world got some 20 years later). When I wanted to compile my program, I started the compiler; when they wanted to do the same, they had to edit a batch job, submit the batch job (assuming the disk libraries were already created), poll the queues to see when it completed, and then open the editor to view the 132-column printout of compiler errors.
After a long discussion, I started to understand the problem: the whole system was burdened with so many legacy decisions that still had to be supported that there was nothing one could do to radically change it (yeah, it’s hard to explain that to a 20-year-old kid full of himself).
Bridging and Routing, Part II
Based on the readers’ comments on my “Bridging and Routing: is there a difference?” post (thank you!), here are a few more differences between bridging and routing:
Cost. Layer-2 switches are almost always cheaper than layer-3 (usually combined layer-2/3) switches. There are numerous reasons for the cost difference, including:
Bridging and Routing: Is There a Difference?
In his comment to one of my TRILL posts, Petr Lapukhov asked the fundamental question: “how is bridging different from routing?” It’s impossible to give a concise answer (let alone one as succinct as 42), as the various kludges and workarounds (including bridges and their IBM variants) have totally muddied the waters. However, let’s be pragmatic and compare Ethernet bridging with IP (or CLNS) routing. Throughout this article, bridging refers to transparent bridging as defined by the IEEE 802.1 series of standards.
Design scope. IP was designed to support global packet switching network infrastructure. Ethernet bridging was designed to emulate a single shared cable. Various design decisions made in IP or Ethernet bridging were always skewed by these perspectives: scalability versus transparency.