Bridging and Routing, Part II

Based on the readers’ comments on my “Bridging and Routing: is there a difference?” post (thank you!), here are a few more differences between bridging and routing:

Cost. Layer-2 switches are almost always cheaper than layer-3 (usually combined layer-2/3) switches. There are numerous reasons for the cost difference, including:

  • Mass-market low-end switches are usually simple bridges. Low-cost high-speed bridging silicon is thus readily available.
  • MAC address lookup is simpler than IP table lookup and easier to implement in silicon. You need simple CAM (Content Addressable Memory) to perform MAC address lookup and TCAM (Ternary CAM) with additional output logic to perform longest-IP-prefix matching.
  • Layer-3 switches are expected to perform IP packet filtering. Implementing access lists in hardware (usually with even larger TCAM) is expensive.
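
The difference between the two lookups can be sketched in a few lines of Python. This is only a software illustration of what CAM and TCAM do in hardware; the table contents, port names and function names are made up for the example.

```python
import ipaddress

# Exact-match MAC table: one key, one result (roughly what a CAM provides).
mac_table = {
    "00:11:22:33:44:55": "port1",
    "66:77:88:99:aa:bb": "port2",
}

def mac_lookup(dst_mac):
    # An unknown destination is flooded by a bridge.
    return mac_table.get(dst_mac, "flood")

# IP routing table: several prefixes of different lengths can match the
# same destination address, and the longest one has to win.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "uplink"),
    (ipaddress.ip_network("10.0.0.0/8"), "core"),
    (ipaddress.ip_network("10.1.2.0/24"), "access"),
]

def ip_lookup(dst_ip):
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, next_hop) for net, next_hop in routes if addr in net]
    if not matches:
        return None  # no route: the packet is dropped
    # Longest-prefix match: pick the most specific matching prefix.
    best_net, best_next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_next_hop

print(mac_lookup("00:11:22:33:44:55"))  # port1
print(ip_lookup("10.1.2.3"))            # access
print(ip_lookup("10.9.9.9"))            # core
```

The MAC lookup is a single exact comparison, while the IP lookup has to compare the address against prefixes of every length and then choose between the matches, which is the part that needs ternary matching and extra selection logic in silicon.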

Zero configuration. In their simplest incarnation, the bridges are plug-and-play devices (magically transforming themselves into plug-and-pray devices as the network grows); it’s quite easy to find a perfectly working switch named Switch with no non-default configuration in a badly managed network. Routers always require configuration (at the very minimum, you have to configure IP subnets and IP routing protocols).

However, as soon as VLANs are introduced into the network or you need to fine-tune STP, the zero-configuration benefits are gone.

Equal-cost multipath. Routers can load-balance traffic between equal-cost paths across the network. Bridges can load-balance traffic between parallel bonded links (port channel). Redundant paths in bridged networks are disabled to prevent forwarding loops.

Enhancements to port channel technology (VSS and vPC) allow links connected to multiple switches to be bonded. TRILL (and similar technologies) solves the problem, allowing unrestricted equal-cost multipath.
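
A rough sketch of how a router (or a port channel) might spread traffic across equal-cost paths: hash the fields identifying a flow and use the result to pick one of the paths. The function name, the 5-tuple layout and the use of SHA-256 are assumptions made for the illustration; real hardware typically uses a much cheaper hash.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow (src IP, dst IP, protocol, src port, dst port) onto
    one of several equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

next_hops = ["R2", "R3"]  # hypothetical equal-cost neighbors
flow = ("10.0.0.1", "10.0.1.1", 6, 40000, 443)

# The same flow always maps to the same next hop.
print(pick_next_hop(flow, next_hops) == pick_next_hop(flow, next_hops))  # True
```

Hashing per flow rather than per packet keeps all packets of one session on the same path, so they are not reordered, while different flows still spread across the available paths.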

Security. Packet filters between IP subnets are a standard feature of every decent router, allowing the network designer to segment the network into security zones.

Some layer-2 switches have similar functionality (port ACL), which turns an L2 switch into a layer-3-aware L2 device, increasing configuration and troubleshooting complexity.

Predictability. L3 forwarding tables are modified only by the control plane (routing) protocols based on messages exchanged by the routers, not by the data traffic flow. L2 forwarding tables are modified on-the-fly by the data plane snooping functionality based on source MAC addresses in the frames forwarded by the switch.
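
To illustrate the data-plane learning, here is a minimal sketch of a transparent bridge in Python. The class and its method names are invented for the example and ignore aging timers, VLANs and STP.

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port

    def receive(self, src_mac, dst_mac, in_port):
        """Process one frame and return the list of output ports."""
        # Data-plane learning: every received frame can rewrite the table.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the incoming segment: filter
        return [out_port]

bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive("A", "B", in_port=1))  # [2, 3]  (B unknown, flood)
print(bridge.receive("B", "A", in_port=2))  # [1]     (A was learned on port 1)
print(bridge.receive("A", "B", in_port=1))  # [2]     (B was learned on port 2)
```

Nothing in this sketch authenticates the source address: a single frame arriving on port 3 with source MAC B would silently redirect B's traffic, which is exactly why the resulting forwarding table is hard to predict.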

Troubleshooting. It’s impossible to troubleshoot a bridged network from an end-host; the network is designed to be invisible. The error reporting mechanisms built into most L3 protocols allow an end-host to trace a path across the network, giving the network operator at least an initial snapshot of the network conditions and a troubleshooting starting point.
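
The traceroute mechanism can be simulated in a few lines. The sketch below is a toy model, not a real network tool: the path, device names and function names are hypothetical. Routers decrement the IP TTL and report its expiry; bridges forward the frame without touching the IP header, so they never show up.

```python
def send_probe(path, ttl):
    """Send one probe along `path`, a list of (name, kind) tuples.
    Returns (reporting device, message)."""
    for name, kind in path:
        if kind == "bridge":
            continue  # bridges do not touch the IP TTL
        if kind == "host":
            return name, "reply"
        ttl -= 1  # each router decrements the TTL
        if ttl == 0:
            return name, "time-exceeded"
    return None, "lost"

def traceroute(path, max_ttl=30):
    """Probe with TTL 1, 2, 3, ... and collect whoever answers."""
    seen = []
    for ttl in range(1, max_ttl + 1):
        reporter, message = send_probe(path, ttl)
        if reporter is None:
            break
        seen.append(reporter)
        if message == "reply":
            break
    return seen

path = [
    ("SW1", "bridge"), ("R1", "router"),
    ("SW2", "bridge"), ("SW3", "bridge"), ("R2", "router"),
    ("SW4", "bridge"), ("server", "host"),
]
print(traceroute(path))  # ['R1', 'R2', 'server']
```

Four of the seven devices in the path are invisible to the end-host, even though any of them could be the one dropping the traffic.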

End-host mobility. The source MAC address snooping (which makes the bridged networks less predictable) allows instant host mobility – as soon as the host is attached to another network segment and sends a broadcast (a gratuitous ARP is a perfect candidate), all bridges readjust their L2 forwarding tables.

You can implement seamless host mobility in a routed network, but the delay is much higher, as the dissemination of changed information is done by the routing protocol.
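
Here is a small sketch of the bridged case, assuming a loop-free chain of three bridges (so flooding terminates without STP). All names are invented for the example; the point is that a single broadcast frame from the host updates every bridge along the way.

```python
class Bridge:
    def __init__(self, name):
        self.name = name
        self.links = {}      # port -> (neighbor bridge, neighbor's port)
        self.mac_table = {}  # MAC address -> port

    def connect(self, port, other, other_port):
        self.links[port] = (other, other_port)
        other.links[other_port] = (self, port)

    def broadcast(self, src_mac, in_port):
        # Learn (or relearn) the source, then flood to all other bridges.
        self.mac_table[src_mac] = in_port
        for port, (neighbor, neighbor_port) in self.links.items():
            if port != in_port:
                neighbor.broadcast(src_mac, neighbor_port)

b1, b2, b3 = Bridge("B1"), Bridge("B2"), Bridge("B3")
b1.connect("to_b2", b2, "to_b1")
b2.connect("to_b3", b3, "to_b2")

# Host H starts behind B1 and announces itself with a broadcast.
b1.broadcast("H", in_port="host")
print(b1.mac_table["H"], b2.mac_table["H"], b3.mac_table["H"])
# host to_b1 to_b2

# H moves to B3 and sends another broadcast (for example a gratuitous ARP).
b3.broadcast("H", in_port="host")
print(b1.mac_table["H"], b2.mac_table["H"], b3.mac_table["H"])
# to_b2 to_b3 host
```

The tables converge as fast as the broadcast frame propagates; no control-plane protocol has to run before traffic reaches the host at its new location.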

Impact of link failure. Link failure in a routed network results in temporary loss of traffic forwarded over that link (until routing protocol convergence). Link failure in a bridged network running STP can impact unrelated parts of the network.

TRILL uses a routing protocol (IS-IS); a network built with TRILL RBridges behaves like a routed network.

Impact of physical errors. Most layer-3 routing protocols detect unidirectional links and wiring errors (which usually result in subnet mismatch errors). The same conditions can easily result in a forwarding loop in a bridged network, unless you use UDLD and bridge assurance.

TRILL and other similar technologies no longer have this problem, as they use a routing protocol inside the network.

Impact of network overload. When an L2 switch is overloaded to the point where it stops sending STP packets (for example, due to data plane overload impacting control plane functionality), remote switches might unblock their ports, resulting in a forwarding loop and a total network meltdown.

When a router stops sending routing protocol hello packets, other routers detect a dead neighbor and recompute the network topology (not necessarily resulting in a working network, but at least they’re not aggravating the problem).

Bridge assurance solves this issue, as does TRILL.
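
The contrast between the two failure modes comes down to what a device does when its timer expires. The sketch below is heavily simplified (it skips the STP listening and learning states, and the function names are made up); the timer values are the usual defaults, 20 seconds for the 802.1D max age and 40 seconds for the OSPF dead interval on a broadcast network.

```python
def stp_blocked_port_state(now, last_bpdu, max_age=20, bridge_assurance=False):
    """State of a port that STP had blocked, given when it last heard a BPDU."""
    if now - last_bpdu < max_age:
        return "blocking"
    if bridge_assurance:
        # With bridge assurance, missing BPDUs keep the port blocked.
        return "blocking"
    # Classic STP fails open: silence is read as "no loop on this link".
    return "forwarding"

def router_adjacency_state(now, last_hello, dead_interval=40):
    """State of a routing adjacency, given when the last hello was heard."""
    # Routing protocols fail closed: a silent neighbor is no longer used.
    return "up" if now - last_hello < dead_interval else "down"

# The overloaded device went silent at t=0; check the state at t=60.
print(stp_blocked_port_state(now=60, last_bpdu=0))                         # forwarding
print(stp_blocked_port_state(now=60, last_bpdu=0, bridge_assurance=True))  # blocking
print(router_adjacency_state(now=60, last_hello=0))                        # down
```

In the first case the overloaded switch is still forwarding data frames, so opening the redundant port completes the loop; in the other two cases the silent device simply gets less traffic.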

Size of fault domain. The whole bridged network is a single fault domain (a fault anywhere in the network can impact the rest of it). A fault domain in a routed network is a single subnet.

The fault domain issue is usually related to the behavior of STP, but extends to the forwarding plane as well. A single misbehaving host attached to a bridged network can affect the whole network.

Anything else? Have I still missed something? Leave a comment!

A comprehensive overview of routing versus bridging in a data center is part of my Data Center 3.0 for Networking Engineers webinar (buy a recording or yearly subscription). The roles of bridging and routing in modern Service Provider networks are described in the Market trends in Service Provider networks webinar.

6 comments:

  1. Hi Ivan,

    I've tried to pay some attention to the TRILL technology (and went through Radia Perlman's web video you kindly pointed to a few days ago), and I do think that in the urge to please everyone on the RFC definition, there might still be a door open regarding the impact of physical errors.
    In fact, from what I've understood, the IS-IS adjacency between RBridges can cross non-TRILL-enabled segments, making it somewhat different from Cisco's L2MP proposal, but more universal.
    As such I believe that if you happen to have a problem on the cloud between RBridges, knowing that that cloud will "fail open" (they are L2-based switches running some sort of STP), the risk for hello packets to flow through a storm caused by unidirectional links for instance may occur, but the network will have the erratic behaviour that we've all seen during STP meltdowns.
    I'm perhaps pointing to a very rare corner case (I don't really think that IS-IS hellos would be able to get through the data plane of the "cloud" switches and maintain the IS-IS adjacency for long), but I'd like, if you will, to have your comment/clarification on that.

    thank you very much

    Gustavo Novais

  2. Dmitri Kalintsev (23 July, 2010 00:08)

    Hi Ivan,

    I think Ethernet OAM&PM is worth mentioning in the troubleshooting section. 802.3ah, 802.1ag and Y.1731.

  3. Ivan Pepelnjak (23 July, 2010 12:12)

    Well, the idea that you could sparsely deploy TRILL bridges across your network and that they should handle whatever is thrown at them is (in my personal opinion) pure b******t that made TRILL significantly more complex than it should have been.

    If someone decides (in his infinite wisdom) to deploy STP-based switched network between TRILL RBridges, he'll suffer exactly the same consequences as someone deploying STP-based switched network between IS-IS routers. It works today and it will work with TRILL, but you'll experience interesting nightmares in both cases.

    Obviously just my €0.002, I have no hands-on TRILL experience (but neither has anyone else).

  4. Ivan Pepelnjak (23 July, 2010 12:13)

    Very good point. Will amend the text. I "just" need to study the relevant standards first ... but they tend to be written in an obfuscated language designed to scare away the uninitiated ;)

  5. Petr Lapukhov (23 July, 2010 18:01)

    Well, it is worth mentioning that these are more "Carrier Ethernet" extensions, aimed at providing "connection-oriented" services (EVCs, etc.). That is not exactly the connectivity model that enterprise data centers are looking for (plug-and-play, full-mesh), but rather an SP-centric model.

  6. Guillermo Ibañez (21 August, 2011 18:28)

    The above is an insightful comparison of bridging versus routing.
    But what is really compared is the CURRENT bridging concept using spanning tree versus routing. Things (and minds) are changing (slowly, as the dominant paradigm is that link state routing at layer two is almost perfect and some people think that no further evolution of bridging is possible beyond link state routing).
    But the concept of bridging is evolving (see the ARP Path, aka Fastpath, proposals in the IEEE 802.1 repository, IEEE Communications Letters July 2011, the HPSR 2011 conference and the demos at SIGCOMM 2011 and LCN 2010), so that while some characteristics of bridging persist, like guessing vs calculating and no predictability of the path, other disadvantages disappear: shortest paths are obtained, all links can be used, and link failure does not affect working links.
    Simpler, reliable and powerful bridging is possible.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.