
IS-IS in Avaya’s SPB Fabric: One Protocol to Bind Them All

Paul Unbehagen made an interesting claim when presenting the Avaya network built for the Sochi Olympics during a recent Tech Field Day event: “we didn’t need MPLS or BGP to implement L2- and L3VPN. It was all done with SPB and IS-IS.”

Where’s the Magic?

IS-IS is a routing protocol that (like MP-BGP) supports multiple address families. We’ve used it to route IPv4 and IPv6 for over a decade; more recently it was extended to support layer-2 routing with SPB and TRILL, and now Avaya is using it to transport L3VPN information.

The architecture of Avaya’s solution is really quite simple:

  • They use SPB with MAC-in-MAC encapsulation to build large layer-2 fabrics;
  • MAC-in-MAC encapsulation contains an I-SID field, which is sort-of equivalent to a VLAN tag in 802.1Q – you can use it to indicate the L2VPN the encapsulated packet belongs to. Avaya uses that field to build L2VPNs on top of the SPB fabric;
  • IS-IS is used within the SPB fabric to build the L2 topology database, but there’s no reason not to use that same IS-IS instance to build an IP routing topology database. Avaya has used that approach for several years to add (global) IP routing functionality to the SPB fabric;
  • Recently Avaya added L3VPN functionality: new IS-IS TLVs to exchange VRF IP reachability information, and L3 forwarding based on the I-SID field.
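To illustrate the central role of the I-SID, here’s a minimal Python sketch of how a single fabric-wide service identifier can select either an L2 or an L3 lookup table on a fabric switch. All names and values are invented for illustration; they don’t reflect Avaya’s actual data structures.

```python
# Hypothetical model: each I-SID identifies one service instance fabric-wide.
# Values and structure are made up for illustration only.
services = {
    200:   {"type": "L2VPN", "vlan": 10},      # L2VSN: I-SID maps to a customer VLAN
    30001: {"type": "L3VPN", "vrf": "Green"},  # L3VSN: I-SID maps to a VRF
}

def classify(isid):
    """Return which table a switch should consult for a frame carrying this I-SID."""
    svc = services.get(isid)
    if svc is None:
        return "drop"  # unknown service instance
    if svc["type"] == "L2VPN":
        return f"MAC table of VLAN {svc['vlan']}"
    return f"IP table of VRF {svc['vrf']}"
```

The point of the sketch: the same I-SID field serves both service types, which is why one encapsulation (MAC-in-MAC) can carry L2VPN and L3VPN traffic.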

After the SPB nodes exchange L2 and L3VPN reachability information (using regular IS-IS flooding), packet forwarding follows (approximately) these steps:

  • The incoming packet is received by the ingress fabric switch;
  • The destination MAC address of the incoming packet matches the MAC address of the switch’s IP interface, so the ingress fabric switch performs an L3 lookup;
  • The incoming packet is encapsulated in a MAC-in-MAC envelope (with the I-SID indicating the L3VPN) and sent to the fabric transport MAC address of the egress fabric switch;
  • The encapsulated packet is forwarded across the L2 SPB fabric to the egress switch based on the outer (fabric transport) MAC address;
  • The egress switch receives the encapsulated packet, selects the local VRF based on the I-SID value, performs L3 forwarding in the selected VRF, and forwards the packet toward its final destination.
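The forwarding steps above can be sketched as a toy model. Everything here is a simplification for illustration: the field names, I-SID value, and BMACs are invented, and the longest-prefix match is reduced to a fixed /16 lookup.

```python
# BMACs of fabric switches, learned via IS-IS (values are made up)
FABRIC_BMAC = {"egress-sw": "00:bb:00:00:00:02"}

# Per-VRF routing tables, keyed by the I-SID identifying the L3VPN
vrf_routes = {30001: {"10.2.0.0/16": "egress-sw"}}

def lookup(table, ip):
    # Trivial stand-in for longest-prefix match: /16 prefixes only
    prefix = ".".join(ip.split(".")[:2]) + ".0.0/16"
    return table[prefix]

def ingress(packet):
    """Ingress switch: L3 lookup in the VRF, then MAC-in-MAC encapsulation."""
    isid = 30001                                 # VRF of the receiving interface
    egress = lookup(vrf_routes[isid], packet["dst_ip"])
    return {"outer_dmac": FABRIC_BMAC[egress],   # fabric transport MAC
            "isid": isid,                        # identifies the L3VPN
            "inner": packet}

def egress_beb(frame):
    """Egress switch: select the local VRF from the I-SID, forward in that VRF."""
    assert frame["isid"] in vrf_routes           # a VRF exists for this I-SID
    return f"forward {frame['inner']['dst_ip']} in VRF for I-SID {frame['isid']}"
```

The core fabric never looks past the outer MAC header; only the ingress and egress switches touch the inner IP packet.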

For more information, read Avaya’s Shortest Path Bridging configuration guide.

Will It Scale?

Avaya’s L3VPN solution is architecturally (almost) equivalent to MPLS/VPN and thus scales better than Cisco’s Easy Virtual Network (EVN), which is really just syntactic sugar on top of VRF-Lite.

The proof is left as an exercise for the reader; the solution can be found in the opening chapters of the MPLS and VPN Architectures book.

Even though there are architectural similarities, Avaya’s solution remains far from the true scalability of MPLS/VPN:

  • The SPB fabric runs single-area IS-IS, with all fabric members sharing the same topology database. The size of the fabric is thus limited by the weakest switch in the whole fabric;
  • IS-IS implementations were traditionally better than OSPF implementations (that’s why many large ISPs prefer IS-IS over OSPF), but that doesn’t mean you can grow an IS-IS area indefinitely. A few hundred switches (for a pretty low value of few) is probably the largest fabric you can build;
  • The number of IP routes (carried as IS-IS TLVs) in enterprise networks is usually reasonably small, so I wouldn’t expect any scalability issues there. Furthermore, IS-IS considers IP prefixes only after the shortest-path tree has been computed, so the computational complexity of IP route selection remains linear (O(n)).
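Why route selection stays linear can be sketched in a few lines: the expensive SPF run happens once per topology change, and the prefixes advertised in IS-IS TLVs are then attached to the resulting tree in a single pass. The function and variable names below are mine, not taken from any actual IS-IS implementation.

```python
def select_routes(spf_next_hop, prefix_tlvs):
    """spf_next_hop: node -> next hop, the output of one SPF run.
    prefix_tlvs: (advertising_node, prefix) pairs flooded in IS-IS TLVs."""
    rib = {}
    for node, prefix in prefix_tlvs:   # one linear pass: O(n) in the prefix count
        if node in spf_next_hop:       # prefix inherits the next hop toward
            rib[prefix] = spf_next_hop[node]  # the node that advertised it
    return rib
```

Doubling the number of prefixes doubles the work of this pass, but it never triggers another SPF computation.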

Finally, BGP is a much richer protocol than IS-IS when it comes to routing policies. There are numerous arcane MPLS/VPN architectures that cannot be implemented with the simple L3VPN model Avaya is using. Admittedly, you wouldn’t find them in most enterprise networks.

Avaya’s SPB-based L3VPN implementation is pretty new, so tread carefully. For example, it seems route redistribution loops could cause major headaches (see configuration guide, page 173).

Summary

Avaya’s L3VPN solution seems a reasonable fit for enterprise networks that need L3 path separation (similar to the scenarios I described in Enterprise MPLS/VPN webinar), but I wouldn’t use it in large-scale service provider deployments.

Looking for a fabric solution?

Check out what the vendors are doing in the Data Center Fabric Architectures webinar (register for the live session), or find out how to build a Clos (leaf-and-spine) fabric. Both webinars are part of the Data Center roadmap.

Full disclosure

As I was already attending Interop Las Vegas, I decided to spend a few hours listening to what Avaya and HP were ready to share with Tech Field Day delegates, and all I got in return was the opportunity to meet new people, two hours of nice conversations, and a cup of weak American coffee.

10 comments:

  1. "A few hundred switches (for a pretty low value of few) is probably the largest fabric you can build"

    That's a pathetically low number.
    In 1997-1998, iMCI had a backbone with 500 routers running IS-IS in a single area. Those 500 routers were connected in a double full mesh over an ATM cloud. You can't get a worse/heavier topology for flooding, and it ran IS-IS just fine.

    Note that those were Cisco 7000s and 7500s, which had a Motorola 68040 processor in the control plane. That's a 25 MHz CPU! If you could do 500 routers in a flat area with 25 MHz CPUs, you should be able to do much larger numbers today. I'm sure I could make IS-IS run with 10,000 routers/switches and a million endpoints today, using modern cheap Intel CPUs. Why hasn't that been done yet?

    BGP is a great protocol. I am less of a fan of MPLS myself (I never thought the benefits outweighed the added complexity). But some people in high places have always been in love with BGP and MPLS. A bit too much, imho. For them, MPLS is the hammer, and all problems look like nails.

  2. The L2 forwarding part is what concerns me. I understand SPBM as building a shortest path tree per I-SID. OK, that beats Spanning Tree. But then the actual forwarding between nodes at the endpoints of the tree is L2 forwarding, which requires flooding and (outer) MAC learning – sort of L2 MAC-based tunneling – whereas TRILL and FabricPath do routed tunnels between endpoints.

    I just can't help feeling that using IS-IS to build a better L2 path is a bit ... unnatural, self-defeating? I'm also thinking some of the negatives of H-VPLS probably apply as well, in slightly different form, e.g. failure response behavior.

    Your thoughts on this: pros / cons? Yes, I agree, BGP has proven scalability.

    Replies
    1. The forwarding across the fabric is based on a routing protocol (IS-IS), although it does use MAC addresses and has no TTL.

      Flooding and C-MAC learning happens only at the edges, and I think you can use IS-IS for C-MAC distribution (similar to IP prefixes). Not sure about this one though.

    2. Ludovico Stevens, 24 April 2014, 12:20

      I'd also like to reply to Pete Welcher's comments and concerns about the nature of L2 forwarding in the SPBM context. This reaction is quite common and understandable given the history of Ethernet bridging and its inseparable ties with Spanning Tree. However, these concerns are unfounded, as is the belief that TRILL/FabricPath would in some way be better than SPBM.

      Yes, an SPBM end-point node (BEB, Backbone Edge Bridge) will do MAC learning (and flooding) of end-user MAC addresses, but only inside a Layer 2 VPN service (L2 Virtual Service Network – L2VSN – in Avaya’s language). The learning is done against Ethernet ports for traffic arriving from outside the SPBM Fabric (as is commonly done by all Ethernet bridging devices), but it is also now done against the originating Source Backbone MAC (BMAC) of the distant BEB as the MAC-in-MAC traffic is decapsulated on its way out of the SPBM Fabric. This MAC learning/flooding happens only within the L2VSN service and only on the BEB nodes, not in the underlying SPBM Fabric (TRILL/FabricPath are essentially doing the same thing).

      Inside the SPBM Fabric (on BCB, Backbone Core Bridge, nodes) it's a completely different story, as there is no MAC learning and there is no MAC flooding. Here it is purely transport and the only reachability that counts is towards the BMAC Addresses, and IS-IS caters for this.

      I usually like to explain the parallel of how SPBM's IS-IS works in comparison to OSPF (which is more or less well understood by Enterprise folks).
      An OSPF Router does an SPF run on its copy of the OSPF LSDB and works out, from where it stands in the topology, the shortest path to every IP route; the results are then trimmed and used to populate its IP Routing table where the next-hop is the IP Address of the immediate next-hop towards that shortest path for that given IP route.
      An SPBM Switch does an SPF run on its copy of the SPBM IS-IS LSDB and works out, from where it stands in the topology, the shortest path to every other BMAC in the SPBM Fabric (if you have 100 nodes in the SPBM Fabric, then you have 100 BMACs advertised by IS-IS – not a particularly huge number).

      The results are then trimmed and used to populate the MAC table of one (or more) "special" Backbone VLANs (BVLANs), where the next hop is the Ethernet port of the immediate next hop along the shortest path to that given BMAC. The BVLAN is "special" because MAC learning and flooding on it are switched off in hardware (as is obeying the Spanning Tree discarding/forwarding state of the Ethernet ports inside the SPBM Fabric). In short, these BVLANs are essentially just a repository of MAC tables calculated and populated by IS-IS.

      So SPBM has the same properties you can expect from a well-designed OSPF backbone where all the OSPF adjacencies are point-to-point (I say "well-designed" because in some cases OSPF networks are deployed over broadcast segments, which compromises the speed at which OSPF can react to failures). TRILL/FabricPath are conceptually the same and certainly no better than SPBM.
      The subtle difference between TRILL/FabricPath and SPBM is a tradeoff: the former can spray traffic over an unlimited number of equal-cost paths, but it pays a heavy price by not being able to support 802.1ag CFM, the industry-standard Ethernet OAM. CFM brings crucial L2 ‘ping’ and L2 ‘traceroute’ capabilities to SPBM.

      Also, the fact that TRILL/FabricPath use only a VLAN ID and not a Service ID (I-SID) means that the Avaya Layer 3 VPN (L3VSN) solution described in the main article above would simply not be possible there.

      Finally to come back to the case of the L3VSN (L3VPN) service type described in the article above, there is simply no MAC learning/flooding anywhere, because we are talking IP routing inside the service and SPBM MAC routing (as I described above) inside the SPBM Fabric.

    3. Ludovico Stevens, 24 April 2014, 12:35

      And, as I ran out of characters in the above post…!

      So why use IS-IS to build a better L2..?
      In the case of SPBM I can list you these benefits for an Enterprise customer:

      a) As we are operating at L2, delivering L2 virtualized transport is a piece of cake (compare that with the complexity of MPLS + VPLS); TRILL/FabricPath are also good at this…
      b) As we can integrate the IP routing directly inside the SPBM Fabric, leveraging IS-IS, we don’t need external IP routers and can hence extend SPBM's benefits and simplicity end-to-end…
      c) As we support L3VSNs (L3VPNs), we can easily do Enterprise-grade L3 virtualization; compare that with VRF-Lite or MPLS BGP IPVPNs…
      d) What we can do with IPv4, we can do equally well with IPv6, with the same IS-IS instance (just a different set of TLVs). Consider the pain of having to run both OSPFv2 for IPv4 and OSPFv3 for IPv6 in traditional dual-stack designs…
      e) SPBM's use of IS-IS delivers something extra that neither OSPF nor TRILL/FabricPath can do: service-specific multicast delivery trees. You can easily enable IP multicast for any of the above services (a), (b), (c)… Now, try doing that with VPLS for each and every service, or with BGP IPVPNs over MPLS. With SPBM you never have to get your hands dirty with PIM-SM; just one instance of IS-IS in the Core and IGMP in the Access…
      f) Cherry on the cake: consistent, sub-second failover for all of the above; try doing that with BGP-based services, or even worse, with PIM-SM.

  3. Ludovico Stevens, 18 April 2014, 21:19

    Since I have a good understanding of the Avaya SPBM solution I'd just like to add some complementary information to a couple of points Ivan raised.

    > BGP is a much richer protocol than IS-IS when it comes to routing policies. There are numerous arcane MPLS/VPN architectures that cannot be implemented with the simple L3VPN model Avaya is using. Admittedly, you wouldn’t find them in most enterprise networks

    This is true. We are, however, enhancing the way we use the L3VSN (L3VPN) services to be able to emulate some of that capability.
    For example, we will soon have the ability to selectively import VPN routes associated with I-SIDs other than that of the importing VPN. This effectively gives us the same capability BGP has when using multiple import RTs to accept VPN routes from other IPVPNs. You can therefore replicate some of the more complex BGP IPVPN architectures.
    A common design objective for this in the enterprise space is to segment users into distinct L3 routing domains (L3VSNs) so that they cannot communicate with one another, while at the same time allowing those separate user groups (routing domains) to access shared servers or shared firewalls located in a third routing domain.

    > Avaya’s SPB-based L3VPN implementation is pretty new, so tread carefully. For example, it seems route redistribution loops could cause major headaches (see configuration guide, page 173).

    This is also true. Correctly configuring route redistribution between two different IP routing protocols (using two or more border routers) needs to be done with care.
    However, that particular challenge described in the configuration guide is no different from the challenges we used to face when migrating RIP networks to OSPF in days gone by.

  4. Ludovico, thank you for your detailed answers. Do you have any good links to share to learn more about the technology? Thanks!

  5. As of today, is there a way to assign cost to interfaces on an SPB network? How would it handle unequal paths? 1G, 10G, 40G?

  6. Thanks for the great post. I still don't understand: if you run IS-IS between BEBs as opposed to MP-BGP in MPLS, how are you going to:

    deal with the same prefixes from different customers (the RD in BGP)?
    what if a customer wants to use BGP for CE-BEB within the enterprise, for the complex path selection BGP can offer?
    Thanks

  7. I just don't understand why this is "Betamax"...
    They solved the problem that PBB had by using IS-IS as the control plane.
    That is a much better approach than layers of protocols doing the same L2 circuit.

