Why Is Stretched ACI Infinitely Better than OTV?

Thursday, September 1, 2016 07:24 +0200

Eluehike Chedu asked an interesting question after my explanation of why stretched ACI fabric (or alternatives, see below) is the least horrible way of stretching a subnet: What about OTV?

Time to go back to the basics. As Dinesh Dutt explained in our Routing on Hosts webinar, there are (at least) three reasons why people want to see stretched subnets:

Service or node discovery based on broadcasts;
Multicast cluster heartbeats;
Assumptions of servers being in the same subnet (including IP address mobility and VM mobility).

While there’s not much one could do about the first two (apart from enabling IP multicast in the data center fabric), there are two ways of solving the third one:

The wrong way: stretched VLAN;
The right way: admitting that the IP subnet paradigm doesn’t fit all environments and going back to routing based on host identifiers (hint: CLNS got there decades ago).

What’s the difference between the two approaches? The stretched VLAN approach uses the wrong forwarding paradigm (panic-and-flood when you don’t know) that was invented to emulate a yellow coax cable. The routing on host identifiers approach is still routing (drop when you don’t know), but it uses a more granular forwarding table.
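
The two behaviors fit in a few lines of Python. This is a minimal sketch, not any vendor's implementation: the function names, port names, and addresses are all made up for illustration.

```python
from typing import Dict, List


def bridge_forward(mac_table: Dict[str, str], dst_mac: str,
                   in_port: str, all_ports: List[str]) -> List[str]:
    """Transparent bridging: an unknown destination is flooded to every other port."""
    if dst_mac in mac_table:
        return [mac_table[dst_mac]]
    return [port for port in all_ports if port != in_port]


def host_route_forward(host_table: Dict[str, str], dst_host: str) -> List[str]:
    """Routing on host identifiers: an unknown destination is dropped."""
    next_hop = host_table.get(dst_host)
    return [next_hop] if next_hop is not None else []


# Hypothetical switch with two server ports and one inter-site link
ports = ["eth1", "eth2", "dci-link"]
mac_table = {"00:00:5e:00:53:01": "eth1"}
host_table = {"192.0.2.10": "eth1"}

# Unknown destination: the bridge floods it, including across the inter-site link
flooded = bridge_forward(mac_table, "00:00:5e:00:53:99", "eth1", ports)

# Unknown destination: the host-routing device sends it nowhere
dropped = host_route_forward(host_table, "192.0.2.99")
```

Note that the unknown frame in the bridging case also lands on `dci-link`, which is exactly the flooding across lower-speed links you'd like to avoid.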

You might have noticed I said host identifiers and not IP addresses. It really doesn’t matter that much if you do routing based on MAC or IP addresses as long as it’s deterministic and there’s no flooding. Figuring out why it still matters whether you use MAC or IP addresses is left as an exercise for the reader ;)

Various VLAN extension approaches like OTV are just lipstick on a pig. They have to use all sorts of tricks to fix the problems caused by using the wrong forwarding behavior (bridging):

First-hop gateway selection (otherwise you get traffic trombones);
Suboptimal ingress traffic problems;
Excessive flooding across lower-speed links;
Unnecessary unicast flooding.

You don’t get any of these problems when using routing based on host identifiers:

First-hop gateway is always the first network device.
The forwarding fabric already contains host routes, which can be redistributed into an external routing protocol to get optimal ingress traffic flow (note: I’m not saying that’s a good idea).
There’s no flooding, and ARP/ND/IGMP requests are terminated on the first-hop network device.
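
Why host routes fix the ingress traffic flow is plain longest-prefix matching, sketched below with Python's standard `ipaddress` module. The prefixes, the site names `dc-a` and `dc-b`, and the `lookup` helper are assumptions made for the example; a real device would use a trie or hardware lookup rather than a linear scan.

```python
import ipaddress
from typing import Dict, Optional


def lookup(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Return the next hop of the longest matching prefix, or None (drop)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best is not None else None


# Hypothetical routing table: the subnet lives in dc-a,
# but one host has moved to dc-b and is advertised as a /32
routes = {
    "10.1.1.0/24": "dc-a",
    "10.1.1.20/32": "dc-b",
}

via_host_route = lookup(routes, "10.1.1.20")  # host route wins over the subnet
via_subnet = lookup(routes, "10.1.1.10")      # falls back to the subnet route
no_route = lookup(routes, "10.9.9.9")         # nothing matches: drop
```

The moved host is reached directly in `dc-b` instead of being hauled through `dc-a` first, and a destination nobody advertised simply has no route.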

The “only” problem left for the host routing fabrics to solve: identifying the correct host identifiers (there’s a reason CLNS had the ES-IS protocol). Most solutions misuse ARP requests to identify host IP addresses, or glean host IP addresses straight from data packets. VMware makes it even more interesting with their incredibly shortsighted decision to use RARP instead of ARP to signal a VM move.
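
To show what gleaning host IP addresses from ARP might look like, here is a rough sketch that parses the 28-byte ARP payload for IPv4 over Ethernet. The `glean` and `build_packet` helpers are hypothetical, and the sketch assumes that a RARP request carries no usable sender IP address, which is why it tells the fabric where a MAC address lives but not which IP address moved.

```python
import socket
import struct
from typing import Optional, Tuple

# Opcodes: 1 and 2 are ARP request/reply, 3 is a RARP request
ARP_REQUEST, ARP_REPLY, RARP_REQUEST = 1, 2, 3
ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28 bytes for Ethernet + IPv4


def build_packet(oper: int, sender_mac: bytes, sender_ip: str) -> bytes:
    """Build an ARP/RARP payload for testing (target fields left at zero)."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, oper,
                       sender_mac, socket.inet_aton(sender_ip),
                       b"\x00" * 6, socket.inet_aton("0.0.0.0"))


def glean(payload: bytes) -> Optional[Tuple[str, str]]:
    """Return (sender IP, sender MAC) if the packet reveals a host IP address."""
    fields = struct.unpack(ARP_FORMAT, payload[:28])
    oper, sender_mac, sender_ip = fields[4], fields[5], fields[6]
    if oper not in (ARP_REQUEST, ARP_REPLY):
        return None  # RARP: we learn the MAC location, but not the IP address
    ip = socket.inet_ntoa(sender_ip)
    if ip == "0.0.0.0":
        return None  # ARP probe without a sender IP address
    mac = ":".join(f"{byte:02x}" for byte in sender_mac)
    return ip, mac


vm_mac = bytes.fromhex("00005e005301")
from_arp = glean(build_packet(ARP_REQUEST, vm_mac, "192.0.2.10"))
from_rarp = glean(build_packet(RARP_REQUEST, vm_mac, "0.0.0.0"))
```

An ARP request yields a host route candidate; the RARP packet yields nothing the fabric could install as an IP host route.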

Is Cisco ACI the only fabric that works this way? Absolutely not. You have plenty of choices:

Avaya fabric
Cisco ACI
Cisco DFA
EVPN with symmetrical IRB (asymmetrical IRB still uses too much bridging), for example on Cisco Nexus switches
Cumulus Linux Redistribute ARP
Enterasys (now Extreme Networks) host routing.

We covered this idea in detail in the Leaf-and-Spine Fabric Designs webinar, but if you need just an overview, watch my IPv6 microsegmentation Troopers talk or IPv6 microsegmentation webinar.

11 comments:

Unknown 01 September 2016 10:48

OTV relates to extending L2 applications across distributed DCs, but EVPN does not realistically support that specific feature.
However, there is a work in progress in the IETF to allow support for all requirements when interconnecting EVPN DCs, i.e. "Multi-Site EVPN": https://tools.ietf.org/html/draft-sharma-multi-site-evpn-01.

Ryan 01 September 2016 18:31

This comment has been removed by the author.

Replies

Ryan 01 September 2016 18:34

hi Ivan,

Appreciate your insight on issues like these, as always. Of note to me was your comment that "Most solutions misuse ARP requests to identify host IP addresses...", which I have struggled with myself when deploying these things.

For example, a limitation of LISP ESM in the past for me has been silent hosts, and I believe that Cumulus redistribute ARP suffers from a similar pain point. It seems that speak-when-spoken-to hosts (cluster IPs, VIPs), for example, require contingency plans and workarounds, sometimes painful, to deploy these solutions.

I personally have not heard of any movements to try and deal with this by alternate fabric discovery mechanisms, but would be curious to hear. For me, this is one of the major stumbling points toward it being viable and not a nightmare to deploy.

Ivan Pepelnjak 02 September 2016 20:02

As always, we're stumbling upon exceptions instead of focusing on solving 95% of the problem ;).

However, for VIP addresses other hosts need to reach them, so they'll ARP and the fabric can capture the ARP reply (not sure which solutions do that though).

Ryan 08 September 2016 15:19

Correct -- I do not think redistribute ARP or LISP ESM will ARP for the destination if it is unknown (might also be what Pavel is referring to). Last I checked, it requires the destination to ARP first and become discovered. If there is no preemptive ARP, the host is not known on the fabric and is therefore unreachable.

Unknown 07 September 2016 12:55

Somehow I still do not get it, maybe because I do not have experience with routing on host identifiers.
For me if I have a 2 host on the same subnet in both datacenters, it still means the failure domain is the same, no matter which technical way I achieve it (stretch a VLAN or use "routed L2" ACI). The reason is that any host1 NIC failure/misconfig will result in a flood to host2.
Or... are you saying that unknown dst mac does not get flooded. We cannot have that, half of the apps would stop working...

Replies

Ivan Pepelnjak 07 September 2016 13:10

"For me if I have a 2 host on the same subnet in both datacenters, it still means the failure domain is the same" <-- not if you're not bridging between them.

"are you saying that unknown dst mac does not get flooded." <-- ideally NOTHING gets flooded.

"We cannot have that, half of the apps would stop working" <-- I don't believe that any more

Also, as I wrote, I was focused only on IP address mobility, not on supporting even-more-broken stupidities.

Unknown 20 September 2016 12:10

Thanks Ivan, that helped. So....trick question:
"how is the routing-l2 forwarding behavior different from having switchport block unicast on all server ports"

Is this an example of the lipstick-on-a-pig?

Replies

Ivan Pepelnjak 20 September 2016 20:48

It's actually routing on IP addresses not on MAC addresses, and not only does it stop unicast flooding, it also stops (when properly implemented) all ARP broadcasts / ND multicasts.

Bob McCouch 22 September 2016 16:09

This is great info Ivan (as always).

A challenge I still encounter regularly is that for many mid-size and smaller companies, the cost/complexity of building a fabric like you're describing often ends the conversation before it's really begun. I still find cases where OTV, for example, is certainly better than just trunking L2 across a DCI, and somewhat more approachable than the technologically superior alternatives.

Which technologies would you consider most appropriate when operational complexity is taken into consideration?

Replies

Ivan Pepelnjak 22 September 2016 20:57

Let's start with "What would you recommend them as a fabric within the data center?" and "How big would that fabric be?"
