The EVPN Dilemma

Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:

So I am a rather green network engineer trying to solve the typical layer two stretch problem.

I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:

One thing I hear over and over from everyone (vendors especially) is how EVPN will solve all of my problems.

Every now and then vendors go on a lemming run promoting a miraculous technology. A few years ago it was either TRILL or SPB (depending on which chipset you were trying to sell), now it’s EVPN… which is a shame because EVPN is a decent technology.

The “solving all your problems” part is a necessary component of this fairy tale. You would never buy from a vendor who would drop by and say “we can solve one of your problems, and you have to restructure your applications to get rid of the other 100”, right?

All I need to do is ditch my current IGP in favor of BGP

Another lemming run, this time along the lines of “if Petr Lapukhov did it at Microsoft it must be good”. While you could get a pretty minimalistic and simple design if you make BGP the only routing protocol in your fabric, you'd better do that with an implementation that was adapted to this new use of BGP, not a decades-old code base that needs a gazillion tweaks and just the right nerd-knob values to make it work.

Oh, and some vendors messed up their implementations really badly, so they started promoting IBGP-over-EBGP (EVPN address family on IBGP sessions running between loopbacks, which are advertised with the IP address family on EBGP sessions running on point-to-point links) and using schizophrenic local-as mechanisms just to make it work. Then there was another vendor telling customers to run EBGP sessions on point-to-point links to exchange loopback prefixes, plus another set of multihop EBGP sessions between the loopback interfaces of the same boxes to exchange the EVPN prefixes.
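To make the two designs easier to compare, here is a minimal Python sketch that simply enumerates the BGP sessions each of them needs in a small leaf-and-spine fabric. The `Session` class, the `fabric_sessions` function, and the switch names are made up for illustration. The sketch also assumes that the overlay sessions run between every leaf and every spine; real deployments may place them differently.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Session:
    a: str          # one end of the session (hypothetical switch name)
    b: str          # the other end
    endpoints: str  # "link" (point-to-point interface) or "loopback"
    kind: str       # "ebgp", "ibgp", or "ebgp-multihop"
    afi: str        # "ipv4" (underlay) or "evpn" (overlay)


def fabric_sessions(leaves, spines, overlay):
    """Enumerate BGP sessions for one of the two designs described above.

    overlay="ibgp"          -> EVPN on IBGP sessions between loopbacks
    overlay="ebgp-multihop" -> EVPN on multihop EBGP sessions between loopbacks
    Assumption: overlay sessions are built between every leaf and every spine.
    """
    if overlay not in ("ibgp", "ebgp-multihop"):
        raise ValueError(f"unknown overlay design: {overlay}")
    sessions = []
    for leaf, spine in product(leaves, spines):
        # Underlay: EBGP on the point-to-point link, carrying loopback prefixes.
        sessions.append(Session(leaf, spine, "link", "ebgp", "ipv4"))
        # Overlay: a second session between the same two boxes, carrying EVPN.
        sessions.append(Session(leaf, spine, "loopback", overlay, "evpn"))
    return sessions


if __name__ == "__main__":
    for s in fabric_sessions(["leaf1", "leaf2"], ["spine1"], "ibgp"):
        print(s)
```

Either way, every pair of adjacent boxes ends up with two BGP sessions instead of one, which is the extra complexity the paragraph above complains about.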

… and well BGP is hard to configure so I also need to invest in an automation solution.

That’s another thing vendors are really good at - promoting the right stuff for the wrong reasons. Network automation is the right way to go, but if it’s sold as the only way to build BGP configurations for your data center fabrics (because of the copious amount of nerd knob settings you need), you chose the wrong vendor.

There are vendors focusing on making data center EVPN+BGP+MLAG configurations as simple as possible, but they lack the marketing muscle of the big guys and the glitzy customer events that CIOs love to mingle at. Just saying…

It's also worth mentioning that most open-source BGP products like BIRD and GoBGP support BGP configuration simplifications similar to FRR's, so it's obviously not that hard to implement them.
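To give a rough idea of how little configuration a simplified implementation needs, here is a Python sketch that renders an FRR-style BGP configuration for a leaf switch. Treat it as an illustration under assumptions, not a validated configuration: the `render_bgp_config` helper, the AS number, the router ID, and the `swp1`/`swp2` interface names are all made up, and the command syntax should be checked against the documentation of the release you run.

```python
def render_bgp_config(asn, router_id, uplinks, evpn=True):
    """Render a minimal FRR-style BGP configuration as text.

    asn, router_id and uplinks (interface names) are illustrative inputs.
    Neighbors are identified by interface name and accepted from any
    external AS, so no per-neighbor addresses or AS numbers are needed.
    """
    lines = [
        f"router bgp {asn}",
        f" bgp router-id {router_id}",
    ]
    for ifname in uplinks:
        lines.append(f" neighbor {ifname} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        "  redistribute connected",
        " exit-address-family",
    ]
    if evpn:
        lines.append(" address-family l2vpn evpn")
        for ifname in uplinks:
            lines.append(f"  neighbor {ifname} activate")
        lines += [
            "  advertise-all-vni",
            " exit-address-family",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_bgp_config(65001, "10.0.0.1", ["swp1", "swp2"]))
```

The only per-switch inputs in this sketch are the AS number, the router ID, and the list of uplinks, which is the kind of simplification that makes a heavyweight automation solution optional rather than mandatory.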

One other thing… EVPN doesn’t play well between vendors so there’s probably going to be lock in.

Well, the vendors are telling me they’re running interoperability workshops making sure the least-common-denominator EVPN implementations interoperate… but honestly, why would you want to build your data center fabric with switches from two vendors?

Unless you’re a member of the FANG club (in which case you’d probably run your own software on top of standardized products from two sources anyway), you’ll probably lose more money than you saved dealing with operational complexity of running two platforms with two operating systems (I would, however, avoid using proprietary vendor features as much as possible). It’s like mixing AIX, Solaris and Linux in your servers. Who would ever want to do that unless a database company forces them to do it due to their licensing and litigation practices?

Oh and your current network equipment will need to be replaced as well.

Just like when you’re trying to figure out whether to buy a new car, you have two options:

  • Stick with the old stuff and live with the lack of features available in the new models;
  • Invest in the new model and get the new features.

Funnily, if you happen to have a decent-sized installation under vendor support contract, it might be cheaper to ditch the old stuff and buy the new switches. We had customers that would make money just on that swap in a few years’ time due to cheaper boxes and consequently lower support costs.

What’s the problem with a solution like GRE? I can leverage my current IGP, all of my equipment already supports it… and it works between vendors.

While there are plenty of vendors doing whatever-over-GRE (but maybe not on recent data center switches), and I'm told at least some Broadcom ASICs support NVGRE (but how would we know), I’m not aware of anyone shipping bridging-over-GRE in hardware. If you plan to stretch your layer-2 domain over a 100 Mbps or 1 Gbps link (so you could use software-based forwarding), I have just one word for you: DON’T.

The question does make perfect sense though once you manage to replace GRE with VXLAN (see below).

Maybe trying to “tunnel” away all of our problems is the wrong solution to begin with. What are your thoughts on this?

There’s always RFC 1925 Rule 6A, but in the case of layer-2 segments artificially stretched beyond recognition (= beyond a single cable) tunneling makes perfect sense.

You could either try to bend the laws of physics and make bridging-with-STP work in an environment it was never designed for (what data center vendors tried to do with large-scale MLAG using proprietary technologies like VSS, vPC, IRF, VCF…), or you could give up, realize a routed fabric will always be more stable than a bridged hodgepodge, and start looking for a way to implement one.

In theory, you could build a routed fabric using MAC addresses (SPBM), yet-another layer-3-protocol (TRILL), or IP (VXLAN). I would go for VXLAN as we’ve been debugging IP routing protocols and IP forwarding for decades and thus they tend to work pretty well.

You could be smart and use VXLAN with preconfigured flooding lists and dynamic MAC learning (and I know people doing that in large-scale environments with great success) or you could buy into another vendor fairy tale that VXLAN with EVPN solves every problem you ever had.
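To illustrate the first option, here is a toy Python model of a VXLAN tunnel endpoint (VTEP) that uses a preconfigured flooding list and learns remote MAC addresses from received traffic. It is a simplified sketch, not a description of any vendor's implementation: the `Vtep` class, its method names, and the addresses are hypothetical, and the model ignores locally attached MAC addresses, aging, and the actual encapsulation.

```python
class Vtep:
    """Toy VTEP: static flooding list plus data-plane MAC learning."""

    def __init__(self, local_ip, flood_list):
        self.local_ip = local_ip
        # Preconfigured list of remote VTEPs; never send to ourselves.
        self.flood_list = [ip for ip in flood_list if ip != local_ip]
        # (vni, mac) -> IP address of the remote VTEP behind which the MAC lives
        self.mac_table = {}

    @staticmethod
    def _is_group_address(mac):
        # Broadcast and multicast MACs have the lowest bit of the first octet set.
        return bool(int(mac.split(":")[0], 16) & 1)

    def receive_from_fabric(self, vni, src_mac, src_vtep):
        """Learn the source MAC of a frame received from a remote VTEP."""
        self.mac_table[(vni, src_mac)] = src_vtep

    def forward_from_host(self, vni, dst_mac):
        """Return the remote VTEPs that should get a copy of the frame."""
        dst_mac = dst_mac.lower()
        if self._is_group_address(dst_mac):
            return list(self.flood_list)   # broadcast/multicast: replicate
        remote = self.mac_table.get((vni, dst_mac))
        if remote is None:
            return list(self.flood_list)   # unknown unicast: replicate
        return [remote]                    # known unicast: one tunnel


if __name__ == "__main__":
    vtep = Vtep("10.0.0.1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print(vtep.forward_from_host(100, "00:00:5e:00:53:01"))
    vtep.receive_from_fabric(100, "00:00:5e:00:53:01", "10.0.0.3")
    print(vtep.forward_from_host(100, "00:00:5e:00:53:01"))
```

There is no control-plane protocol anywhere in this model: the flooding list is static configuration and everything else is learned from traffic, which is why the approach works on top of any IP routing protocol.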

Yet again, I’m not saying that EVPN is a bad technology, or that you wouldn’t benefit from using it (it might come in very handy in larger fabrics, or if you still insist on stretching the VLANs across WAN links), but in some cases the simplest solution is all you need, and VXLAN on top of whatever IP routing protocol you’re familiar with (even RIP would work) gets you pretty close to that goal.

More Information

You might find these webinars (part of ipSpace.net subscription) useful if you want to master the technologies I mentioned in this blog post:

All these webinars and much more are included in our Building Next-Generation Data Center online course.

You’re also most welcome to join us in the Using VXLAN with EVPN to Build Active-Active Data Centers workshop in Zurich on Tuesday, December 3rd 2019.


Many thanks to Dinesh Dutt and Nicola Modena for fact-checking and improving the blog post.


11 comments:

  1. And of course the issues with MTU mismatch when doing any sort of tunneling. All the VMs will need changes to communicate efficiently.

    1. Nobody sane goes down that rabbit hole. The only realistic option is to increase the underlay (transport) MTU.

    2. Not sure if you meant jumbo frames:
      https://netcraftsmen.com/just-say-no-to-jumbo-frames/

    3. There are two reasons for jumbo frames:

      * Because some people believe they increase TCP/IP throughput (usually not true unless you're dealing with suboptimal TCP stacks)
      * Because you don't want to deal with client MTU size in tunneling environments.

      And yes, I completely agree with everything Peter wrote, but sometimes you have to choose the lesser of two evils.

    4. Yes, sure. It is much better than other tricks like configuring a smaller MTU on the hosts, PMTUD, MSS adjustment, and so on. Like every other tool, it has its good and bad sides.

  2. Honestly the future looks to be running some IGP on the servers themselves so they can keep their IP and can move anywhere without issue. Of course if your stupid app needs L2 to another host you are always going to be screwed.

  3. Disagree, you are just changing a default value, and it doesn’t harm you in any way... if you can’t keep your configs in sync with your intent (larger but consistent MTU on EVERY link) you have got bigger problems...

  4. This is where routing protocols like IS-IS can shine. Leveraging it for link discovery, link MTU validation, etc. can be advantageous. I like having choices for IGPs, but if we build solutions that only converge on one routing protocol, isn't that harmful?

    1. That's probably why it was used in FabricPath. Unfortunately, the best tech isn't always the prevailing candidate.

  5. Pretty much all EVPN implementations support multiple routing protocols, including IS-IS, OSPF and EBGP as IGP. There’s a difference between a reference design and feature support. Also, with all due respect, there are statements here that are hearsay, or at best very old news, and so do unnecessary harm. This stuff is mostly software and software is not static.

    1. Hi Aldrin,

      "Pretty much all EVPN implementations support multiple routing protocols." << Correct. That's not necessarily what the vendor SEs are telling the customers.

      "There’s a difference between a reference design and feature support." << Agree. One of the differences is how many bugs you'll encounter when using a supported feature that is rarely used.

      "there are statements here that are hearsay" << OK, tell me more. Would love to hear what you consider hearsay (is it hearsay if a customer tells me how badly he was burned?) and very old news.

      I'm guessing you know how to contact me directly if you want to take the discussion offline ;)

      Kind regards, Ivan
