Data Center Routing with RIFT on Software Gone Wild
Years ago Petr Lapukhov decided that it’s a waste of time to try to make OSPF or IS-IS work in large-scale data center leaf-and-spine fabrics and figured out how to use BGP as a better IGP.
In the meantime, old-time routing gurus started designing routing protocols targeting a specific environment: highly meshed leaf-and-spine fabrics. First on the list: Routing in Fat Trees (RIFT).
In Software Gone Wild Episode 88 we sat down with Dr. Tony Przygienda, author of RIFT, and Jeff Tantsura, chair of the RIFT IETF working group.
We started with tons of background topics:
- Do we even have a problem?
- Why is BGP not good enough, and why do we need another routing protocol?
- What are the big players doing?
- Why can’t we use OSPF or IS-IS in large highly meshed fabrics?
- Do we really need (transport) policies and traffic engineering in data centers, or is it better to buy more bandwidth?
- Is it worth solving the problems in the network, or should they be solved on the hosts?
After wasting 20 minutes describing the problem, we finally got to the interesting stuff:
- What is RIFT? What environments is it designed to work in?
- How can you combine the benefits of link-state and distance-vector technologies in the same routing protocol?
- How RIFT uses automatic disaggregation to avoid black holes caused by aggressive summarization
- What has RIFT borrowed from other routing protocols and what’s unique?
- RIFT is a schema-based (not TLV-based) protocol. What does that mean and why does it matter?
- Why does RIFT run on top of UDP instead of using a separate Ethertype like IS-IS does?
- How is flooding implemented in RIFT and what are flooding scopes?
- Why is directionality (east-west versus north-south) so important to RIFT?
- What happens when your data center fabric has leaf-to-leaf shortcuts?
- How does RIFT figure out the position of an individual switch (leaf or spine) within the fabric?
- How can you use key-value store embedded in RIFT to implement zero-touch provisioning?
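To give a flavor of the disaggregation idea mentioned above: in RIFT, a spine normally advertises only a default route southbound, and disaggregates (advertises a more specific prefix) only when a same-level peer has lost reachability to that prefix, so that leaves following the default don't black-hole traffic on the broken peer. Here's a minimal, purely illustrative Python sketch of that decision; the function and variable names are invented for this example and are not part of any RIFT implementation:

```python
# Hypothetical sketch of RIFT-style positive disaggregation (names invented).
# A spine advertises only a default route southbound, plus any specific
# prefix that it can reach but a same-level peer spine cannot.

def southbound_advertisements(my_prefixes, peer_prefixes_by_spine):
    """Return the set of prefixes this spine should advertise southbound.

    my_prefixes: set of prefixes this spine can reach northbound.
    peer_prefixes_by_spine: dict mapping each same-level peer spine to the
        set of prefixes that peer can reach.
    """
    adverts = {"0.0.0.0/0"}  # the default route is always advertised south
    for peer, peer_prefixes in peer_prefixes_by_spine.items():
        # A prefix we can reach but a peer cannot must be disaggregated:
        # otherwise a leaf could send traffic for it toward that peer's
        # default route and black-hole it.
        adverts |= my_prefixes - peer_prefixes
    return adverts

# Example: after a link failure, spine2 can no longer reach 10.0.2.0/24,
# so spine1 disaggregates that prefix while keeping the default route.
mine = {"10.0.1.0/24", "10.0.2.0/24"}
peers = {"spine2": {"10.0.1.0/24"}}
print(sorted(southbound_advertisements(mine, peers)))
# → ['0.0.0.0/0', '10.0.2.0/24']
```

When all peers can reach everything the spine can, the function collapses back to advertising just the default route, which is the steady-state behavior that keeps leaf routing tables tiny.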
Interested? You’ll find all the details in Episode 88 of Software Gone Wild.
We hope it will lead to experimentation/feedback/demand/open-source attempts and detract a bit from the marketing and assumption-based discussions around things as of now ... And yes, open sourcing is a business decision that is not driven by techies and depends on many things.
As to the chances of adoption, the properties of the protocol will speak for themselves, and with tongue mildly in cheek let me quote G. B. Shaw: "The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." I've made a career of being the unreasonable man ;-)
Look me up anytime for lunch in the Valley to exchange ideas ;-)
For example, on the BGP story: when we tried using OSPF, we hit multiple issues in different vendors' implementations. BGP just happened to be there and was implemented better, so we moved along with it, adapting the design. If IS-IS had happened to be there across a few vendors, things might have been very different.
I totally understand that open-sourcing is not something you can solely decide on. I liked the Contrail approach, however ;)
I fully understand the OTT/hyperscaler situation and acknowledged fully in the podcast that IMO BGP happened for OTTs in good part because there was better open source around (and BGP is easier to hack without breaking stuff ;-). And OTTs can pay the OPEX right now, period. Link-state protocols are harder to build & the open source around them is generally ho-hum. However, they have nice properties, and Open/R speaks for itself I guess ;-) And I don't hide the fact that some of the RIFT stuff was clearly inspired by the thinking/work you guys did there; it was a great thing that someone who wasn't steeped in 30 years of OSPF/IS-IS dogma took a really fresh, hard look ;-)
And seriously, I understand the open-source angle, but here's a bunch of observations from long, long, long experience, where I helped open source tons of things & hacked on some & saw very few succeed (Linux was notable: I happened to hide guys in IBM Research and funnel money their way while they were skunking towards the 0.9 version, which was quite a no-no then ;-), Python and Django being others). For each of those you have 10,000s of abandoned low-quality construction pits and uncontrolled catfights like KVM vs. GNOME.
a) So after years and years of "open source" we still don't have a really good link-state open source, as you say. The reasons are manifold, one of them being that scaling up a link-state protocol is really hard work that people are not willing to do without making a living from it (and even if you try to explain to them how things need to be done to scale, you often end up being dismissed as irrelevant; I have been down that curve and judged it largely a waste of time, to be honest). And then it needs hard testing (testing costs tons of money, as you know, and no one in open source is much excited about that one).
b) Open source tends to attract attention, and then too much of it, only after something is "hot". RIFT is not "hot", so I doubt I'll get tons of clueful, listening people willing to do the very hard work I would ask them to do (but I will of course get a lot of "opinions" ;-); lots of that work is counter-intuitive. But in case the JNPR RIFT code goes open source, we'll see how it plays out. Largely out of my hands, as I say, being a techie and all, and I'm kind of agnostic really.
c) Most important: the leaf version is trivial, and it was done very intentionally that way. RIFT allows you to build a server/ToR version which is relatively simple (no southbound at all, no disaggregation, only a default route really & so on). To the point that it can probably be done in Python ;-) There's tons of open-source opportunity there, which is however a different thing than a "free lunch" ;-) BTW, I've actually seen someone skunking something in one of the new half-scripting languages already, but that petered out (as so often with open source: people get real jobs, lose interest, die on the first hard hill & then you're left with an open construction pit on GitHub ;-)
d) Even more important: if you play with the package we released, you can trivially hook up your implementation to it and interop; that's one of the main points of supporting that effort.
So, in short: yeah, what we have may go open source, it may not; I doubt this is as important as you seem to think it is. Someone with time & dedication can get stuff going quickly by himself, and we provide a public package with a framework where interop testing of the control part against our stuff is trivial. If you run the package I released, you'll see that you can converge a Clos of 30 or so switches with a couple hundred prefixes in a second or so and look at all the state.
Like I said, more of that best served with whiteboard and lunch ...
For example, take Arista, which has gained a very strong footprint in the DC. If you happened to have a quality OSS version, porting it to EOS would be trivial, and for Arista the support cost would just be fixing possible SDK issues (though it's pretty stable now, albeit 32-bit). If some curious individual were to write their own RIFT implementation in a scripting language (I wrote BGP in Python some time ago to bootstrap some things), the quality of said code is not likely to be good enough for anyone to continue playing with it.
Additionally, let's imagine the RFC has been finalized (indeed, I find it well written, and not dizziness-inducing, unlike OSPF's). The incentive for someone else to build their own high-quality implementation would still be low, unless the new protocol offered amazing wins that far surpass any other routing protocol in existence (which, as we know, is hardly possible).
And yes, let's meet up if you want to discuss routing and things :)
I do however think that RIFT implementations will materialize due to the OPEX pressure on building fabrics/DCs that I see everywhere except the top 7 OTTs, frankly; otherwise I wouldn't have spent the time spec'ing it. If my assumption that "the Clos fabric is the new RAM chip for bandwidth" is wrong, adoption will be lower. If DC fabrics move to something massively different from Clos, then a new game will be afoot; the math plays for me, though, as does all the money burnt on hypercubes/toroidals that never delivered ;-) Massive amounts of money are still trying to find something better (try to think whether a whole DC cannot be made a NUMA machine, or even pure optical switching, I know ;-). "Plus ça change, plus c'est la même chose", however ;-)
As to the IETF/open source, the importance of it and so on: the second most successful protocol ever deployed was, I think, EIGRP, so that's a funny angle to think through.
Pinged you on LinkedIn; if you're willing, let's move that to some nice food & beer venue and exchange a couple of angles I don't necessarily want to spell out on this blog ;-)
It only took Cisco 7 or 8 years to get that to work in the real world. The DUAL algorithm is real pretty on paper, but the reality was lots of "Stuck in Active". The stub feature and other things I don't recall finally tamped down the flooding issues, but it is still not a very scalable protocol in terms of the number of peers you can have. Other than flexible route summarization, it isn't much better than OSPF.