Anyone Using Intel Omni-Path?

One of my subscribers sent me this question after watching the latest batch of Data Center Fabrics videos:

You haven’t mentioned Intel's Omni-Path at all. Should I be surprised?

While Omni-Path looks like a cool technology (at least at the whitepaper level), nobody ever mentioned it (or Intel) in any data center switching discussion I was involved in.

Intel’s solution never came up in my consulting engagements, and it’s not even mentioned in the 2018 Gartner Magic Quadrant (which means it doesn’t exist in their customer base).

Also, I keep wondering why nobody is using Intel switching silicon. Arista did something with the FM6000 years ago, but that was the only time I’ve ever seen an Intel ASIC used in a data center switch.

The only time I heard of a similar idea was years ago, when Intel was talking about putting switching silicon into NICs (HT: Jon Hudson, during an Interop party). The architecture they were promoting at the time was a hypercube built out of servers with switching NICs.

While that idea might make sense for very specific workloads (think: finite element methods), it’s basically NUMA writ large… and it looks like Intel abandoned it in favor of a more traditional approach.
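
To put the NUMA analogy into numbers, here’s a quick illustrative sketch (my own arithmetic, purely hypothetical, not taken from Intel’s materials), assuming a plain binary hypercube with one server per node: every switching NIC needs as many fabric ports as the hypercube has dimensions, and the worst-case path crosses that many other servers’ NICs.

```python
# Illustrative sketch (hypothetical numbers): in a binary hypercube of
# 2**d servers built from switching NICs, each NIC needs d fabric ports
# and the worst-case path is d hops, every one of them transiting another
# server's NIC; that non-uniform, size-dependent latency is the
# "NUMA writ large" effect.
import math

def hypercube_degree_and_diameter(servers):
    """Ports per switching NIC and worst-case hop count for a binary hypercube."""
    d = math.ceil(math.log2(servers))   # hypercube dimension
    return d, d                         # node degree and network diameter both equal d

for servers in (128, 1024, 4096):
    ports, hops = hypercube_degree_and_diameter(servers)
    print(f"{servers:5d} servers -> {ports} ports per NIC, up to {hops} hops")
```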

It seems Omni-Path is heavily used in High-Performance Computing (HPC) environments as an InfiniBand replacement. No surprise there: Intel always had very-low-latency chipsets (that was the reason Arista used the FM6000), and combined with all the other features they claim to have implemented in their Fabric Manager (think: a proprietary SDN controller that actually works), that would make perfect sense.

However, it looks like even High-Frequency Trading (HFT) doesn’t need that kind of speed. Arista was traditionally very strong in HFT, but after launching one product each, Cisco and Arista effectively stopped competing on very-low-latency switches… or maybe mainstream merchant silicon simply became fast enough.

Are you seeing something different? Is anyone using Omni-Path outside of the HPC world?

5 comments:

  1. Intel pivoted Uri's Fulcrum work (Mellanox's Ethernet switch is lower latency, btw) into Red Rock Canyon, which would have been an awesome tray-level design (4 NICs plus a modest-radix switch) for torus/hypercube-style fabrics (no need for leaf-and-spine switches in the data center), if only the physical layer had ended up cheap enough for the torus solution to be cheaper than a leaf/spine fat tree at the row level.

    Red Rock Canyon was, as best I can tell, repositioned as a value-add NIC (after all, what other NIC has a full switch packet-processing pipeline available?).

    And don't forget that once optics driven directly from the big CMOS ASICs become manufacturable at scale sometime in the 2020s, the economics of those torus-style designs will be better than leaf/spine for hyperscale configurations (bought and installed a row at a time, and never upgraded or reconfigured).
    Replies
    1. For people who are interested in learning about non-Clos topologies, I recommend this presentation: https://www.microsoft.com/en-us/research/video/network-topologies-for-large-scale-datacenters-its-the-diameter-stupid/ Obviously it's somewhat biased, since he's promoting his own SlimFly topology, but I generally agree with his conclusions.

      I looked into a Red Rock Canyon torus design in 2015-2016 and declined to implement it due to the cost of optics and the poor scalability of bisection bandwidth (see the back-of-the-envelope sketch below the comments). Ultimately I don't think a switch-per-server design will ever make sense, especially considering the increasing throughput and radix of merchant switch ASICs. Rack-level direct switching topologies (like Jellyfish and the SlimFly above) make more sense, but there's a question of whether the routing complexity is worth the cost savings.

      I don't think the Silicon Photonics Rapture is coming, for the same reason that flash won't replace hard disks: if SiPh briefly becomes cheaper than VCSELs, the surge in demand will push the price back up to parity. At best, SiPh will reach parity, and then both technologies will keep reducing costs over time, maybe with the mix slowly shifting from one to the other.
    2. Wes, nailed it ;-)
  2. FM6000 probably could have beaten Broadcom Trident if it had been released on time. For some reason, possibly related to the acquisition, FM6000 was massively delayed, so all the switch vendors used Trident instead. The 2010-2011 time period was a real inflection point for merchant silicon, and Intel missed it. Then the FM10000 (Red Rock Canyon) was targeted at a niche market that turned out not to exist, and that's all she wrote.

    Intel was a leader in 10G NICs, and they threw that away by not releasing mainstream 25/50/100G NICs; this may be tied up in their 10 nm problems. Now Mellanox has a 75% NIC market share, and there are rumors about Intel buying them.

    Omni-Path appears to be derived from Cray and QLogic InfiniBand technology and probably has nothing to do with Fulcrum. I would not recommend using InfiniBand/Omni-Path outside of HPC now that Ethernet runs at the same speeds.
  3. It's fascinating to see how networking folks are being driven by requirements (at least on IP fabrics) that resemble NUMA more and more (i.e. uniform low delay, zero loss), while in reality non-uniform addressing, scale, and cabling/connector length, density and cost prevent them from reusing the usual NUMA architectures (hypercubes, meshes, Dragonfly flavors ;-) The promise of NUMA never extended beyond a few racks AFAIK, with all the Flexi, extended PCI and so on ... InfiniBand was driven hard (and was made to work over long distances a long time ago) but for some reason never really took off en masse; I suspect cost vs. dirt-cheap packet tech a.k.a. Ethernet ... Same battle as message passing vs. distributed shared memory approaches, where the outcome was pretty counter-intuitive ;-)
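
To make the "poor scalability of bisection bandwidth" remark in the torus comment above a bit more concrete, here's a rough back-of-the-envelope sketch based on my own assumptions (not the commenter's data): one server per node of a k×k×k 3D torus, with torus links running at the server's injection rate, compared against a non-oversubscribed leaf/spine fabric of the same size.

```python
# Back-of-the-envelope sketch (my own assumptions, not the commenter's data):
# relative bisection bandwidth of a k x k x k 3D torus with one server per
# node and torus links running at the server's injection rate.
def torus_bisection_fraction(k):
    """Fraction of full bisection bandwidth offered by a k-ary 3-D torus (even k)."""
    nodes = k ** 3
    # Bisecting the torus across one dimension cuts each of the k^2 wrap-around
    # rings in two places, so 2 * k^2 links cross the cut.
    bisection_links = 2 * k ** 2
    # "Full bisection" means half the nodes can send at full rate to the other half.
    full_bisection = nodes / 2
    return bisection_links / full_bisection     # simplifies to 4 / k

for k in (8, 16, 32):
    print(f"{k**3:6d} servers -> {torus_bisection_fraction(k):.1%} of full bisection bandwidth")
# A non-oversubscribed leaf/spine fabric stays at 100% regardless of scale,
# at the cost of dedicated switches and (today) a lot more optics.
```

In other words, under these assumptions the relative bisection bandwidth of a 3D torus shrinks as 4/k while a non-oversubscribed leaf/spine stays flat, which is exactly the scaling problem the commenter ran into.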