Public Cloud Behind-the-Scenes Magic

One of my subscribers sent me this question after watching the networking part of the Introduction to Cloud Computing webinar:

Does anyone know what secret networking magic the Cloud providers are doing deep in their fabrics that is not exposed to consumers of their services?

TL&DR: Of course I don’t know… and I’m guessing it would get pretty expensive if I knew and told you.

However, one can always guess based on what can be observed (see also: AWS networking 101, Azure networking 101).

  • They must be using overlay virtual networking to implement virtual networks; nothing else would scale to what they need – scalability numbers achieved by products like Cisco ACI are laughable from a hyperscaler perspective (see the sketch after this list).
  • That overlay must be either complex enough or require enough state that it cannot be implemented on ToR switches.
  • AWS is the only one of the big three to offer bare-metal servers, and we know their magic runs on their smart NICs (as Pensando so proudly points out, as if it validated their business model). Azure seems to be using FPGAs, and Google relied on a software solution.
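To make the first bullet a bit more concrete, here is a minimal Python sketch of what an overlay forwarding decision could look like: a mapping database translates (virtual network ID, tenant destination IP) into the address of the physical host running the destination VM, and the tenant packet gets encapsulated for transport across the underlay. Everything in it (the mapping database contents, the VXLAN-like header) is illustrative, not any provider's actual implementation.

```python
# Minimal sketch of overlay virtual networking: a host-level agent maps
# (virtual network ID, tenant destination IP) to the physical host running
# the destination VM, then encapsulates the tenant packet for the underlay.
# Everything here is illustrative -- no cloud provider exposes this layer.
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantDest:
    vni: int          # virtual network identifier (tenant isolation)
    tenant_ip: str    # destination IP inside the tenant's address space

# Mapping database (in reality a distributed directory/mapping service)
mapping_db = {
    TenantDest(vni=5001, tenant_ip="10.0.1.4"): "192.0.2.11",   # physical host A
    TenantDest(vni=5001, tenant_ip="10.0.1.5"): "192.0.2.27",   # physical host B
}

def encapsulate(vni: int, tenant_ip: str, payload: bytes) -> tuple[str, bytes]:
    """Return (underlay destination, encapsulated frame) for a tenant packet."""
    phys_host = mapping_db[TenantDest(vni, tenant_ip)]
    vxlan_like_header = vni.to_bytes(3, "big") + b"\x00"   # grossly simplified
    return phys_host, vxlan_like_header + payload

dst, frame = encapsulate(5001, "10.0.1.5", b"tenant packet")
print(dst)   # 192.0.2.27 -- the underlay only ever sees physical addresses
```

The whole point of the sketch: the underlay fabric forwards on physical host addresses only, so per-tenant state lives at the edge (hypervisor or smart NIC), which is what makes the approach scale.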

Network load balancing and Internet-facing NAT are truly interesting. Microsoft wrote a paper describing an early implementation of their Network Load Balancer, and it’s reasonably easy to envision how the same approach could be used for NAT. I’m positive AWS is doing something similar.
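Here’s a toy sketch of the general idea behind such a scale-out load balancer: stateless mux nodes hash a flow’s 5-tuple to pick a backend, so every mux maps a given flow to the same target without synchronizing per-flow state, and the same deterministic mapping could just as well drive NAT port allocation. This is a simplification of the published design (real implementations add consistent hashing and flow tables to survive backend changes gracefully); all names and addresses are made up.

```python
# Toy sketch of scale-out load balancing: hash the 5-tuple onto a backend (DIP)
# behind a virtual IP (VIP), then tunnel the packet to it. Because the hash is
# deterministic, every mux node picks the same backend for a given flow.
import hashlib

backends = ["10.1.0.10", "10.1.0.11", "10.1.0.12"]   # DIPs behind one VIP (made up)

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> str:
    """Map a flow's 5-tuple onto one of the backends."""
    five_tuple = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(five_tuple).digest()[:8], "big")
    return backends[digest % len(backends)]

# All packets of one TCP flow land on the same backend, with no shared flow state.
print(pick_backend("203.0.113.7", 51515, "198.51.100.1", 443, "tcp"))
```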

While you could solve load balancing with a proper combination of worker nodes and hypervisor tricks, I’m positive other complex networking services like AWS Transit Gateway run on top of the virtual networking infrastructure (just like customer virtual machines do), but on multi-tenant bare-metal instances. For an overview of this idea, see Real Virtual Routers used in Oracle Cloud.

It seems like almost everything else runs in managed VMs. It’s pretty obvious Azure application load balancing is implemented with virtual machines and a Network Load Balancer sitting in front of them, VPN gateways are supposedly Windows servers (that’s why it took 30 minutes to provision one), and even the recently introduced Route Server is just two managed VMs, probably with somewhat-privileged access to the orchestration system. AWS and Google are probably using similar approaches, or they could be using multi-tenant bare-metal servers for efficiency reasons… but do you really care about implementation costs if you charge them to the customer?

Anything else? Would appreciate comments with links to insightful papers.

5 comments:

  1. Ivan, in addition to the above, there are two papers from Google detailing some of their network's design principles and practices:

    https://cseweb.ucsd.edu/~vahdat/papers/b4-sigcomm13.pdf

    https://people.eecs.berkeley.edu/~sylvia/cs268-2019/papers/ramesh16a.pdf

    In the first one, on page 4, they briefly mention their own B4 switch, which has an internal Clos architecture similar to FB's Six-Pack. Overall, it looks like Google makes heavy use of BGP, IS-IS, and MPLS to scale their infrastructure.

    Also, correct me if I'm wrong, but surely MPLS is a viable technology to build an L3 virtual network if one doesn't want to resort to an overlay, no? Overlays are complex and therefore slow; MPLS is simpler and faster. The downside with MPLS is that the more VRFs you have, the more CAM/TCAM resources are required, and that can prove prohibitive given how scarce those resources are even in modern ASICs (see the back-of-the-envelope sketch below).
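    A back-of-the-envelope illustration of that TCAM scaling concern (every number below is invented purely for illustration):

    ```python
    # Rough illustration of the "more VRFs -> more TCAM" argument: with MPLS L3VPN
    # the hardware FIB holds roughly (number of VRFs) x (routes per VRF) entries,
    # while TCAM/LPM capacity is fixed. All numbers are invented for illustration.
    tcam_capacity = 256_000        # hypothetical LPM/TCAM entries in a ToR-class ASIC
    routes_per_vrf = 500           # hypothetical average routes per tenant VRF

    max_vrfs = tcam_capacity // routes_per_vrf
    print(f"Roughly {max_vrfs} tenant VRFs fit before the TCAM runs out")  # ~512
    ```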

  2. Ah, the B4 paper... aka "look, we're so cool, we decided to become a router manufacturer". See https://blog.ipspace.net/2012/05/openflow-google-brilliant-but-not.html

    As for MPLS as a transport technology instead of an overlay, see https://blog.ipspace.net/2020/05/need-vxlan-transport.html

    Kind regards, Ivan

  3. Hi Ivan, after 6 years of working at AWS I don’t really know how it works either. For the basic principles of VPC under the hood, your subscribers might like this video. It’s a bit old, but still pretty relevant.

    https://m.youtube.com/watch?v=Zd5hsL-JNY4

  4. I've also found this paper that describes in detail how MS has implemented their virtual networking platform over the years, and why they've chosen to go with the overlay/directory-service model:

    https://www.usenix.org/system/files/conference/nsdi17/nsdi17-firestone.pdf

    It looks like the implementations of Azure's and GCP's virtual networking (the latter detailed in the Andromeda paper) overlap a fair bit. One thing is certain: OpenFlow, in its classic form, is unworkable/unscalable. The VFP paper hints at that as the reason why NSX scales poorly (1000 hosts). Both Azure and GCP had to make heavy modifications to the OpenFlow model in order to scale their infrastructure.

    The overlay implementation obviously trades performance for scalability: section 5 of the VFP paper and sections 3 and 4 of the Andromeda paper give a glimpse into how slow their data planes can get as they describe the detailed architecture of the platforms. That's why MS decided, in the end, to go with hardware offloading, using an FPGA SmartNIC -- essentially a specialized switch/router attached to a server -- to implement virtual networking with better scalability.

    The directory-service model is also a concept prevalent across AWS, Azure, and GCP, albeit under different names: in AWS it's called the Mapping Service, in Azure the Directory Service, and in GCP Hoverboard. They all use this service to scale their routing tables to millions of entries on the cheap, again at the cost of a performance hit, because the lookups are done in software and require communication with dedicated devices. Flow caching is used to improve performance, which is reminiscent of MLS back in the 90s (see the sketch at the end of this comment).

    Overall, since the philosophy behind their virtual networks is very much the same, whoever has the sanest physical design will have the best performance relative to the others. AWS seems to be on top by far, as their physical architecture looks the sanest to me.
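    Here's a minimal Python sketch of the directory-service + flow-cache pattern described above. The class names and the lookup call are made up for illustration; no provider exposes this layer:

    ```python
    # First packet of a flow triggers a (slow) lookup against the mapping/directory
    # service; the result is cached locally, and subsequent packets hit the flow
    # cache -- conceptually similar to MLS flow caching in the 90s.
    import time

    class DirectoryClient:
        """Stand-in for the provider's mapping/directory service (hypothetical API)."""
        def lookup(self, vnet_id: int, dst_ip: str) -> str:
            time.sleep(0.001)            # pretend this is an RPC to a remote service
            return "192.0.2.42"          # physical host currently hosting dst_ip

    class VSwitch:
        def __init__(self, directory: DirectoryClient):
            self.directory = directory
            self.flow_cache: dict[tuple[int, str], str] = {}

        def forward(self, vnet_id: int, dst_ip: str) -> str:
            key = (vnet_id, dst_ip)
            if key not in self.flow_cache:               # slow path: first packet only
                self.flow_cache[key] = self.directory.lookup(vnet_id, dst_ip)
            return self.flow_cache[key]                  # fast path for the rest of the flow

    vswitch = VSwitch(DirectoryClient())
    print(vswitch.forward(5001, "10.0.2.9"))   # slow-path lookup, result gets cached
    print(vswitch.forward(5001, "10.0.2.9"))   # served from the local flow cache
    ```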

  5. https://www.youtube.com/watch?v=8Kyoj3bKepY