Project Calico: Is It Any Good?

At least a dozen engineers sent me emails or tweets mentioning Project Calico in the last few weeks – obviously the project is getting some real traction, so it was high time to look at what it’s all about.

TL&DR: Project Calico is yet another virtual networking implementation that’s a perfect fit for a particular use case, but falters when encountering the morass of edge cases.

What is Project Calico? It’s a virtual networking implementation for Linux, targeting high-density virtualization and container (Docker) environments. It comes with a Neutron plug-in (and a few other plugins), making it immediately usable in OpenStack deployments.

How does it work? The virtual network is modeled as a single microsegmented flat VLAN with shared IP address space. iptables (packet filters implemented in Linux hosts) are used to implement tenant separation.
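
Calico’s per-host agent (Felix) programs the actual filtering rules; the fragment below is only a toy Python sketch, with made-up tenant names and addresses, of the kind of per-workload iptables rules you’d need to keep tenants apart on a shared segment:

```python
# Toy illustration (not Calico's Felix): emit iptables commands that allow
# traffic between workloads belonging to the same tenant and drop the rest.
# Tenant names and addresses are made up; a real deployment would also scope
# the rules to the workload interfaces instead of using a global catch-all.
from itertools import permutations

tenants = {
    "tenant-a": ["10.65.0.2", "10.65.0.3"],   # workload IPs owned by tenant A
    "tenant-b": ["10.65.0.4"],                # workload IPs owned by tenant B
}

def tenant_separation_rules(tenants):
    """Return iptables commands enforcing per-tenant isolation on the FORWARD chain."""
    rules = []
    for ips in tenants.values():
        for src, dst in permutations(ips, 2):
            rules.append(f"iptables -A FORWARD -s {src} -d {dst} -j ACCEPT")
    rules.append("iptables -A FORWARD -j DROP")   # whatever isn't explicitly allowed is dropped
    return rules

for rule in tenant_separation_rules(tenants):
    print(rule)
```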

However, the creators of Calico know that a single flat VLAN doesn’t scale, so they added an interesting twist almost identical to my IPv6 microsegmentation ideas:

  • Every Linux host is an IP router running BGP;
  • Virtual machines (or containers) are attached to a virtual router, not a bridge (similar to what Juniper Contrail or Microsoft Hyper-V are doing);
  • Host routes are used for end-to-end packet forwarding between customer endpoints and distributed between Linux hosts with BGP;
  • BGP route reflectors (also running on Linux hosts) are used to build a scalable routing control plane (a toy model of the resulting route distribution is sketched right after this list).
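
To see that route distribution spelled out, here’s a toy Python model of the outcome (hosts, loopback addresses, and workload IPs are invented, and the route reflector is implicit): every compute node ends up with a /32 route for every workload, pointing at the physical host it currently lives on.

```python
# Toy model of host-route distribution (not real BGP): every compute host
# advertises a /32 for each local workload; the route reflector passes those
# routes to all other hosts, which install them with the advertising host's
# address as the next hop. All names and addresses are made up.
hosts = {
    "host-1": {"loopback": "192.168.0.1", "workloads": ["10.65.0.2", "10.65.0.3"]},
    "host-2": {"loopback": "192.168.0.2", "workloads": ["10.65.0.4"]},
    "host-3": {"loopback": "192.168.0.3", "workloads": ["10.65.0.5"]},
}

def reflected_routes(hosts):
    """All /32 host routes the route reflector hands out: prefix -> BGP next hop."""
    return {
        f"{ip}/32": attrs["loopback"]
        for attrs in hosts.values()
        for ip in attrs["workloads"]
    }

def rib_for(host, hosts):
    """Routing table a given host ends up with: local workloads are connected routes."""
    local = hosts[host]["loopback"]
    return {
        prefix: ("directly attached (local veth/tap)" if nh == local else f"via {nh}")
        for prefix, nh in reflected_routes(hosts).items()
    }

print(rib_for("host-1", hosts))
```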

This architecture retains the properties of a flat layer-2 network (it’s easy to move customer IP addresses between hosts), but scales much better – the number of active endpoints on the layer-2 segment is equal to the number of physical hosts, not the number of virtual machines or containers.

The flooding across the physical layer-2 network is also reduced to a minimum. Unless you want to deploy IP multicast across customer workloads, you won’t see much more than ARP requests between physical hosts in the transport VLAN (it’s highly unlikely that a hypervisor host stays silent long enough to trigger unknown unicast flooding).
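
A quick back-of-envelope comparison with assumed numbers (40 hosts, 50 workloads per host) shows what that buys you on the transport VLAN, and also hints at the routing-table cost that comes up later in this post:

```python
# Back-of-envelope illustration; the numbers (40 hosts, 50 workloads per host)
# are assumptions, adjust to taste. The transport VLAN only ever sees the
# physical hosts, while every Linux host (and any L3 switch fed with those
# routes) has to carry a /32 route per workload.
hosts = 40
workloads_per_host = 50

mac_entries_bridged = hosts * workloads_per_host   # every VM shows up on the VLAN
mac_entries_routed  = hosts                        # only the physical hosts show up
host_routes         = hosts * workloads_per_host   # /32 routes in every routing table

print(f"MAC/ARP entries on the transport VLAN: {mac_entries_bridged} (bridged) "
      f"vs {mac_entries_routed} (routed in the host)")
print(f"Host routes every node has to carry: {host_routes}")
```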

The Obvious Problems

Project Calico uses a single microsegmented network, resulting in the obvious drawbacks of microsegmentation:

  • Single forwarding domain, which makes service insertion way harder to implement than multiple forwarding domains. You could (in theory) use iptables on Linux hosts to implement service insertion with PBR, but I’m positive I don’t want to see the results or be involved in a troubleshooting session;
  • Provider-coordinated IP addresses. When multiple tenants share the same IP address space, someone has to coordinate the addresses (see the sketch after this list), which means that you cannot simply migrate your existing workload into such a cloud (unless you’re running new-age applications that rely exclusively on DNS and mechanisms like service registration and discovery);
  • No overlapping IP addresses. The creators of Project Calico recognized the problem and proposed a “solution” using 464XLAT (emulating end-to-end IPv4 with a sequence of NAT46 and NAT64). I definitely wouldn’t want to be anywhere near a cloud that uses this approach (see also RFC 1925, section 2.3).
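
To make the provider-coordinated addressing point concrete, here’s a deliberately trivial Python sketch (pool size and tenant names are invented) of what a single shared address space means in practice: every tenant draws from one pool, so nobody can bring their existing subnets along or overlap with anyone else.

```python
# Trivial illustration of provider-coordinated addressing: all tenants draw
# from one shared pool, so existing (potentially overlapping) addressing plans
# cannot be migrated into the cloud. Pool and tenant names are made up.
from ipaddress import ip_network

shared_pool = ip_network("10.65.0.0/16").hosts()   # one address space for everyone

def allocate(count):
    """Hand out the next 'count' addresses from the single shared pool."""
    return [str(next(shared_pool)) for _ in range(count)]

tenant_a = allocate(3)   # tenant A cannot keep the 10.1.1.0/24 it uses on premises
tenant_b = allocate(2)   # tenant B cannot reuse anything tenant A already got

print("tenant A:", tenant_a)
print("tenant B:", tenant_b)
```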

If you’re willing to accept these limitations, and offer cloud services where you force the tenants to use provider-supplied IP addresses (Amazon figured out years ago that they cannot do that), Project Calico might be a perfect fit for your private or public cloud… unless you want to build a scalable environment.

I’m also wondering about the performance of the solution, as (based on my experience with virtual appliances) the Linux IP stack isn’t exactly the fastest IP forwarding mechanism in the universe, but maybe I’m wrong… or maybe a few Gbps per host (which hopefully has two 10Gbps uplinks) is still ludicrous speed.

The Big Problems

The router-in-hypervisor approach is definitely the right way to go (it’s used by reasonably-scalable networks like Amazon VPC), but the flat VLAN transport between hypervisor hosts kills the beauty of the concept for environments that need more than one or two switches – you’re forced to deploy fragile kludges like MLAG, proprietary layer-2 fabrics, or concoctions that would make MacGyver proud.

Do I have to emphasize yet again that every layer-2 domain represents a single failure domain and that it inevitably fails sooner or later?

Alternatively, you could use a real IP fabric and dump all host routes into physical switches using BGP between hypervisor hosts and physical switches… and kill scalability because you’d become dependent on the L3 table sizes in physical switches.

Finally, you might decide to do route summarization on the ToR switches (resulting in an architecture very similar to my microsegmentation ideas), which seems to be the next step that the Project Calico engineers are thinking about, but then you’d lose the ability to migrate VMs… or you might get really smart and advertise out-of-rack host routes and summarize the in-rack routes.
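
For illustration only, here’s a rough Python model of that last idea (rack prefixes and workload placements are invented; this describes no shipping Calico feature): each ToR advertises its rack summary plus /32 host routes for workloads that currently live in the rack but belong to another rack’s prefix.

```python
# Rough model of per-rack summarization with exceptions: advertise the rack
# summary plus /32s for workloads that migrated in from other racks.
# All prefixes and placements are made up.
from ipaddress import ip_address, ip_network

racks = {
    "rack-1": ip_network("10.65.1.0/24"),
    "rack-2": ip_network("10.65.2.0/24"),
}

placement = {                 # which workload currently lives in which rack
    "10.65.1.10": "rack-1",   # at home
    "10.65.1.11": "rack-2",   # migrated out of rack-1 into rack-2
    "10.65.2.10": "rack-2",   # at home
}

def tor_advertisements(rack):
    """Summary for the local rack plus host routes for workloads that moved in."""
    adverts = [str(racks[rack])]
    for ip, location in placement.items():
        if location == rack and ip_address(ip) not in racks[rack]:
            adverts.append(f"{ip}/32")
    return adverts

for rack in racks:
    print(rack, "advertises", tor_advertisements(rack))
# rack-1 advertises ['10.65.1.0/24']
# rack-2 advertises ['10.65.2.0/24', '10.65.1.11/32']
```

Longest-prefix match does the rest: traffic for a migrated workload follows the /32 to its current rack, while everything else follows the rack summary.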

Can It Be Fixed?

Of course it might be possible to fix the Project Calico architecture to support true multi-tenancy and multiple routing domains:

  • Use overlay virtual networks to decouple hypervisor-based packet forwarding from the transport network (see the sketch after this list);
  • Replace simple BGP with MPLS/VPN or EVPN.
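
As a minimal illustration of the first bullet (assuming a VXLAN-like encapsulation per RFC 7348; the VNI values are arbitrary), the overlay header carries a tenant identifier, so overlapping tenant addresses stop being a problem and the transport network only sees host-to-host traffic:

```python
# Minimal sketch of why an overlay header helps: a VXLAN header carries a
# 24-bit virtual network identifier (VNI), so two tenants can reuse the same
# inner IP addresses while the transport network only sees host-to-host UDP
# packets. Header layout per RFC 7348; the VNI values are arbitrary.
import struct

def vxlan_header(vni):
    """8-byte VXLAN header: flags byte with the I bit set, 24-bit VNI, reserved fields zero."""
    flags = 0x08 << 24            # I flag in the first byte, next three bytes reserved
    return struct.pack("!II", flags, vni << 8)

print(vxlan_header(vni=1001).hex())   # tenant A traffic -> 080000000003e900
print(vxlan_header(vni=1002).hex())   # tenant B traffic -> 080000000003ea00
```

The second bullet is the harder part: carrying per-tenant routes in the control plane (with route distinguishers and route targets, the way MPLS/VPN and EVPN do) instead of relying on packet filters for separation.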

If this description sounds a lot like an existing product, you’re absolutely right. Fixing Project Calico to address the not-so-very-insignificant corner cases would turn its architecture into something very close to Juniper Contrail.

Unfortunately, I don’t expect to see anything along these lines anytime soon. It’s relatively easy to take existing open-source components and add some glue; writing a proper multi-tenant control plane (using BGP or something else) from scratch is a totally different ballgame.

For another perspective on these same issues, read the blog post by Christopher Liljenstolpe, the original architect of Project Calico.

8 comments:

  1. I looked at Calico from an enterprise perspective, i.e. no multi-tenancy. What I liked was turning the hypervisor into a router and exchanging host routes with the ToR. I also liked the direct connectivity between virtual machines and physical servers, whether they are databases, backup systems, etc., without a need for an overlay connecting to a gateway of some sort.

    I do think scalability could be limited with all the host routes being advertised into the core switching environment. As you stated, summarization at the ToR could be an answer.

    You note that you have a concern with IP forwarding in Linux. Would this be similar to the performance of an in-kernel VXLAN gateway in a hypervisor such as ESX?

    All in all, I thought it was a clever solution to avoid an overlay network deployment. For my enterprise though, it is not going to work in the near future since we have standardized on VMware.
    Replies
    1. There is a significant issue with the single forwarding VLAN. Single L2 domain. When that one breaks, everything falls apart. Butterfly effect :)
  2. I also looked at it, but without a multi-tenant control/forwarding plane it's kind of a non-starter these days. I haven't kept up but 6WIND was going to contribute changes to Quagga to support VRFs through Linux network namespaces. On the Quagga side it could be represented by a different physical interface or VLAN or whatever.
  3. Thanks for picking up this topic.

    Is Calico basically host routing per tenant with some proxy ARP on the vRouter? It would be great if a network picture could be added to this article to make it clearer.

    Does Amazon VPC do the same thing along with some type of MPLS VPN/EVPN?

    Replies
    1. As always, follow the links in the blog post for more details, including packet forwarding diagrams.
  4. Very interesting read. We are actually using Calico on our bare-metal servers for the front-ended Elastic Cloud Routers we deploy. We are a pure L3 design where we do eBGP from the Calico/BIRD container on the BMS up to the leaf, and then iBGP up to the spine. Route reflection and the like is in place, so scalability is there. Our workloads are able to spin up fine, etc. The key for us, though, is that we aren't a multi-tenant environment for the edge use case. It's purely for ingest. I'm really looking forward to seeing where they take Calico over the years, and also whether they focus solely on the container world or lean more towards OpenStack integration and beef up their Neutron integrations.
  5. I'm confused about the single flat VLAN you are referring to. It's possible to build multiple subnets using Calico.
    Replies
    1. Focus on what you need as the transport network between the hypervisor hosts.