The lure of security groups is obvious: if you’re willing to change your network security paradigm, you can stop thinking in subnets and focus on specifying who can exchange what traffic (usually specified as TCP/UDP port numbers) with whom.
Getting rid of subnets? How?
If you’re not familiar with how security groups typically get implemented, you might wonder why I wrote that you can stop thinking in subnets. Here’s the short version of the story.
Security groups are like object groups on Cisco ASA:
- You specify the VM-to-group membership in the cloud orchestration system;
- The cloud orchestration system knows which IP address is assigned to which VM and is able to translate the group membership into a set of IP addresses belonging to that group;
- When you specify group-to-group rules (for example: the Web group can communicate with the DB group on the MySQL TCP port), the cloud orchestration system (or the network controller) generates an equivalent ACL and installs it in the virtual switch (or iptables).
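To make the translation concrete, here is a minimal Python sketch of what such an expansion might look like. The group inventory, the `expand_rule` helper, and the ACL line format are all made up for illustration; they do not represent the output of any particular orchestration system or controller.

```python
from itertools import product

# Hypothetical inventory: group name -> IP addresses of the member VMs,
# as the orchestration system would know them. Note that the members of
# a group do not have to share a subnet.
GROUPS = {
    "web": ["10.0.1.11", "10.0.1.12", "10.0.2.13"],
    "db": ["10.0.7.21", "10.0.9.22"],
}


def expand_rule(src_group, dst_group, proto, port, groups=GROUPS):
    """Expand one group-to-group rule into per-address permit entries."""
    return [
        f"permit {proto} host {src} host {dst} eq {port}"
        for src, dst in product(groups[src_group], groups[dst_group])
    ]


# "Web group can talk to the DB group on the MySQL port"
acl = expand_rule("web", "db", "tcp", 3306)
for entry in acl:
    print(entry)
```

A single rule between a three-member group and a two-member group already turns into six ACL entries.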
If you’re considering scalability as part of your network design process, you probably immediately spotted the challenges of this approach:
- The resulting ACL is a Cartesian product of two sets (similar to the OpenFlow 1.0 state explosion) – the length of the ACL is proportional to the product of the group sizes;
- Most ACL implementations scan the entries sequentially (because networking engineers love to optimize irrelevant stuff and use overlapping ACL entries that make ACLs order-sensitive). The time needed to match a packet is thus proportional to the product of group sizes (O(n^2) for those of you who love talking about computational complexity);
- ACLs have to be updated on all participating virtual switches every time a VM is added to any of the groups used in the ACL.
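A back-of-the-envelope calculation shows how quickly this gets out of hand. The sketch below assumes the simplest possible case (one permit line per source/destination address pair, one rule between the two groups); `expanded_acl_size` is a hypothetical helper, not a formula taken from any product.

```python
def expanded_acl_size(src_members, dst_members, rules=1):
    """Entries needed when every src/dst address pair gets its own ACL line."""
    return src_members * dst_members * rules


for n in (10, 100, 1000):
    print(f"{n} VMs per group -> {expanded_acl_size(n, n)} entries")

# Adding a single VM to the source group adds one entry per destination VM,
# and that change has to reach every virtual switch using the ACL.
added = expanded_acl_size(101, 100) - expanded_acl_size(100, 100)
print(f"one more VM in a 100-VM group -> {added} new entries")
```

Two groups of 1000 VMs each would need a million entries for a single rule, and every VM added to one group generates another thousand.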
Oh, and if you want to implement security groups on ToR switches, you’ll quickly realize just how little TCAM they have – you might be better off inserting x86 servers into the forwarding path and using something like Snabb Switch on them.
Can we make it any better?
Sure we can. Instead of blindly converting per-group security rules into IP address ACLs we need a better matching mechanism that would work along these lines:
- Identify the group membership of the sending VM (trivial on ingress ACL, requires IP lookup on egress ACL);
- Identify the relevant ACL based on the group membership;
- Identify the group membership of destination IP address (trivial on egress ACL, requires IP lookup on ingress ACL);
- Perform ACL matches based on group membership information derived in the previous steps.
This algorithm replaces a single O(n^2) lookup with multiple simple lookups – group membership is a constant-time lookup if your implementation uses MAC-to-group hash tables, and the time to match an ACL remains proportional to the number of group-to-group rules in the ACL, not to the product of group sizes.
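The four steps could be sketched roughly like this. Everything here is an assumption made for illustration: the table names, the default-deny behavior for unknown endpoints, and the use of IP addresses as hash keys (a MAC-to-group table would work the same way). It is not a description of how any shipping virtual switch implements the idea.

```python
# Hypothetical endpoint-to-group table; a dict lookup takes constant time
# on average regardless of how many VMs are in each group.
GROUP_OF_IP = {
    "10.0.1.11": "web",
    "10.0.1.12": "web",
    "10.0.7.21": "db",
}

# Rules are expressed in groups, never in addresses, and indexed by
# source group: source group -> list of (destination group, proto, port).
RULES_BY_SRC_GROUP = {
    "web": [("db", "tcp", 3306)],
}


def permitted(src_ip, dst_ip, proto, port):
    """Return True if group-based rules allow the flow (default deny assumed)."""
    src_group = GROUP_OF_IP.get(src_ip)              # step 1: sender's group
    if src_group is None:
        return False
    rules = RULES_BY_SRC_GROUP.get(src_group, [])    # step 2: relevant ACL
    dst_group = GROUP_OF_IP.get(dst_ip)              # step 3: destination's group
    if dst_group is None:
        return False
    # Step 4: the scan is proportional to the number of rules for this
    # source group, not to the number of VMs in either group.
    return (dst_group, proto, port) in rules


print(permitted("10.0.1.11", "10.0.7.21", "tcp", 3306))  # web -> db, MySQL
print(permitted("10.0.7.21", "10.0.1.11", "tcp", 3306))  # db -> web
```

Adding a VM to a group changes a single entry in the membership table and leaves the rules untouched, which also addresses the update problem of the expanded ACL.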
Is anyone doing it?
Nuage VSP is using this approach with Open vSwitch in the hypervisors (they convert group membership into metadata passed between OpenFlow matching tables), and it seems Cisco is doing something very similar with Endpoint Groups (EPG) in its ACI architecture.
To get more details on the Nuage VSP approach (including the very smart use of BGP communities that allows you to extend security groups beyond the cloud infrastructure), register for the free Scaling Overlay Virtual Networks webinar on November 20th. You’ll have to wait a bit longer for my take on Cisco ACI; I won’t be able to create a webinar describing it before spring 2015.