Mice, Elephants and Virtual Switches
Mice and Elephants is a traditional QoS fable – latency-sensitive real-time traffic (or a request-response protocol like HTTP) stuck in the same queue behind megabytes of file transfer (or backup or iSCSI) traffic.
The solution is also well known – color the elephants pink (aka DSCP marking) and sort them into a different queue – until reality intervenes.
It seems oh-so-impossible to figure out which applications might generate elephant flows and mark them accordingly on the originating server; there’s no other way to explain the need for traffic classification and marking on the ingress switch, and other MacGyver contraptions the networking team uses to make sure “it’s not the network’s fault” instead of saying “we’re a utility – you’re getting exactly what you’ve asked for.”
Matching TCP and UDP port numbers on the server (because FTP sessions tend to be more elephantine than DNS requests) and setting DSCP values of outbound packets is also obviously a mission-impossible for some people; it’s way easier to pretend the problem doesn’t exist and blame the network for lack of proper traffic classification.
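To show how little is being asked of the server, here is a minimal sketch of port-based marking done by the application itself. The port list, the choice of AF11 for bulk traffic, and the helper names are all assumptions made for illustration; the only real API used is the standard `IP_TOS` socket option (available on Linux and most Unix-like systems).

```python
import socket

# Assumption: AF11 (DSCP 10) is used for bulk traffic and DSCP 0 for the rest.
DSCP_BULK = 10
DSCP_DEFAULT = 0

# Hypothetical list of "elephant" ports: FTP data, FTP control, iSCSI.
BULK_PORTS = {20, 21, 3260}


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the one-byte TOS field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def dscp_for_port(port: int) -> int:
    """Pick a DSCP value based on the destination port."""
    return DSCP_BULK if port in BULK_PORTS else DSCP_DEFAULT


def open_marked_socket(dest_port: int) -> socket.socket:
    """Create a TCP socket whose outbound packets carry the chosen DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tos = dscp_to_tos(dscp_for_port(dest_port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock
```

Calling `open_marked_socket(3260)` would return a socket marked with the bulk class before it ever connects; the same marking could of course be applied by a host firewall instead of the application.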
One has to wonder how well the recent surge of application-aware networking solutions will fare if the server/application teams cannot be bothered to tell the network what type of traffic it’s facing by setting a simple one-byte value in each packet, but let’s not go there.
Anyway, the situation gets worse in environments with truly unclassifiable traffic (as the ultimate abomination imagine a solution doing backups over HTTP) where it’s impossible to separate elephants from mice based on their TCP/UDP port numbers.
If, however, one had insight into the operating system TCP buffers, or could measure per-flow rates, one might be able to figure out which flows exhibit overweight tendencies – and that’s exactly what the Open vSwitch (OVS) team did.
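The per-flow rate idea can be sketched in a few lines. This is not the OVS implementation, just an illustration of the principle: count the bytes each flow sends within a time window and flag the flow once it crosses a threshold. The class name, the threshold, and the window length are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    window_start: float
    byte_count: int = 0


class ElephantDetector:
    """Flags a flow once it sends threshold_bytes within one window."""

    def __init__(self, threshold_bytes: int = 1_000_000,
                 window_seconds: float = 1.0) -> None:
        self.threshold_bytes = threshold_bytes
        self.window_seconds = window_seconds
        self.flows: dict = {}

    def observe(self, flow: tuple, nbytes: int, now: float) -> bool:
        """Record nbytes for the flow at time now; return True for elephants."""
        state = self.flows.get(flow)
        # Start a fresh window for new flows or when the old window expired.
        if state is None or now - state.window_start >= self.window_seconds:
            state = FlowState(window_start=now)
            self.flows[flow] = state
        state.byte_count += nbytes
        return state.byte_count >= self.threshold_bytes
```

Note that the detector keeps state for every flow it has ever seen; a real implementation would have to expire idle entries, which is exactly where the scaling concerns come from.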
Additionally, OVS appears as a TCP-offload-capable NIC to the virtual machines, and the bulk applications happily dump megabyte-sized TCP segments straight into the output queue of the VM NIC, where it’s easy for the underlying hypervisor software (OVS) to spot them and mark them with a different DSCP value (this idea is marked as pending in Martin Casado’s presentation).
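The segment-size heuristic is even simpler than rate measurement because it needs no per-flow state. A rough sketch, assuming a hypothetical 32 KB threshold and AF11 as the bulk class (neither value comes from the OVS work):

```python
# Hypothetical threshold: segments this large can only come from a sender
# relying on TCP segmentation offload, which suggests a bulk transfer.
LARGE_SEGMENT_BYTES = 32 * 1024

# Assumption: AF11 (DSCP 10) marks bulk traffic.
BULK_DSCP = 10


def classify_segment(segment_len: int, original_dscp: int) -> int:
    """Return the DSCP value to put on a segment handed over by the VM NIC."""
    if segment_len >= LARGE_SEGMENT_BYTES:
        return BULK_DSCP
    # Small segments keep whatever marking the VM already applied.
    return original_dscp
```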
The results (documented in a presentation) shouldn’t be surprising – we’ve known that ping isn’t affected by an ongoing FTP transfer when the two are in different queues ever since Fred Baker proudly presented the first measurement results of the then-revolutionary Weighted Fair Queuing mechanism (this is the only presentation I could find, but WFQ already existed in late 1995) at some mid-’90s incarnation of Cisco Live (probably even before the days Cisco Live was called Networkers).
The OVS-based elephant identification is a cool idea, although one has to wonder how well it works in practice if it measures the flow rate of each and every flow passing through a virtual switch (see also OVS scaling woes).
Telling people how awesome it is that Cumulus-powered switches react to elephant flows in hardware is pure marketing – every switch works well when faced with properly marked packets. Calling DSCP marking “overlay-to-underlay integration” is also hogwash (no, I will not link to the source); we’ve been using DSCP marking for decades with no need for fancy names.
http://blog.sflow.com/2014/03/performance-optimizing-hybrid-openflow.html
Nice write up Ivan.