Overlay Networks and QoS FUD
One of the usual complaints I hear whenever I mention overlay virtual networks is “with overlay networks we lose all application visibility and QoS functionality” ... that worked so phenomenally in the physical networks, right?
The wonderful QoS the physical hardware gives you
To put my ramblings into perspective, let’s start with what we do have today. Most hardware vendors give you basic DiffServ functionality: classification based on L2-4 information, DSCP or 802.1p (CoS) marking, policing and queuing. Shaping is rare. Traffic engineering is almost nonexistent (while some platforms support MPLS TE, I haven’t seen many people brave enough to deploy it in their data center network).
Usually a single vendor delivers an inconsistent set of QoS features that vary from platform to platform (based on the ASIC or merchant silicon used) or even from linecard to linecard (don’t even mention Catalyst 6500). Sometimes you need different commands or command syntax to configure QoS on different platforms from the same hardware vendor.
I don’t blame the vendors. Doing QoS at gigabit speeds in a terabit fabric is hard. Really hard. Having thousands of hardware output queues per port or hardware-based shaping is expensive (why do you think we had to pay an arm and a leg for ATM adapters?).
Do we need QoS?
Maybe not. Maybe it’s cheaper to build a leaf-and-spine fabric with more bandwidth than your servers can consume. Learn from the global Internet - everyone talks about QoS, but the emperor is still naked.
How should QoS work?
The only realistic QoS technology that works at terabit speeds is DiffServ – packet classification is encoded in DSCP or CoS (802.1p bits). In an ideal world the applications (or host OS) set the DSCP bits based on their needs, and the network accepts (or rewrites) the DSCP settings and provides the differentiated queuing, shaping and dropping.
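As a concrete illustration of “applications set the DSCP bits”: on a Linux host this is a few lines of standard socket code. The address, port, and choice of EF below are made-up examples, not a recommendation.

```python
import socket

# DSCP occupies the upper six bits of the IP TOS byte, so the value has
# to be shifted left by two bits before handing it to setsockopt().
DSCP_EF = 46                      # Expedited Forwarding, typically voice

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)

# Every datagram sent on this socket now carries DSCP EF; the network
# is free to accept, rewrite, or ignore the marking.
sock.sendto(b"hello", ("192.0.2.10", 5004))
```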
In reality, the classification is usually done on the ingress network device, because we prefer playing MacGyver instead of telling our customers (= applications) “what you mark is what you get”.
Finally, there are the poor souls that do QoS classification and marking in the network core because someone bought them edge switches that are too stupid to do it.
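Conceptually, that edge classification is nothing more than a 5-tuple (or simpler) lookup that yields a DSCP value. Here’s a toy Python sketch; the rules are purely illustrative and not taken from any particular product.

```python
# Toy edge classifier: map protocol + destination port to a DSCP value,
# the way an ingress class-map/ACL would (rules are purely illustrative).
RULES = [
    ("udp", 5004, 46),    # media/RTP-style traffic -> EF
    ("tcp", 443,  26),    # business web apps       -> AF31
    ("tcp", 80,    0),    # everything-else web     -> best effort
]

def classify(proto, dport, default_dscp=0):
    """Return the DSCP value an edge device would mark this flow with."""
    for r_proto, r_dport, dscp in RULES:
        if r_proto == proto and r_dport == dport:
            return dscp
    return default_dscp

assert classify("tcp", 443) == 26
assert classify("udp", 53) == 0       # unmatched traffic stays best effort
```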
How much QoS do we get in the virtual switches?
Now let’s focus on the QoS functionality of the new network edge: the virtual switches. As in the physical world, there’s a full range of offerings, from minimalistic to pretty comprehensive:
- vDS in vSphere 5.1 has minimal QoS support: per-pool 802.1p marking and queuing;
- Nexus 1000V has a full suite of classification, marking, policing and queuing tools. It also copies inner DSCP and CoS values into the VXLAN+MAC envelope (see the sketch after this list);
- VMware NSX (the currently shipping NVP 3.1 release) uses a typical service provider model: you can define minimal (affecting queuing) and maximal (triggering policing) bandwidth per VM, accept or overwrite DSCP settings, and copy DSCP bits from virtual traffic into the transport envelopes;
- vDS in vSphere 5.5 has a full 5-tuple classifier and CoS/DSCP marking (here's how it works).
- We’ll see what NSX for vSphere delivers when it ships ;)
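To illustrate the “copy inner DSCP into the transport envelope” behavior mentioned in the Nexus 1000V and NSX bullets, here’s a minimal Scapy sketch of the encapsulation step. The source port is arbitrary, and copying the whole TOS byte (DSCP plus ECN) is my simplification, not documented vendor behavior; the 802.1p (CoS) copy is left out entirely.

```python
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN

def vxlan_encap(inner_frame, vtep_src, vtep_dst, vni):
    """Wrap an inner Ethernet frame in a VXLAN envelope and copy the
    inner TOS byte into the outer IP header, so the physical fabric can
    queue on the marking without parsing past the envelope."""
    inner_tos = inner_frame[IP].tos if IP in inner_frame else 0
    return (Ether() /
            IP(src=vtep_src, dst=vtep_dst, tos=inner_tos) /
            UDP(sport=49152, dport=4789) /        # 4789 = IANA VXLAN port
            VXLAN(vni=vni) /
            inner_frame)
```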
In my opinion, you get pretty much par for the course with the features of Nexus 1000V, VMware NSX or (hopefully) vSphere 5.5 vDS, and you get DSCP-based classification of overlay traffic with VMware NSX and Nexus 1000V.
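The per-VM maximum-bandwidth limit mentioned in the NSX bullet is conceptually just a token-bucket policer. Here’s a toy single-rate sketch; the rate and burst numbers are arbitrary examples, not product defaults.

```python
import time

class TokenBucketPolicer:
    """Toy single-rate policer: traffic above the configured rate (plus
    the accumulated burst allowance) gets dropped instead of queued."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0            # refill rate, bytes/second
        self.burst = burst_bytes              # bucket depth
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_len):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True                       # conforming: forward
        return False                          # exceeding: police (drop)

# Example: cap a VM at 100 Mbps with a 64 KB burst allowance.
policer = TokenBucketPolicer(rate_bps=100_000_000, burst_bytes=64_000)
```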
It is true that you won’t be able to do per-TCP-port classification and marking of overlay virtual traffic in your ToR switch any time soon (but I’m positive there are at least a few vendors working on it).
It’s also true that someone will have to configure classification and marking on the new network edge (in virtual switches) using a different toolset, but if that’s an insurmountable problem, you might want to start looking for a new job anyway.
Comments

The only issue with that solution (I used to work for the military, where we had a similar model of encrypted overlay) is tenants thinking it’s a security risk to have their markings visible to other people (I know, it’s a stretch, but I’ve had that argument before).
OK, so if I have a 1 Gbps WAN link, no matter what, it's only going to be a 1 Gbps WAN link. I don't care if you run compression, QoS, CoS, or even play with TDM frequencies, it's still only going to be a 1 Gbps WAN link. The only time QoS even comes into play is when there is congestion: trying to send more over the pipe, at the same time, than its capacity. QoS basically only says, "If the pipe is congested, here's what we send first, second, third, etc." Generally the time-sensitive and PIO (Packets In Order) data (like VoIP) are sent first, followed by whatever your company feels is important, and everything else is marked best effort. So if for 5 minutes your web browsing is slow and your Pandora stream stutters, oh well!
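In scheduler terms, the “what we send first, second, third” behavior boils down to strict-priority dequeuing across per-class queues; a toy sketch (the class names are arbitrary):

```python
from collections import deque

# Toy strict-priority scheduler: when the link is congested, always drain
# the highest-priority non-empty queue first; best effort waits its turn.
QUEUES = {
    "realtime":    deque(),   # time-sensitive traffic (VoIP and friends)
    "business":    deque(),   # whatever the company decided is important
    "best_effort": deque(),   # web browsing, Pandora, everything else
}
ORDER = ["realtime", "business", "best_effort"]

def dequeue():
    """Return the next packet to put on the wire, or None if idle."""
    for cls in ORDER:
        if QUEUES[cls]:
            return QUEUES[cls].popleft()
    return None
```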
The problem here is that QoS starts out with good intentions. [Remember how the path to Hell is paved!] But in the end it winds up causing more problems than it really solves. Over time, data priority becomes political within a company, and as soon as Marketing finds out that you've de-prioritized their web traffic and they might have to wait 15 minutes to download the latest YouTube video of some competitor's commercial, the IT Director finds himself with the CIO, sitting with the CEO, to have a little "Chit-Chat". This happened because the Director of Marketing is the CEO's cousin and she just got done with a presentation on how the ability to watch competitors' commercials in real time and without delay saves the company $1.2 million annually. Instantly you just became the bad guy.
Everybody wants their traffic to never be interrupted, but somebody has to stop and wait if QoS is going to do your network any good at all. Bottom line!
Also, I would suggest reading more on what QoS does. Either you super-simplified it for the layman or you may not know. I'm going to assume you simplified it; if so, please disregard this part of the comment.
On similar lines, should we honor the VM's TTL? If honored, applications with TTL=1 would break. If not, routing loops may arise due to synchronization issues between the controller and the vSwitch.
This is on the roadmap for the Brocade VDX platform, as well as all kinds of fancy stuff like VXLAN to SPAN, etc.
I am glad I am not the only one who believes that avoiding congestion by providing more bandwidth than the servers can consume can be the most cost-effective way of handling this.
Why do we need marking of overlay virtual traffic in the ToR switch? Maybe classification and load balancing of overlay virtual traffic in the ToR switch is necessary, but I don't see why we need to mark the inner packet; even if you want to remark, it should be done on the outer header.
Centec Networks is designing a high-density 10G data center chip. It DOES support classification and load balancing on the inner packet, but not marking of the inner packet.
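For readers wondering what “classification on the inner packet” involves, here’s a rough Scapy sketch of the parsing step a ToR would have to perform: dig through the VXLAN envelope and extract the inner 5-tuple. This is my illustration of the general idea, not a description of the Centec chip.

```python
from scapy.all import IP
from scapy.layers.vxlan import VXLAN

def inner_five_tuple(pkt):
    """For a VXLAN-encapsulated packet, return the inner 5-tuple
    (proto, src, dst, sport, dport) a ToR chip would have to parse past
    the envelope to classify on; return None for non-overlay traffic."""
    if VXLAN not in pkt:
        return None
    inner = pkt[VXLAN].payload                # the encapsulated frame
    if IP not in inner:
        return None
    ip = inner[IP]
    l4 = ip.payload                           # TCP, UDP, or something else
    return (ip.proto, ip.src, ip.dst,
            getattr(l4, "sport", None), getattr(l4, "dport", None))
```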