Do We Need QoS in the Data Center?
Whenever I get asked about QoS in the data center, my stock reply is “bandwidth is cheaper than QoS-induced complexity.” That’s definitely true in most cases, and the elephant-flow problems should ideally be solved higher up in the application stack, not with network-layer kludges – but are there situations where you actually need data center QoS?
Congestion detection with TCP ECN marking might be a good use case and can be done with minimal interface configuration – all it takes is a few configuration lines on Arista EOS or Cisco NX-OS. Data Center TCP (DCTCP) uses the ECN markings to detect congestion and reduce the transmission rate before packets get dropped (packet drops can cause significant performance degradation because they kick the NICs out of TCP offload mode).
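If you want to play with the host side of this, here’s a minimal Python sketch (assuming a Linux host with the tcp_dctcp congestion-control module available and Python 3.6 or later) that opts a single TCP socket into DCTCP – the switch-side ECN marking still has to be configured separately:

```python
import socket

# Minimal sketch (not from the original post): opt a single TCP socket into
# DCTCP on a Linux host. Assumes the tcp_dctcp module is loaded and that the
# switches along the path perform the ECN marking described above.
def dctcp_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_CONGESTION selects the per-socket congestion-control algorithm;
    # DCTCP backs off based on the fraction of ECN-marked ACKs instead of
    # waiting for packet drops.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"dctcp")
    return s

if __name__ == "__main__":
    s = dctcp_socket()
    # Read the algorithm name back to confirm the kernel accepted it.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))
```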
There might be cases where you need QoS to reduce latency, but I don’t think VoIP qualifies. At 10 Gbps, roughly a megabyte of packets (1.25 MB, to be precise) has to be sitting in the output queue to add a single millisecond of latency.
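A quick back-of-the-envelope check of that number (a hypothetical helper, not from the original post):

```python
# Serialization delay added by data sitting in an output queue.
def queue_delay_ms(queued_bytes: float, link_bps: float) -> float:
    return queued_bytes * 8 / link_bps * 1000   # bytes -> bits -> seconds -> ms

print(queue_delay_ms(1_000_000, 10e9))    # 1 MB at 10 Gbps    -> 0.8 ms
print(queue_delay_ms(1_250_000, 10e9))    # 1.25 MB at 10 Gbps -> 1.0 ms
```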
Finally, if you’re forced to implement queuing to reduce the impact of elephant flows (for example), insist on behaving like a service provider and keeping the network configuration as clean as possible – police ingress traffic (if needed) and queue packets based on DSCP or 802.1p markings. Application-aware processing (which hopefully results in DSCP marking) belongs in the hypervisors or end hosts, not in the ToR switches.
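To illustrate host-side marking, here’s a hedged Python sketch (the DSCP value and traffic class are illustrative assumptions, not recommendations from this post) that sets the DSCP bits on a socket so the ToR switch only has to queue on the markings it receives:

```python
import socket

# Illustrative only: mark application traffic with a DSCP value on the end
# host (or in the hypervisor's network stack) so the ToR switch can queue
# on existing markings instead of doing application-aware classification.
DSCP_AF41 = 34   # example per-hop behavior for latency-sensitive traffic

def marked_socket(dscp: int) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # DSCP occupies the upper six bits of the (former) IP TOS byte.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return s

if __name__ == "__main__":
    print(marked_socket(DSCP_AF41).getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
```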
Anything else? Share your thoughts in the comments.
Related blog posts:

http://blog.ipspace.net/2010/09/introduction-to-8021qaz-enhanced.html
http://blog.ipspace.net/2010/10/pfcets-and-storage-traffic-real-story.html
http://blog.ipspace.net/2013/07/iscsi-with-pfc.html
http://blog.ipspace.net/2010/11/does-fcoe-need-qcn-8021qau.html
Also, as mentioned here, microbursts can cause minor issues from time to time.
"if we are talking about DC with VDI" << hope you'll have 10GE links to the servers and 40GE or 100GE uplinks. How much traffic does a VDI session generate?