
iSCSI with PFC?

Nicolas Vermandé sent me a really interesting question: “I've been looking for answers to a simple question that even different people at Cisco don't seem to agree on: Is it a good idea to classify IP traffic (iSCSI or NFS over TCP) into a pause no-drop class? What is the impact of having both pauses and TCP sliding windows at the same time?”

Let’s rephrase the question using the terminology Fred Baker used in his Bufferbloat Masterclass: does it make sense to use lossless transport for elephant flows or is it better to drop packets and let TCP cope with packet loss?

It’s definitely not bad to randomly drop an occasional TCP packet of a mouse session – if you have thousands of TCP sessions on the same link and drop a single packet of one or two sessions to slow them down, the overall throughput won’t be affected too much ... and if you randomly hit different sessions at different times, you’re pretty close to effective management of a mice aggregate.

Elephants are different because they are rare and important (see also Storage Networking is Different and Does Dedicated iSCSI Infrastructure Make Sense?) – dropping a single packet of an elephant iSCSI session could affect thousands of end-user sessions (because the overall disk throughput would go down), more so if you’re using iSCSI to access VMware VMFS volumes (where a single iSCSI session carries the data of all VMs running on the vSphere host). Classifying iSCSI as a lossless traffic class thus makes a lot of sense.
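To get a feel for how much a seemingly negligible loss rate hurts a single elephant flow, plug some numbers into the well-known Mathis et al. approximation of steady-state TCP throughput (BW ≈ MSS/RTT × 1.22/√p). This is a back-of-envelope sketch, not a simulation; the MSS and RTT values below are illustrative data-center figures I picked, not numbers from the article:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate steady-state TCP throughput (Mathis et al. model):
    BW ~ (MSS / RTT) * (1.22 / sqrt(p)). Ignores timeouts and
    receive-window limits, so treat the result as an upper bound."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate))

# Illustrative values: 1460-byte MSS, 1 ms data-center RTT
for p in (1e-4, 1e-6):
    bw = mathis_throughput_bps(1460, 0.001, p)
    print(f"loss {p:.0e}: ~{bw / 1e9:.2f} Gbps")
```

Even a 0.01% loss rate caps a single session well below 10 Gbps in this model, which is exactly why one dropped packet on a busy iSCSI session is so much more expensive than one dropped packet spread across thousands of mice.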

Going back to Fred Baker’s Bufferbloat presentation: he claims delay-based TCP congestion control (which is effectively what you get with PFC, where congestion shows up as increased delay instead of packet drops) is the most stable approach (assuming the host TCP stack has a reasonable implementation that responds to delays).
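The delay-based idea is easiest to see in TCP Vegas, the classic delay-based algorithm: instead of waiting for a loss, the sender compares expected and actual throughput and backs off as soon as a queue starts building. A minimal sketch of the Vegas-style window adjustment (the alpha/beta thresholds are the usual textbook values, chosen here for illustration):

```python
def vegas_adjust(cwnd, base_rtt, current_rtt, alpha=2, beta=4):
    """TCP Vegas-style congestion window adjustment: estimate how many
    segments are sitting in network queues from the difference between
    expected throughput (at base RTT) and actual throughput, and react
    to that queueing delay instead of to packet loss."""
    expected = cwnd / base_rtt          # throughput if no queueing
    actual = cwnd / current_rtt         # measured throughput
    queued = (expected - actual) * base_rtt  # segments queued in network
    if queued < alpha:
        return cwnd + 1   # little queueing: probe for more bandwidth
    if queued > beta:
        return cwnd - 1   # queue building: back off before loss occurs
    return cwnd           # in the sweet spot: hold steady
```

With a lossless (PFC) transport, rising RTT is the only congestion signal the sender gets, so a stack with this kind of behavior keeps throughput stable without ever needing a drop.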

Comparing the results of QoS policing (= dropping) versus shaping (= delaying) on a small number of TCP sessions supports the same conclusion. Here are the graphs Jeremy Stretch made for the Policing Versus Shaping article:

Throughput of four TCP sessions (+aggregate) with policing (packet drops)

Throughput of four TCP sessions (+aggregate) with shaping (packet delays)
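The mechanical difference between the two graphs is simple: a policer drops every packet that exceeds the token bucket (TCP sees loss and collapses its window), while a shaper delays packets in a queue (TCP sees increased RTT). A toy sketch of both behaviors, with arrival times, rates and bucket sizes picked purely for illustration:

```python
class TokenBucket:
    """Token bucket refilled at `rate` bytes/second, capacity `burst` bytes."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def conforms(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

def police(packets, bucket):
    """Policer: non-conforming packets are simply dropped."""
    return [(t, size) for t, size in packets if bucket.conforms(size, t)]

def shape(packets, rate):
    """Shaper: packets are queued and delayed so the departure
    rate never exceeds `rate` bytes/second."""
    out, next_free = [], 0.0
    for t, size in packets:
        depart = max(t, next_free)       # wait until the link is free
        out.append((depart, size))
        next_free = depart + size / rate
    return out

# Ten 1500-byte packets arriving every 100 us (~120 Mbps burst)
# against an 8 Mbps (1 MB/s) contract
packets = [(i * 0.0001, 1500) for i in range(10)]
policed = police(packets, TokenBucket(rate=1_000_000, burst=3000))
shaped = shape(packets, 1_000_000)
print(len(policed), "packets survive policing;",
      len(shaped), "packets survive shaping (delayed)")
```

The shaper delivers every packet, just later; the policer throws most of the burst away, and each of those drops is a congestion signal that knocks the TCP window back down, producing the sawtooth in the first graph.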

More information

Storage networks, iSCSI, FCoE and Data Center Bridging (including PFC) are described in the Data Center 3.0 for Networking Engineers webinar.


  1. What you write here conflates a few things together that perhaps shouldn't.

    1) if your network is dedicated to iSCSI, then by all means use delay based congestion mechanisms and lossless notifications like ECN.

    2) if it isn't, you are in a world of hurt with competing with other forms of TCP.

    3) I'd be very interested in a similar plot of an iSCSI network with fq_codel enabled, as well as a measurement of latency on your disk storage nodes under these loads with both the lossless and the fq_codel based approach. While throughput is important, high latencies cause other problems.

    1. #1 - Agreed

      #2 - PFC + ETS giving iSCSI traffic a guaranteed share of the bandwidth (e.g., 3 Gbps on a 10 Gbps link) is pretty much equivalent to #1.

      #3 - Me too ;)

  2. I am curious whether there really is a simple answer to this seemingly simple question.

    I had some discussions at Cisco Live London around this topic after attending the "Mastering Data Center QoS" session.

    Another factor that came into play was whether the switched network was just a single hop or multi-hop between the servers and the storage.

    Enabling PFC on a multi-hop network could introduce head-of-line blocking on the inter-switch links. To stick with your analogy: If a single elephant is slowed down, then other elephants might be prevented from crossing the bridge between the switches.

    In the end the recommendation that I got was not to enable PFC on multi-hop iSCSI networks, because the harm done by head-of-line-blocking could outweigh the benefit of using PFC. One of the participants in the discussion actually claimed he had seen a significant performance improvement after disabling flow-control on their iSCSI network. Unfortunately I haven't seen hard evidence of this theory.

    I am interested to hear your opinion on this. Could the answer to this question be different for single-hop vs multi-hop DCB-enabled networks?

    1. Yeah, HoL blocking is the biggest elephant in the room, and the risk of encountering it definitely increases with the network size and port speed mismatch.

      So far, all I've heard are theories. Hard facts would be nice, but I haven't found them yet.

    The performance when packets drop depends very much on the TCP/IP stacks involved.

    Tweaks in recovery algorithms like Proportional Rate Reduction for TCP (RFC 6937) could have a big impact.

    Also, the test with four equal flows might not be as realistic as you'd want it to be.

