… updated on Thursday, November 19, 2020 12:17 UTC
iSCSI with PFC?
Nicolas Vermandé sent me a really interesting question: “I've been looking for answers to a simple question that even different people at Cisco don't seem to agree on: Is it a good idea to class IP traffic (iSCSI or NFS over TCP) in pause no-drop class? What is the impact of having both pauses and TCP sliding windows at the same time?”
Let’s rephrase the question using the terminology Fred Baker used in his Bufferbloat Masterclass: does it make sense to use lossless transport for elephant flows or is it better to drop packets and let TCP cope with packet loss?
It’s definitely not bad to randomly drop an occasional TCP packet of a mouse session – if you have thousands of TCP sessions on the same link and drop a single packet of one or two sessions to slow them down, the overall throughput won’t be affected too much ... and if you randomly hit different sessions at different times, you’re pretty close to effective management of a mice aggregate.
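The mice arithmetic can be sketched with a back-of-the-envelope model (my simplification, not anything from the sources above): assume idealized AIMD flows with equal fair shares, where a single drop briefly halves one flow's rate.

```python
# Toy estimate of how little a single drop hurts an aggregate of TCP mice.
# Assumes idealized AIMD flows with equal shares -- a gross simplification,
# but enough to show why random drops barely dent a large aggregate.

def aggregate_dip(num_flows: int, flows_hit: int = 1) -> float:
    """Worst-case fractional throughput dip right after the drops."""
    # Each hit flow briefly runs at half its fair share (multiplicative
    # decrease); the remaining flows are untouched.
    return (0.5 * flows_hit) / num_flows

# With 1000 sessions on the link, dropping one packet of two sessions
# costs roughly 0.1% of aggregate throughput at the worst moment.
print(f"{aggregate_dip(1000, flows_hit=2):.4f}")  # → 0.0010
```

The same formula explains the elephant problem: with only a handful of flows (say, one iSCSI session per vSphere host), `num_flows` is tiny and a single drop takes a visible bite out of the aggregate.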
Elephants are different because they are rare and important (see also Storage Networking is Different and Does Dedicated iSCSI Infrastructure Make Sense?) – dropping a single packet of an elephant iSCSI session could affect thousands of end-user sessions (because the overall disk throughput would go down), more so if you’re using iSCSI to access VMware VMFS volumes (where a single iSCSI session carries the data of all VMs running on the vSphere host).
Classifying iSCSI as a lossless traffic class thus seems to make a lot of sense (but see below), and comparing the results of QoS policing (= dropping) versus shaping (= delaying) on a small number of TCP sessions supports the same conclusion.
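The policing-versus-shaping difference is easy to see in a toy token-bucket model (entirely made-up numbers, not any vendor's implementation): both enforce the same rate, but a policer turns excess traffic into loss while a shaper turns it into extra delay.

```python
# Toy token bucket: police() drops the excess (TCP sees packet loss),
# shape() queues it (TCP sees a longer RTT). Input is a sorted list of
# (timestamp, size) pairs; rate is in size-units per time-unit.

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst, self.tokens, self.t = rate, burst, burst, 0.0

    def refill(self, now: float):
        self.tokens = min(self.burst, self.tokens + (now - self.t) * self.rate)
        self.t = now

def police(packets, rate, burst):
    """Drop packets that exceed the contract (what TCP sees as loss)."""
    tb, passed = TokenBucket(rate, burst), []
    for now, size in packets:
        tb.refill(now)
        if tb.tokens >= size:
            tb.tokens -= size
            passed.append((now, size))
    return passed                     # excess traffic is simply gone

def shape(packets, rate, burst):
    """Delay packets until tokens are available (what TCP sees as RTT)."""
    tb, out = TokenBucket(rate, burst), []
    for now, size in packets:
        now = max(now, tb.t)          # can't send before the previous packet
        tb.refill(now)
        if tb.tokens < size:          # wait for enough tokens, send late
            now += (size - tb.tokens) / tb.rate
            tb.refill(now)
        tb.tokens -= size
        out.append((now, size))
    return out                        # nothing dropped, some packets delayed
```

Feed the same back-to-back burst through both and the policer passes only what fits the bucket, while the shaper delivers everything with the surplus pushed later in time.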
After Almost a Decade
I never found an authoritative answer to this question, and if you ask a dozen experts you’ll get two dozen answers, in particular if you add the does that mean we need large buffers in data center switches question to the mix. In any case, you might find these resources helpful:
- J Metz published fantastic napkin dialogues diving deep into iSCSI details;
- I discussed the “should we delay or drop” question with Thomas Graf and Juho Snellman, and the conclusions were that drops are not bad assuming you have a decent TCP stack implementation;
- Google introduced a totally new TCP congestion control (BBR) in 2016;
- JR Rivers addressed the question of data center buffering in an excellent (and free) Networks, Buffers and Drops webinar.
More information
Storage networks, iSCSI, FCoE and Data Center Bridging (including PFC) are described in the Data Center 3.0 for Networking Engineers webinar.
1) if your network is dedicated to iSCSI, then by all means use delay-based congestion control mechanisms and lossless congestion notification like ECN.
2) if it isn't, you are in a world of hurt competing with other flavors of TCP.
3) I'd be very interested in a similar plot of an iSCSI network with fq_codel enabled, as well as a measurement of latency on your disk storage nodes under these loads with both the lossless and the fq_codel-based approach. While throughput is important, high latencies cause other problems.
#2 - PFC + ETS giving iSCSI traffic a guaranteed share of the bandwidth (for example, 3 Gbps on a 10 Gbps link) is pretty much equivalent to #1.
#3 - Me too ;)
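The ETS arithmetic behind that reply can be sketched in a few lines (class names and weights are illustrative, not any real configuration): ETS is work-conserving, so a class's weight is a guarantee, not a cap — idle classes' shares get redistributed among the active ones in proportion to their weights.

```python
# Sketch of ETS bandwidth-share arithmetic (illustrative weights only).
# Each active class gets link bandwidth proportional to its weight;
# weights of idle classes are redistributed among the active ones.

def ets_shares(link_gbps, weights, active):
    """Per-class bandwidth (Gbps) for the currently active classes."""
    total = sum(w for cls, w in weights.items() if cls in active)
    return {cls: link_gbps * w / total
            for cls, w in weights.items() if cls in active}

weights = {"iscsi": 30, "vmotion": 20, "other": 50}   # percent-style weights

# All classes busy: iSCSI gets its guaranteed 3 Gbps of the 10 Gbps link.
print(ets_shares(10, weights, {"iscsi", "vmotion", "other"}))
# Only iSCSI busy: it can burst to the full 10 Gbps (guarantee, not cap).
print(ets_shares(10, weights, {"iscsi"}))
```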
I have had some discussions at Cisco Live London around this topic after attending the "Mastering Data Center QoS" session (https://www.ciscolive365.com/connect/sessionDetail.ww?SESSION_ID=6028).
Another factor that came into play was whether the switched network was just a single hop or multi-hop between the servers and the storage.
Enabling PFC on a multi-hop network could introduce head-of-line blocking on the inter-switch links. To stick with your analogy: If a single elephant is slowed down, then other elephants might be prevented from crossing the bridge between the switches.
In the end the recommendation that I got was not to enable PFC on multi-hop iSCSI networks, because the harm done by head-of-line blocking could outweigh the benefit of using PFC. One of the participants in the discussion actually claimed he had seen a significant performance improvement after disabling flow control on their iSCSI network. Unfortunately I haven't seen hard evidence of this theory.
I am interested to hear your opinion on this. Could the answer to this question be different for single-hop vs multi-hop DCB-enabled networks?
So far, all I've heard are theories. Hard facts would be nice, but I haven't found them yet.
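The head-of-line-blocking concern raised in the comment above can be captured in a deliberately oversimplified toy model (no real switch behaves this simply): a PFC pause stops the entire no-drop class on the inter-switch link, so an elephant headed for an idle port stalls along with the one headed for the congested port.

```python
# Toy model of PFC head-of-line blocking on an inter-switch link.
# A pause frame halts a whole priority class, not individual flows,
# so flows bound for uncongested ports become innocent victims.

def deliverable(flows, paused_classes):
    """Return the flows that can cross the ISL this instant."""
    return [f for f in flows if f["cls"] not in paused_classes]

flows = [
    {"name": "iscsi-to-busy-port", "cls": "no-drop"},
    {"name": "iscsi-to-idle-port", "cls": "no-drop"},     # innocent victim
    {"name": "web-traffic",        "cls": "best-effort"},
]

# One congested egress port triggers a pause for the whole no-drop class:
still_flowing = deliverable(flows, paused_classes={"no-drop"})
print([f["name"] for f in still_flowing])  # → ['web-traffic']
```

In other words: the very mechanism that protects the elephant destined for the congested port also freezes unrelated elephants, which is why the single-hop vs. multi-hop distinction matters.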
Tweaks in recovery algorithms like Proportional Rate Reduction for TCP (RFC 6937) could have a big impact.
Also, the test with four equal flows might not be as realistic as you'd want it to be.