Introduction to 802.1Qbb (Priority-based Flow Control — PFC)
Yesterday I wrote that you don’t need DCB technologies to implement FCoE in your network. The FC-BB-5 standard is quite explicit (it also says that 802.1Qbb is the other option):
Lossless Ethernet may be implemented through the use of some Ethernet extensions. A possible Ethernet extension to implement Lossless Ethernet is the PAUSE mechanism defined in IEEE 802.3-2008.
The PAUSE mechanism (802.3x) gives you lossless behavior, but results in undesired side effects when you run LAN and SAN traffic across a converged Ethernet infrastructure.
Traffic blocking with the PAUSE mechanism
The PAUSE mechanism is part of the Ethernet (802.3) standard and allows a receiver on a point-to-point Ethernet link to stop the adjacent sender, thereby preventing buffer overflow and packet loss.
Imagine a simple FCoE network with a server, a storage array and a switch, with the server sending large amounts of data to the storage array.
When the server overloads the storage array with the data it’s sending, the storage array sends a PAUSE frame back to the switch.
The switch stops sending data to the storage array after receiving the PAUSE frame, and the data sent by the server starts to accumulate in the switch’s internal buffers until the switch has to tell the server to pause.
At that moment, the server’s Ethernet interface is effectively blocked, which is not a problem if you have a dedicated FCoE infrastructure. The same result is unacceptable in a converged infrastructure, where FCoE and LAN traffic share the same links.
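The blocking behavior described above can be illustrated with a toy model. This is only a sketch: the `Port` class, its one-frame-per-queue scheduler and the frame labels are hypothetical simplifications, not how any real NIC or switch is implemented.

```python
from collections import deque


class Port:
    """Toy egress port (hypothetical): one FIFO queue per 802.1p priority."""

    def __init__(self):
        self.queues = {p: deque() for p in range(8)}
        # Set while an 802.3x PAUSE received from the neighbor is in effect.
        self.link_paused = False

    def enqueue(self, priority, frame):
        self.queues[priority].append(frame)

    def transmit(self):
        """Send at most one frame from each non-empty queue."""
        if self.link_paused:
            # 802.3x PAUSE is per link: every priority stops, LAN included.
            return []
        sent = []
        for p in range(8):
            if self.queues[p]:
                sent.append(self.queues[p].popleft())
        return sent


server = Port()
server.enqueue(3, "FCoE write")    # storage traffic
server.enqueue(0, "HTTP request")  # ordinary LAN traffic

server.link_paused = True          # the switch paused the server
blocked = server.transmit()        # nothing leaves, not even LAN traffic

server.link_paused = False         # the pause expired
resumed = server.transmit()        # both frames are sent
```

The LAN frame is stuck behind the pause even though only the storage traffic caused the congestion, which is the side effect that matters on a converged link.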
Traffic blocking with Priority Flow Control (802.1Qbb)
802.1Qbb is a simple extension of the 802.3x mechanism: the PAUSE frame contains an 8-bit bit mask of 802.1p priorities (specifying which traffic classes should be paused) and a timer for each priority specifying how long the traffic in that priority class should be paused. The per-priority PAUSE mechanism allows the storage array to tell the switch it should stop sending just the FCoE traffic (assuming FCoE traffic is marked with priority value=3).
Likewise, the switch can tell the server to stop sending FCoE traffic and the LAN traffic is not impacted.
It’s also possible (at least in theory) to combine 802.3x and 802.1Qbb. For example, the storage array could use the 802.3x PAUSE mechanism to slow down the switch, whereas the switch (after noticing its priority-3 queues are filling up) could use 802.1Qbb PAUSE frame to tell the server to stop sending FCoE traffic.
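To make the per-priority PAUSE frame more concrete, here is a minimal sketch of how one could be assembled. It assumes the layout as I understand it from the published standard: the MAC Control multicast destination address, EtherType 0x8808, opcode 0x0101, a 16-bit field whose low octet is the 8-bit priority mask, and eight 16-bit timers. The function name and the all-zero source MAC address are made up for illustration.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101                          # 802.3x PAUSE uses 0x0001


def build_pfc_frame(src_mac, pause_quanta):
    """Build a PFC frame (without FCS).

    pause_quanta maps an 802.1p priority (0-7) to a pause time in quanta
    (0-65535). Priorities not listed are left unaffected.
    """
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority   # assumed: bit n selects priority n
        timers[priority] = quanta
    header = PFC_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    # Pad with zeros to the 60-byte minimum Ethernet frame size (FCS excluded).
    return (header + payload).ljust(60, b"\x00")


# Pause only priority 3 (FCoE in the example above) for the maximum time.
frame = build_pfc_frame(bytes(6), {3: 0xFFFF})
```

A pause time of zero for an enabled priority would tell the neighbor to resume sending that priority immediately.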
More details
The PFC mechanism can quickly result in head-of-line blocking and extended congestion and is thus applicable only to small bridged domains. It should be combined with congestion notification/avoidance mechanisms (for example, 802.1Qau) in larger domains.
PFC was designed for use on point-to-point links (it does not work in PON environments) and cannot be used together with 802.3x on the same link (two competing PAUSE mechanisms on the same link make no sense). It needs the DCBX standard to negotiate the parameters between adjacent nodes, including the number of traffic classes that can support PFC and the priorities for which PFC should be enabled. A standard-compliant implementation of 802.1Qbb thus requires support for DCBX as well.
The timings are quite strict (sender should stop sending in ~ 600 nanoseconds), making a hardware implementation the only viable option.
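Some back-of-the-envelope arithmetic shows why the timing matters. The sketch below assumes a 10 Gbps link and relies on the pause time being expressed in quanta of 512 bit times; the function names are made up for illustration.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times


def quantum_ns(link_bps):
    """Duration of one pause quantum in nanoseconds."""
    return PAUSE_QUANTUM_BITS * 1e9 / link_bps


def pause_duration_us(quanta, link_bps):
    """Duration of a pause of the given number of quanta, in microseconds."""
    return quanta * quantum_ns(link_bps) / 1000


def bytes_in_flight(delay_ns, link_bps):
    """Bytes a sender can still put on the wire during delay_ns."""
    return delay_ns * link_bps / 8e9


TEN_GBPS = 10_000_000_000

one_quantum = quantum_ns(TEN_GBPS)                # about 51.2 ns
longest_pause = pause_duration_us(0xFFFF, TEN_GBPS)  # about 3355 microseconds
still_sent = bytes_in_flight(600, TEN_GBPS)       # about 750 bytes
```

At this speed a sender that takes roughly 600 nanoseconds to react still transmits on the order of 750 bytes, and the receiver has to leave buffer headroom for that data (plus whatever is already on the cable) before it sends the PAUSE frame.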
Pre-standard implementations (speculative)
As the 802.1Qbb addendum hasn’t been ratified yet, all current PFC implementations are by definition pre-standard. However, the format of the PAUSE message hasn’t changed from the very early drafts, indicating that the existing hardware implementations will probably need just a software upgrade to support potential late changes to the DCBX protocol.
Need more?
You’ll get an overview of DCB, FCoE and numerous other Data Center technologies in my Data Center 3.0 for Networking Engineers webinar (buy a recording or yearly subscription).
The 802.1Qbb page has links to numerous presentations.
The Priority Flow Control: Build Reliable Layer 2 Infrastructure white paper from Cisco has a great in-depth description of 802.1Qbb and planning recommendations for Nexus 5000 and Cisco’s CNA.
However, it does require hardware support and will thus probably never be available on older switches (or will require new linecards on modular boxes).
At the moment, PFC is considered to be a Data Center functionality (although, as said above, it could be useful in other scenarios as well), so don't expect to see it elsewhere any time soon.
It would be great to see something like PFC for VLANs or L2/L3 addresses.
One more question: if the same server is using storage that is not directly connected, does the frame properly traverse multiple switches back to the sender? I see lots of documentation mentioning the medium type/length limitations, but not much about the case of multiple switches.
Lastly, thanks for the article! (I know I'm reading this a few years after the fact, but I'm glad I found it.)
PFC is a hop-by-hop mechanism and thus works equally well for directly-connected nodes as for multi-switch environments (note: some people say you might get into unpleasant HoL blocking scenarios without QCN in very large environments).
I believe I'm at a satisfactory place with PFC. I've been trying to track down everything out there on the topic between the time I posted the comment and now.
Found a blog reference stating that multiple storage vendors recommend using 802.3x mode desired on the switch side (Flow control RX only). Really couldn't find much even from the storage vendors themselves on the topic. But it sounds reasonable enough that I should only need to be able to process inbound PAUSE frames.
I'm investigating QCN now as my logical next step. Not sure if a leaf-spine layout where the switch 'hops' is an architecture that would benefit from QCN, but it almost seems impossible that it would even be implemented where there are more switch hops. I'm in a place where even my vendor documentation doesn't seem reliable regarding QCN (one white paper states support for QCN, another white paper for the same switch only mentions ECN). Hehe, it seems that for every answer I find I come up with a few more questions, especially around converged iSCSI implementations. There's lots on FCoE, but not many are willing to give authoritative recommendations on converged iSCSI.
I guess I really have to bite the bullet and start reading RFC3720 to my kids at bed time.
Sorry for the rant - thanks for the response.
-Gabe
If a feature is not described in the product configuration guide, it's not there ... and if the vendor doesn't publish product documentation online, run away as fast as you can.
BTW, RFC 3720 won't help you. It deals with the stuff above TCP and assumes the network gnomes do their magic.
Advice taken. Just to report back in: my in-depth exploration of DCB and iSCSI pretty much ended when a senior colleague (who happened to be deeply involved with DCB in the lab during the FCoE push 2 years ago) explained to me that while PFC sounds hands-down better than 802.3x flow control, the big benefit comes from deploying it alongside ETS ... and ETS along with PFC needs to be negotiated end to end [Host <-> Switch <-> Storage], and, well, PFC might as well be non-existent from the storage perspective. In any case, your articles have been beyond educational.
Thanks for all the great feedback. You can be sure I'm closely following your new content via RSS.
-Gabe.
"802.1Qbb is a simple extension of the 802.3x mechanism: the PAUSE frame contains a 8-bit bit mask of 802.1p priorities (specifying which traffic classes should be paused) and a timer for each priority specifying how long the traffic in that priority class should be paused. The per-priority PAUSE mechanism allows the storage array to tell the switch it should stop sending just the FCoE traffic (assuming FCoE traffic is marked with priority value=3)."
According to the Cisco and IEEE documents, there are only 8 possible CoS values used within PFC, so that would be 3 bits (802.1p).
Can you clarify, Ivan? :)
Thanks
Nicolas
Think of it this way: all combinations of 3 bits specify the names (numbers) of the 8 classes. The 8 bits specify, in order, which of the 8 classes the pause frame applies to, and that includes specifying any or all of them.
Can't find any info about that.
I have got the overall concept, but when I tried to implement it I had a few doubts.
1) What should the PFC structure contain? Do we need 8 parameters (Uint), one for each class?
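The mapping between a 3-bit priority value and its position in the 8-bit mask can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes that bit n of the mask (counting from the least significant bit) selects priority n.

```python
def priority_mask(*priorities):
    """Return the 8-bit mask with one bit set per listed 802.1p priority."""
    mask = 0
    for p in priorities:
        if not 0 <= p <= 7:
            raise ValueError("priority must be in 0..7")
        mask |= 1 << p  # assumed: bit n selects priority n
    return mask


def paused_priorities(mask):
    """Return the list of priorities selected by an 8-bit mask."""
    return [p for p in range(8) if mask & (1 << p)]


fcoe_only = priority_mask(3)           # 0b00001000
two_classes = priority_mask(3, 5)      # 0b00101000
everything = priority_mask(*range(8))  # 0b11111111

# The mask travels in a 16-bit field; under the assumption above,
# selecting only priority 5 gives this bit pattern:
class5_field = format(priority_mask(5), "016b")  # "0000000000100000"
```

So a priority value needs only 3 bits to be named, but the frame needs 8 bits to be able to select any combination of the 8 priorities at once.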
struct {
Uint priority;
Uint class1;
Uint class2;
.
.
Uint class8;
};
2) In that frame, the class priority field is 2 octets, and in these 2 octets the first 2 bits are for enabling the class (on/off) and the remaining ones are for the classes (0-7). If I want to set a priority for class 5, what will the binary form be? Can you show me those full 2 octets in binary form?
I'm positive you'll find all the relevant data structures there.