FCoE over TRILL ... this time from Juniper

Wednesday, June 15, 2011 06:27 CEST

A tweet from J Michel Metz alerted me to the “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed into two short sentences my readers are very familiar with: bridging does not scale, and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems) ... but the author couldn’t resist throwing the “FCoE over TRILL” bone into the mix.

I thought that bone had been scraped clean in the last few months, but it seems some people still think it’s worth chewing. Let’s try to make it really simple. There are two fundamental ways of implementing multi-hop FCoE:

Dense-mode FCoE, where every LAN switch is also a Fibre Channel Forwarder (FCF; equivalent to an IP router). FCoE frames are routed hop-by-hop and FCoE routing does not interact with STP or TRILL (because FCoE is not bridged). It’s also easy to implement SAN A/SAN B separation (two independent non-overlapping paths from each server to each storage device) as you control the hop-by-hop flow of the FC traffic.
Sparse-mode FCoE, where LAN switches bridge FCoE frames like any other L2 traffic and optionally provide FIP snooping. Because FCoE is bridged in this design, sparse-mode FCoE needs a stable L2 transport, which TRILL can provide more reliably than existing STP-based networks. Since you can’t control the hop-by-hop FCoE traffic flow (it’s bridged according to whatever TRILL or STP thinks is the best path), it’s way harder to implement the SAN separation.
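The SAN separation argument can be illustrated with a toy model. This is only a sketch under loud assumptions: the topology and node names (cna_a, leaf1, spine1, fcf_a and so on) are invented, and a plain breadth-first search stands in for whatever path TRILL or STP would actually select. It is not how any real switch computes paths.

```python
from collections import deque

def shortest_path(links, src, dst):
    """Breadth-first search; a crude stand-in for the path the L2 control plane picks."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def shared_transit(path_a, path_b):
    """Switches used by both paths, ignoring the two endpoints of each path."""
    return set(path_a[1:-1]) & set(path_b[1:-1])

# Hypothetical topology: one server with two CNAs, two leafs, two spines, two SAN edges.
LINKS = [
    ("cna_a", "leaf1"), ("cna_b", "leaf2"),
    ("leaf1", "spine1"), ("leaf2", "spine1"), ("leaf2", "spine2"),
    ("spine1", "fcf_a"), ("spine1", "fcf_b"), ("spine2", "fcf_b"),
]

# Sparse mode: both SAN paths are whatever the bridged network decides.
bridged_a = shortest_path(LINKS, "cna_a", "fcf_a")
bridged_b = shortest_path(LINKS, "cna_b", "fcf_b")

# Dense mode (assumption): the designer chooses the hop-by-hop path for SAN B.
pinned_b = ["cna_b", "leaf2", "spine2", "fcf_b"]

print("bridged overlap:", shared_transit(bridged_a, bridged_b))
print("pinned overlap:", shared_transit(bridged_a, pinned_b))
```

In this made-up topology the two bridged paths both cross spine1, while the explicitly chosen SAN B path shares no transit switch with SAN A. The point is only that separation is a property you can verify when you control the hops, and something you have to hope for when you don't.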

Got it? How hard was it? Do I really need to spell it out every few months?

Note to the equipment buyers: every time a vendor mentions “FCoE over TRILL” ask yourself “are they trying to divert my attention?” Most probably they don’t have full-blown FCF code in their switches.

Obviously I had to check the QFX3500 JunOS documentation after reading the aforementioned article. As I came to expect from Juniper, the documentation is precise and well-written. QFX3500 supports FC and FCoE, has up to 12 “universal” ports (two groups of 6 ports) that can be configured as 2/4/8 Gbps FC or 10GE ... but does not have a full-blown L3 FC stack (as I expected). The only marketing intrusion I found in the documentation was the FCoE transit switch, a term which does not appear anywhere in the FC-BB-5 standard; obviously someone thought it sounds better than FIP snooping bridge.

Here’s what QFX3500 can do on the FCoE/FC front:

It can be a FIP-snooping bridge, forwarding FCoE traffic between its 10GE ports and protecting FCFs and FCoE Enodes with dynamically built ACLs based on FIP requests/responses. Bonus points for the documentation being very clear on the traffic flow: FCoE traffic has to go through a full-blown FCF.
It can be an FC N_port proxy (using NPIV) for FCoE Enodes. Yet again, as it doesn’t have the FCF code, the traffic between FCoE nodes has to go to an FC-based FCF and back (also clearly documented).
It supports exactly the same DCB standards as Cisco: PFC, ETS and DCBX, but not QCN. PFC and ETS should be enough unless you build really large bridged networks (hint: don’t).
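The FIP snooping idea (ACLs built dynamically from FIP requests/responses) can be sketched roughly as follows. This is a simplified illustration, not Juniper's implementation: the class and method names are hypothetical, no real frame parsing is done, and FIP and non-FCoE frames are simply let through. The only protocol details assumed are the FCoE and FIP EtherTypes (0x8906 and 0x8914) and the default FC-MAP prefix 0E-FC-00 used for fabric-provided MAC addresses.

```python
FCOE_ETHERTYPE = 0x8906
FIP_ETHERTYPE = 0x8914

def fpma_mac(fc_id, fc_map=0x0EFC00):
    """Fabric-provided MAC address: 24-bit FC-MAP prefix followed by the 24-bit FC_ID."""
    value = (fc_map << 24) | fc_id
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

class FipSnoopingBridge:
    """Hypothetical model: FCoE frames pass only if a snooped FIP login allows them."""

    def __init__(self):
        # Each entry is (ingress port, VN_Port MAC, FCF MAC).
        self.allowed = set()

    def on_fip_login_accept(self, enode_port, vn_port_mac, fcf_mac):
        # The FCF accepted a login; open a pinhole for exactly this session.
        self.allowed.add((enode_port, vn_port_mac, fcf_mac))

    def on_fip_logout(self, enode_port, vn_port_mac, fcf_mac):
        # Session gone; close the pinhole again.
        self.allowed.discard((enode_port, vn_port_mac, fcf_mac))

    def permits(self, port, ethertype, src_mac, dst_mac):
        if ethertype != FCOE_ETHERTYPE:
            # Simplification: FIP and ordinary LAN traffic are not filtered here.
            return True
        return (port, src_mac, dst_mac) in self.allowed

bridge = FipSnoopingBridge()
fcf = "00:11:22:33:44:55"        # made-up FCF MAC
vn_port = fpma_mac(0x010203)     # made-up FC_ID
before = bridge.permits(1, FCOE_ETHERTYPE, vn_port, fcf)
bridge.on_fip_login_accept(1, vn_port, fcf)
after = bridge.permits(1, FCOE_ETHERTYPE, vn_port, fcf)
print(vn_port, before, after)
```

Note what the sketch does not do: it never forwards traffic between two ENodes directly. Every permitted FCoE frame is addressed to an FCF, which matches the documented traffic flow.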

In most cases, the functionality offered by QFX3500 is more than enough to implement access-layer convergence. If you use its NPIV functionality and send FCoE traffic from the servers straight into FC SAN, you can even maintain true SAN separation with servers attached to two QFX3500 switches. Transporting FCoE across the network toward an FCF (or another QFX3500 acting as NPIV proxy) that’s several hops away should still work, but it’s harder to maintain the strict SAN separation.

Am I allowed to conclude with a note to the vendors’ marketing departments? We might like you more if you tell us how your boxes actually work, what they can do and how we can build great solutions with them instead of constantly harassing us with the arguments why the things you haven’t implemented aren’t rosy, more so if you have a good product that can’t possibly benefit from such tactics ... not that I would ever expect them to listen.

More information

You’ll find more information about FCoE, FIP snooping, FC proxy solutions and Data Center Bridging (DCB) standards in my Data Center 3.0 for Networking Engineers webinar (buy a recording or a yearly subscription).

19 comments:

Anon 15 June 2011 08:18

Ridiculous. FCoE cannot be compared to TRILL at all and both address different needs.
FCoE addresses unified I/O, specifically the cost of N NICs plus N HBAs and the associated cabling (implied savings), while TRILL addresses bisectional bandwidth requirements at L2. Both have their respective use cases: FCoE is applicable if there is an existing investment in FC SAN, and TRILL is applicable in use cases where massive, rather dynamic bandwidth requirements are demanded.

If anything, the debate should be iSCSI vs FCoE, and ECMP vs TRILL.

J Metz 15 June 2011 10:15

Nice job on the article, Ivan. Extremely succinct and well-written. This is the first time I've heard of "sparse mode" and "dense mode" FCoE, but it makes sense.

I do agree with your "Note to marketing departments" and I have been trying very hard to write *exactly* "how our boxes actually work, what they can do and how we can build great solutions with them." Trying to write for a broad audience usually precludes me from delving into white paper-type writing style, but if there's something that I miss or fail to accomplish in that goal let me know what I miss.

Now, if you'll excuse me, I need to get back to my Hawaiian holiday. :)

Ivan Pepelnjak 15 June 2011 14:43

Thank you - nice to hear you like it ;)

The "Note to marketing departments" was definitely not aimed at you. I won't name any names, but I'm positive everyone can make a few educated guesses based on my past rants.

Enjoy the holidays!!!

Pablo Carlier 15 June 2011 17:57

Hi Ivan,

thanks for the article, extremely clear and well put. Should I take from your analysis that Juniper's QFX3500 is a sparse-mode, non FCF capable switch? That was my idea before taking a look at their doc, and apparently this confirms this view.

For Juniper's QF strategy this does not seem so problematic (I still don't see how they will separate SAN fabrics at the central nodes, but let's set that aside), but I find it challenging to imagine how Juniper could deploy a multi-hop FCoE network that is not QFabric-based...

Ivan Pepelnjak 15 June 2011 19:23

You're absolutely right.

Brad Hedlund 15 June 2011 20:20

Since Juniper invited the criticism..
The QFX3500 FCoE NPV mode is handy if you are attaching rack servers directly to it.
But what about blade server enclosures with integrated FCoE switches?
Put the blade switch in NPV mode too? Nope. You cannot cascade NPV mode.
Therefore the only option I can see would be to have both the blade switch and QFX3500 put in transparent L2 FIP Snooping mode, thus creating the limiting and undesired "Sparse mode FCoE" design. Ooops.
If the QFX3500 supported things like bridge port extension (VN-Tag, or Qbh), and FCoE FCF, then one could design a solid FCoE solution with blade servers, but unfortunately it does not and therefore lacks some much needed architectural flexibility with regards to access layer convergence.

Brad Hedlund 15 June 2011 23:17

Does QFabric add FCF capabilities? If not, how does an FCoE initiator talk to an FCoE target, each attached to QFabric?

Tjerk Bijlsma 15 June 2011 23:35

Hmm, interesting point Brad. I think that this "one big switch" QFabric without FCF functionality might not be very useful if you want to actually build a NETWORK where hosts actually want to talk to storage. Without FCF functionality inside QFabric, you need an actual network to make QFabric useful for storage traffic.

Guess QFabric really is QEthernet

Jon Hudson 16 June 2011 04:37

Anon, totally agree except for a few minor things

1.) TRILL first is meant to be a replacement for STP. Bandwidth benefits are a product of that desire/work.

2.) TRILL and ECMP are very complementary. Not adversarial.

Jon Hudson 16 June 2011 04:43

Well written as always!

What I don't understand is why;

Cisco can have TRILL based products and focus on why that is good.
Brocade can have TRILL based products and focus on why that is good.
(this list goes on and on)

Juniper seems to spend more time and energy pointing out how using TRILL will damn you to hell and cause the corruption of the morals of society than anything else.

I LOVE TRILL. I also see value in SPB. I don't bash SPB. I don't tell customers that if they run SPB or QFabric that they will turn into a Newt.

Why so angry Juniper?

Ivan Pepelnjak 16 June 2011 08:09

Because they won't have TRILL or SPB?

Personally, I think QFabric will incorporate some great ideas ... but the way they do their marketing is "somewhat extraordinary". Instead of telling us where they excel (and why and how), they bash everything else.

Ivan Pepelnjak 16 June 2011 08:14

I guess you'd put blade switch in FIP snooping mode, and QFX3500 in NPV mode. Remember another company having exactly the same design with N****4000 and N****5000 switches a while back? Ah, keep forgetting, that was before they had full-blown FCF stack everywhere :-E

Not saying dense-mode FCoE is not better, but even the above-mentioned design is not bad and you can easily maintain air-gap separation (if you truly want it - not sure why you'd be so rigorous apart from religious reasons) with proper VLAN assignments.

We also need to discuss once how dumber boxes (port extenders) add architectural flexibility :-P

Brad Hedlund 16 June 2011 16:38

With FIP Snooping at the blade switch I pity the poor guy who gets stuck with the job of manually configuring all the ENode MACs at the upstream NPV or FCF (yes, same case with N4K).

However, if the upstream NPV or FCF had port extender capabilities, the blade switch would be a simple port extender and no FIP Snooping would be required. Since every ENode in the blade chassis is now directly connected to the upstream NPV or FCF, there is no need to manually configure ENode MAC addresses.
The ability to extend the capabilities of the upstream device down to the servers provides the architectural flexibility, and simplicity. :-)

Ivan Pepelnjak 16 June 2011 18:58

I never quite understood why someone would need to enter the Enode MAC addresses on the NPV/FCF. I know it's the limitation of NX-OS implementation of FCoE, but I fail to see it as an architectural requirement. What am I missing?

Brad Hedlund 17 June 2011 06:37

It's a simple matter of provisioning and applying policy. If the server (ENode) does not have its own physical or virtual Ethernet interface on the FCF/NPV (as would be the case with a FIP Snooping blade switch), how do you distinguish and differentiate one ENode from another? You can't make it interface based so you need to move up the stack and provision configuration and policy based on the MAC address.

Think about it this way, with a FIP Snooping blade switch how would you provision ENode-A in VSAN1 and ENode-B in VSAN2? How would you do that without needing to know the MAC address of each ahead of time? Good luck.

On the other hand, with a port extender (FEX) as the blade switch, each ENode has its own virtual Ethernet interface on the upstream FCF/NPV, so you can for example provision VSAN based on interfaces, without ever needing to care one bit about MAC addresses.

michal 19 June 2011 09:45

hello Ivan ,
thanks for the post and the follow-up discussions. QFabric looks nice at different levels and the developers have interesting ideas. From a storage point of view it looks nice as long as you don't want to get rid of your native FC equipment from day one (or two), which most deployment scenarios probably will not, as the NPV-only storage won't work on its own. So it seems a full-blown FCoE-only storage architecture (without a 3rd-party native SAN) with QFabric is not possible today, but certainly will be in the future.
The article title mentioned TRILL, so from a protocol point of view: apparently the MPLS-like DCF does not support the desired TTL field and relies on the loop-free tree only.
In the end I think people will be especially focused on the 40GIG (Ethernet ??) uplink capabilities :) as all the emerging protocols are promised to be simply a hidden c-plane from a user point of view.

J Metz 21 June 2011 17:35

The additional issue to tag onto Brad's excellent explanation is the consequence of troubleshooting. Without that visibility you have a break in the ability to use standard fabric-based tools to troubleshoot problematic and troublesome hosts/CNAs and are required to use the tools available from the CNA vendor.

As you increase the number of ports (and the number of VMs behind those ports) this can become quite daunting. The N4k limits the number of server-facing ports to 16, but IIRC some of the new switches from Juniper and others are 48-port FIP snooping switches. With VMs on the other end of those ports it can be an awful lot of manual hunting for problematic hosts.

Ivan Pepelnjak 21 June 2011 20:49

Definitely a great point. Need to write a new post about this particular aspect. Not sure many people get it - we're so used to dealing with the crazy mix of routing and bridging that we keep forgetting there are still some truly routed networks (= SAN).

BTW, you might want to write "SAN fabric" not just "fabric". Fabric with or without Q can mean anything today, including the material from which my 20-year-old Cisco T-shirt is made :-P

The MAC address problem Brad mentioned is even more intriguing. I had to go back to FC-BB-5 and really study it to figure out what the issue is. Definitely yet another post (Brad, you don't need to write them, I will ;) )

J Metz 22 June 2011 22:00

Oooooh. Good point re: "Fabric." Quite frankly, IMHO, we've all lost the plot about the term - my own company included. =-O
