FCoE over TRILL ... this time from Juniper

A tweet from J Michel Metz alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed into two short sentences my readers are very familiar with: bridging does not scale, and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems)... but the author couldn’t resist throwing the “FCoE over TRILL” bone into the mix.

I thought that bone had been scraped clean in the last few months, but it seems some people still think it’s worth chewing. Let’s try to make it really simple. There are two fundamental ways of implementing multi-hop FCoE:

  • Dense-mode FCoE, where every LAN switch is also a Fibre Channel Forwarder (FCF; equivalent to an IP router). FCoE frames are routed hop-by-hop and FCoE routing does not interact with STP or TRILL (because FCoE is not bridged). It’s also easy to implement SAN A/SAN B separation (two independent non-overlapping paths from each server to each storage device) as you control the hop-by-hop flow of the FC traffic.
  • Sparse-mode FCoE, where LAN switches bridge FCoE frames like any other L2 traffic and optionally provide FIP snooping. Because FCoE is bridged in this design, sparse-mode FCoE needs a stable L2 transport, which TRILL can provide more reliably than existing STP-based networks. Since you can’t control the hop-by-hop FCoE traffic flow (it’s bridged according to whatever TRILL or STP thinks is the best path), it’s way harder to implement the SAN separation.
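
To make the SAN A/SAN B argument a bit more tangible, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy topology, the node names, and the use of a plain breadth-first search as a stand-in for "whatever TRILL or STP thinks is the best path". It is not how any real switch computes paths; it only shows that hand-pinned hop-by-hop paths can be kept disjoint, while paths chosen by the L2 control plane may end up sharing a switch.

```python
from collections import deque

# Hypothetical toy topology (not from any vendor documentation): one server,
# two access switches, two core switches, one storage port per SAN.
LINKS = [
    ("server", "acc1"), ("server", "acc2"),
    ("acc1", "core1"), ("acc1", "core2"),
    ("acc2", "core1"), ("acc2", "core2"),
    ("core1", "storage_a"), ("core2", "storage_b"),
]

def adjacency(links):
    """Build an undirected adjacency list, preserving link order."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def bridged_path(links, src, dst):
    """Shortest path found by BFS: a stand-in for the path the L2
    control plane picks in a sparse-mode (bridged) design."""
    adj = adjacency(links)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def shared_switches(path_a, path_b):
    """Intermediate switches that carry both SAN A and SAN B traffic."""
    return set(path_a[1:-1]) & set(path_b[1:-1])

# Dense mode: the SAN admin pins each fabric to its own chain of FCFs.
DENSE_A = ["server", "acc1", "core1", "storage_a"]
DENSE_B = ["server", "acc2", "core2", "storage_b"]

# Sparse mode: the paths are whatever the bridging control plane selects.
SPARSE_A = bridged_path(LINKS, "server", "storage_a")
SPARSE_B = bridged_path(LINKS, "server", "storage_b")

print("dense-mode overlap: ", shared_switches(DENSE_A, DENSE_B))
print("sparse-mode overlap:", shared_switches(SPARSE_A, SPARSE_B))
```

In this particular toy network the two bridged paths both cross the first access switch, while the pinned paths share nothing. A real sparse-mode design can reduce that risk (for example with careful VLAN assignment), but the path selection is still not under the SAN administrator's direct control.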

Got it? How hard was it? Do I really need to spell it out every few months?

Note to the equipment buyers: every time a vendor mentions “FCoE over TRILL” ask yourself “are they trying to divert my attention?” Most probably they don’t have full-blown FCF code in their switches.

Obviously I had to check QFX3500 JunOS documentation after reading the afore-mentioned article. As I've come to expect from Juniper, the documentation is precise and well-written. QFX3500 supports FC and FCoE, has up to 12 “universal” ports (two groups of 6 ports) that can be configured as 2/4/8 Gbps FC or 10GE ... but does not have a full-blown L3 FC stack (as I expected). The only marketing intrusion I found in the documentation was the FCoE transit switch, a term which does not appear anywhere in the FC-BB-5 standard – obviously someone thought it sounds better than FIP snooping bridge.

Here’s what QFX3500 can do on the FCoE/FC front:

  • It can be a FIP-snooping bridge, forwarding FCoE traffic between its 10GE ports and protecting FCFs and FCoE ENodes with dynamically built ACLs based on FIP requests/responses. Bonus points for the documentation being very clear on the traffic flow: FCoE traffic has to go through a full-blown FCF.
  • It can be an FC N_port proxy (using NPIV) for FCoE ENodes. Yet again, as it doesn’t have the FCF code, the traffic between FCoE nodes has to go to an FC-based FCF and back (also clearly documented).
  • It supports exactly the same DCB standards as Cisco: PFC, ETS and DCBX, but not QCN. PFC and ETS should be enough unless you build really large bridged networks (hint: don’t).
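
The "dynamically built ACLs" idea behind FIP snooping can be sketched in a few lines of Python. This is a deliberately simplified toy model, not Juniper's implementation: the class name, the method names and the MAC addresses are all hypothetical, and a real FIP snooping bridge looks at considerably more (FIP frame types, VLANs, keepalives). The sketch only shows the principle: the bridge admits FCoE frames from an ENode toward an FCF after it has seen a successful fabric login on that port, and everything else (including direct ENode-to-ENode traffic) is dropped, which is why the traffic has to go through a full-blown FCF.

```python
class FipSnoopingBridge:
    """Toy model of a FIP snooping bridge (hypothetical API)."""

    def __init__(self):
        # Each ACL entry is (ingress port, ENode-side MAC, FCF MAC).
        self.acl = set()

    def snoop_login_accept(self, port, enode_mac, fcf_mac):
        """Called when the bridge observes an accepted fabric login."""
        self.acl.add((port, enode_mac, fcf_mac))

    def snoop_logout(self, port, enode_mac, fcf_mac):
        """Called when the bridge observes a logout; removes the entry."""
        self.acl.discard((port, enode_mac, fcf_mac))

    def permits_fcoe(self, port, src_mac, dst_mac):
        """An FCoE frame is forwarded only if it matches a learned entry."""
        return (port, src_mac, dst_mac) in self.acl


# Illustrative addresses only (assumptions, not taken from any documentation).
FCF = "00:05:73:aa:bb:01"
ENODE_1 = "0e:fc:00:01:00:01"
ENODE_2 = "0e:fc:00:01:00:02"

bridge = FipSnoopingBridge()
bridge.snoop_login_accept(port=1, enode_mac=ENODE_1, fcf_mac=FCF)

print(bridge.permits_fcoe(1, ENODE_1, FCF))      # True: logged-in ENode to FCF
print(bridge.permits_fcoe(2, ENODE_1, FCF))      # False: same MAC on another port
print(bridge.permits_fcoe(1, ENODE_1, ENODE_2))  # False: ENode-to-ENode is not allowed
```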

In most cases, the functionality offered by QFX3500 is more than enough to implement access-layer convergence. If you use its NPIV functionality and send FCoE traffic from the servers straight into FC SAN, you can even maintain true SAN separation with servers attached to two QFX3500 switches. Transporting FCoE across the network toward an FCF (or another QFX3500 acting as NPIV proxy) that’s several hops away should still work, but it’s harder to maintain the strict SAN separation.

Am I allowed to conclude with a note to the vendors’ marketing departments? We might like you more if you tell us how your boxes actually work, what they can do and how we can build great solutions with them instead of constantly harassing us with the arguments why the things you haven’t implemented aren’t rosy, more so if you have a good product that can’t possibly benefit from such tactics ... not that I would ever expect them to listen.

More information

You’ll find more information about FCoE, FIP snooping, FC proxy solutions and Data Center Bridging (DCB) standards in my Data Center 3.0 for Networking Engineers webinar (buy a recording or a yearly subscription).

19 comments:

  1. Ridiculous. FCoE cannot be compared to TRILL at all and both address different needs.
    FCoE addresses unified I/O, specifically the costs of Nx NICs plus Nx HBAs and the associated cabling (implied savings), while TRILL addresses bisection bandwidth requirements at L2. Both have their respective use cases. FCoE is applicable if there is an existing investment in FC SAN, and TRILL is applicable in use cases where massive, rather dynamic bandwidth is required.

    If anything, the debate should be iSCSI vs FCoE, and ECMP vs TRILL.

  2. Nice job on the article, Ivan. Extremely succinct and well-written. This is the first time I've heard of "sparse mode" and "dense mode" FCoE, but it makes sense.

    I do agree with your "Note to marketing departments" and I have been trying very hard to write *exactly* "how our boxes actually work, what they can do and how we can build great solutions with them." Trying to write for a broad audience usually precludes me from delving into white paper-type writing style, but if there's something that I miss or fail to accomplish in that goal let me know what I miss.

    Now, if you'll excuse me, I need to get back to my Hawaiian holiday. :)

  3. Ivan Pepelnjak, 15 June, 2011 14:43

    Thank you - nice to hear you like it ;)

    The "Note to marketing departments" was definitely not aimed at you. I won't name any names, but I'm positive everyone can make a few educated guesses based on my past rants.

    Enjoy the holidays!!!

  4. Pablo Carlier, 15 June, 2011 17:57

    Hi Ivan,

    thanks for the article, extremely clear and well put. Should I take from your analysis that Juniper's QFX3500 is a sparse-mode, non FCF capable switch? That was my idea before taking a look at their doc, and apparently this confirms this view.

    For Juniper's QF strategy this does not seem so problematic (I still don't see how they will separate SAN Fabrics at the central nodes, but let's obviate that), but I find it challenging to imagine how could Juniper deploy a multi-hop FCoE network not QFabric based...

  5. Ivan Pepelnjak, 15 June, 2011 19:23

    You're absolutely right.

  6. Since Juniper invited the criticism..
    The QFX3500 FCoE NPV mode is handy if you are attaching rack servers directly to it.
    But what about blade server enclosures with integrated FCoE switches?
    Put the blade switch in NPV mode too? Nope. You cannot cascade NPV mode.
    Therefore the only option I can see would be to have both the blade switch and QFX3500 put in transparent L2 FIP Snooping mode, thus creating the limiting and undesired "Sparse mode FCoE" design. Ooops.
    If the QFX3500 supported things like bridge port extension (VN-Tag, or Qbh), and FCoE FCF, then one could design a solid FCoE solution with blade servers, but unfortunately it does not and therefore lacks some much needed architectural flexibility with regards to access layer convergence.

  7. Does QFabric add FCF capabilities? If not, how does an FCoE initiator talk to an FCoE target, each attached to QFabric?

  8. Tjerk Bijlsma, 15 June, 2011 23:35

    Hmm, interesting point Brad. I think that this "one big switch" QFabric without FCF functionality might not be very useful if you want to actually build a NETWORK where hosts actually want to talk to storage. Without FCF functionality inside QFabric, you need an actual network to make QFabric useful for storage traffic.

    Guess QFabric really is QEthernet

  9. Anon, totally agree except for a few minor things

    1.) TRILL first is meant to be a replacement for STP. Bandwidth benefits are a product of that desire/work.

    2.) TRILL and ECMP are very complementary. Not adversarial.

  10. Well written as always!

    What I don't understand is why;

    Cisco can have TRILL based products and focus on why that is good.
    Brocade can have TRILL based products and focus on why that is good.
    (this list goes on and on)

    Juniper seems to spend more time and energy pointing out how using TRILL will damn you to hell and cause the corruption of the morals of society than anything else.

    I LOVE TRILL. I also see value in SPB. I don't bash SPB. I don't tell customers that if they run SPB or QFabric that they will turn into a Newt.

    Why so angry Juniper?

  11. Ivan Pepelnjak, 16 June, 2011 08:09

    Because they won't have TRILL or SPB?

    Personally, I think QFabric will incorporate some great ideas ... but the way they do their marketing is "somewhat extraordinary". Instead of telling us where they excel (and why and how), they bash everything else.

  12. Ivan Pepelnjak, 16 June, 2011 08:14

    I guess you'd put blade switch in FIP snooping mode, and QFX3500 in NPV mode. Remember another company having exactly the same design with N****4000 and N****5000 switches a while back? Ah, keep forgetting, that was before they had full-blown FCF stack everywhere :-E

    Not saying dense-mode FCoE is not better, but even the above-mentioned design is not bad and you can easily maintain air-gap separation (if you truly want it - not sure why you'd be so rigorous apart from religious reasons) with proper VLAN assignments.

    We also need to discuss once how dumber boxes (port extenders) add architectural flexibility :-P

  13. With FIP Snooping at the blade switch I pity the poor guy who gets stuck with the job of manually configuring all the ENode MACs at the upstream NPV or FCF (yes, same case with N4K).

    However, if the upstream NPV or FCF had port extender capabilities, the blade switch would be a simple port extender and no FIP Snooping would be required. Since every ENode in the blade chassis is now directly connected to the upstream NPV or FCF, there is no need to manually configure ENode MAC addresses.
    The ability to extend the capabilities of the upstream device down to the servers provides the architectural flexibility, and simplicity. :-)

  14. Ivan Pepelnjak, 16 June, 2011 18:58

    I never quite understood why someone would need to enter the Enode MAC addresses on the NPV/FCF. I know it's the limitation of NX-OS implementation of FCoE, but I fail to see it as an architectural requirement. What am I missing?

  15. It's a simple matter of provisioning and applying policy. If the server (ENode) does not have its own physical or virtual Ethernet interface on the FCF/NPV (as would be the case with a FIP Snooping blade switch), how do you distinguish and differentiate one ENode from another? You can't make it interface based so you need to move up the stack and provision configuration and policy based on the MAC address.

    Think about it this way, with a FIP Snooping blade switch how would you provision ENode-A in VSAN1 and ENode-B in VSAN2? How would you do that without needing to know the MAC address of each ahead of time? Good luck.

    On the other hand, with a port extender (FEX) as the blade switch, each ENode has its own virtual Ethernet interface on the upstream FCF/NPV, so you can for example provision VSAN based on interfaces, without ever needing to care one bit about MAC addresses.

  16. Hello Ivan,
    thanks for the post and the follow-up discussions. QFabric looks nice at different levels and the developers have interesting ideas. From a storage point of view it looks nice as long as you don't want to get rid of your native FC equipment from day one (or two), which most deployment scenarios probably won't, as the NPV-only storage won't work on its own. So it seems a full-blown FCoE-only storage architecture (without a 3rd-party native SAN) with QFabric is not possible today, but it certainly will be in the future.
    The article title mentioned TRILL, so from a protocol point of view apparently the MPLS-like DCF does not support the desired TTL field and is relying on the loop-free tree only.
    At the end I think people will be especially focused on the 40GIG (Ethernet??) uplink capabilities :) as all the emerging protocols are promised to be simply a hidden c-plane from a user point of view.

  17. The additional issue to tag onto Brad's excellent explanation is the consequence of troubleshooting. Without that visibility you have a break in the ability to use standard fabric-based tools to troubleshoot problematic and troublesome hosts/CNAs and are required to use the tools available from the CNA vendor.

    As you increase the number of ports (and the number of VMs behind those ports) this can become quite daunting. The N4k limits the number of server-facing ports to 16, but IIRC some of the new switches from Juniper and others are 48-port FIP snooping switches. With VMs on the other end of those ports it can be an awful lot of manual hunting for problematic hosts.

  18. Ivan Pepelnjak, 21 June, 2011 20:49

    Definitely a great point. Need to write a new post about this particular aspect. Not sure many people get it - we're so used to dealing with the crazy mix of routing and bridging that we keep forgetting there are still some truly routed networks (= SAN).

    BTW, you might want to write "SAN fabric" not just "fabric". Fabric with or without Q can mean anything today, including the material from which my 20-year-old Cisco T-shirt is made :-P

    The MAC address problem Brad mentioned is even more intriguing. I had to go back to FC-BB-5 and really study it to figure out what the issue is. Definitely yet another post (Brad, you don't need to write them, I will ;) )

  19. Oooooh. Good point re: "Fabric." Quite frankly, IMHO, we've all lost the plot about the term - my own company included. =-O

Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.