FCoE over TRILL ... this time from Juniper
A tweet from J Michel Metz has alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed into two short sentences my readers are very familiar with: Bridging does not scale and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems)... but the author couldn’t resist throwing the “FCoE over TRILL” bone into the mix.
I thought that bone had been scraped clean in the last few months, but it seems some people still think it’s worth chewing. Let’s try to make it really simple. There are two fundamental ways of implementing multi-hop FCoE:
- Dense-mode FCoE, where every LAN switch is also a Fibre Channel Forwarder (FCF; equivalent to an IP router). FCoE frames are routed hop-by-hop, and FCoE routing does not interact with STP or TRILL (because FCoE is not bridged). It’s also easy to implement SAN A/SAN B separation (two independent non-overlapping paths from each server to each storage device) because you control the hop-by-hop flow of the FC traffic.
- Sparse-mode FCoE, where LAN switches bridge FCoE frames like any other L2 traffic and optionally provide FIP snooping. Because FCoE is bridged in this design, sparse-mode FCoE needs stable L2 transport, which TRILL can provide more reliably than existing STP-based networks. Since you can’t control the hop-by-hop FCoE traffic flow (frames are bridged along whatever path TRILL or STP thinks is best), it’s way harder to implement the SAN separation. Both models are sketched right below.
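Here’s a minimal sketch of the difference, assuming nothing about any particular vendor implementation: all class and field names are made up, and a real FCF obviously does far more than a dictionary lookup.

```python
# Illustrative sketch only: names and structures are hypothetical.
FCOE_ETHERTYPE = 0x8906

class DenseModeFCF:
    """Dense mode: every switch is an FCF. The FCoE shim is (logically)
    stripped and the inner FC frame is routed on its destination FC_ID,
    hop by hop, exactly like an IP router routes on IP addresses."""
    def __init__(self, fc_routes):
        self.fc_routes = fc_routes          # dest FC domain -> next-hop FCF

    def next_hop(self, frame):
        assert frame["ethertype"] == FCOE_ETHERTYPE
        domain = frame["fc_d_id"] >> 16     # top byte of the 24-bit FC_ID
        # The path is an FC-level decision at every hop, so keeping
        # SAN A and SAN B on disjoint paths is straightforward.
        return self.fc_routes[domain]

class SparseModeBridge:
    """Sparse mode: FCoE is just another L2 payload, forwarded along
    whatever path STP/TRILL computed; no FC-level decision at all."""
    def __init__(self, mac_table):
        self.mac_table = mac_table          # dest MAC -> egress port

    def next_hop(self, frame):
        # No visibility into FC_IDs means no per-SAN path control.
        return self.mac_table[frame["dst_mac"]]

frame = {"ethertype": FCOE_ETHERTYPE, "fc_d_id": 0x010203,
         "dst_mac": "0e:fc:00:01:02:03"}
print(DenseModeFCF({0x01: "fcf-a2"}).next_hop(frame))                     # fcf-a2
print(SparseModeBridge({"0e:fc:00:01:02:03": "port-7"}).next_hop(frame))  # port-7
```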
Got it? How hard was it? Do I really need to spell it out every few months?
Note to the equipment buyers: every time a vendor mentions “FCoE over TRILL” ask yourself “are they trying to divert my attention?” Most probably they don’t have full-blown FCF code in their switches.
Obviously I had to check the QFX3500 Junos documentation after reading the aforementioned article. As I came to expect from Juniper, the documentation is precise and well-written. QFX3500 supports FC and FCoE, and has up to 12 “universal” ports (two groups of 6 ports) that can be configured as 2/4/8 Gbps FC or 10GE ... but does not have a full-blown L3 FC stack (as I expected). The only marketing intrusion I found in the documentation was the FCoE transit switch, a term that does not appear anywhere in the FC-BB-5 standard – obviously someone thought it sounds better than FIP snooping bridge.
Here’s what QFX3500 can do on the FCoE/FC front:
- It can be a FIP-snooping bridge, forwarding FCoE traffic between its 10GE ports and protecting FCFs and FCoE ENodes with dynamically built ACLs based on FIP requests/responses (a rough sketch of that logic follows this list). Bonus points for the documentation being very clear on the traffic flow: FCoE traffic has to go through a full-blown FCF.
- It can be an FC N_Port proxy (using NPIV) for FCoE ENodes. Yet again, as it doesn’t have the FCF code, the traffic between FCoE nodes has to go to an FC-based FCF and back (also clearly documented).
- It supports exactly the same DCB standards as Cisco: PFC, ETS and DCBX, but not QCN. PFC and ETS should be enough unless you build really large bridged networks (hint: don’t).
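For the curious, here’s roughly what the FIP snooping logic from the first bullet looks like. This is a grossly simplified model (the real rules live in an annex of FC-BB-5), and every name in it is made up for illustration.

```python
# Grossly simplified FIP snooping model; all names are hypothetical.
FIP_ETHERTYPE = 0x8914
FCOE_ETHERTYPE = 0x8906

class FIPSnoopingBridge:
    def __init__(self):
        # (enode_port, vn_port_mac, fcf_mac) triplets allowed to carry FCoE
        self.acls = set()

    def on_fip_login_accept(self, enode_port, fcf_mac, granted_mac):
        # The bridge watched a FLOGI/FDISC exchange complete: the FCF
        # granted the VN_Port a fabric-provided MAC address (FPMA).
        # Only now does the ENode-facing port get to send/receive FCoE.
        self.acls.add((enode_port, granted_mac, fcf_mac))

    def permit(self, port, frame):
        if frame["ethertype"] != FCOE_ETHERTYPE:
            return True     # ordinary LAN traffic: bridge as usual
        # FCoE is only allowed between a logged-in VN_Port and its FCF,
        # which protects both FCFs and ENodes from spoofed frames.
        return (port, frame["src_mac"], frame["dst_mac"]) in self.acls

bridge = FIPSnoopingBridge()
bridge.on_fip_login_accept("xe-0/0/1", "00:0d:ec:aa:bb:cc", "0e:fc:00:01:00:01")
ok = bridge.permit("xe-0/0/1", {"ethertype": FCOE_ETHERTYPE,
                                "src_mac": "0e:fc:00:01:00:01",
                                "dst_mac": "00:0d:ec:aa:bb:cc"})
print(ok)   # True; anything else on that port is dropped
```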
In most cases, the functionality offered by QFX3500 is more than enough to implement access-layer convergence. If you use its NPIV functionality and send FCoE traffic from the servers straight into FC SAN, you can even maintain true SAN separation with servers attached to two QFX3500 switches. Transporting FCoE across the network toward an FCF (or another QFX3500 acting as NPIV proxy) that’s several hops away should still work, but it’s harder to maintain the strict SAN separation.
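A trivial way to visualize why the dual-switch NPIV design preserves SAN separation (topology and names are invented for the example):

```python
# Hypothetical dual-fabric access layer; all names are invented.
uplinks = {                 # QFX3500 (NPIV proxy) -> the one FC fabric it joins
    "qfx-1": "SAN-A",
    "qfx-2": "SAN-B",
}
server_ports = {            # server CNA port -> attached QFX3500
    "srv1-cna0": "qfx-1",
    "srv1-cna1": "qfx-2",
}

# True air-gap separation holds because each CNA port reaches exactly one
# fabric and no switch carries both fabrics.
reached = {uplinks[switch] for switch in server_ports.values()}
assert reached == {"SAN-A", "SAN-B"}
print("srv1 has one independent path into each SAN")
```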
Am I allowed to conclude with a note to the vendors’ marketing departments? We might like you more if you told us how your boxes actually work, what they can do, and how we can build great solutions with them, instead of constantly harassing us with arguments about why the things you haven’t implemented aren’t rosy (more so if you have a good product that can’t possibly benefit from such tactics) ... not that I would ever expect them to listen.
More information
You’ll find more information about FCoE, FIP snooping, FC proxy solutions and Data Center Bridging (DCB) standards in my Data Center 3.0 for Networking Engineers webinar (buy a recording or a yearly subscription).
FCoE addresses unified I/O: it targets the cost of N NICs plus N HBAs per server and the associated cabling (the implied savings; see the back-of-the-envelope example below), while TRILL addresses bisectional bandwidth requirements at L2. Both have their respective use cases: FCoE is applicable if there is an existing investment in FC SAN, and TRILL is applicable in use cases where massive, rather dynamic bandwidth requirements are demanded.
If anything, the debate should be iSCSI vs FCoE, and ECMP vs TRILL.
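To put some entirely made-up numbers on those implied savings (every price below is an assumption, picked only to show where the consolidation win comes from):

```python
# Back-of-the-envelope I/O consolidation arithmetic; all prices assumed.
servers = 100
nic, hba, cna, cable = 300, 800, 900, 50    # USD per port / per cable (assumed)

# Classic design: 2 NICs + 2 HBAs per server, one cable each.
classic = servers * (2 * (nic + cable) + 2 * (hba + cable))

# Converged design: 2 CNAs per server carry both LAN and FCoE traffic.
converged = servers * 2 * (cna + cable)

print(f"classic:   ${classic:,}")              # $240,000
print(f"converged: ${converged:,}")            # $190,000
print(f"savings:   ${classic - converged:,}")  # $50,000 (plus fewer switch ports)
```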
I do agree with your "Note to marketing departments" and I have been trying very hard to write *exactly* "how our boxes actually work, what they can do and how we can build great solutions with them." Writing for a broad audience usually precludes me from delving into a white-paper writing style, but if there's something I miss or fail to accomplish in that goal, let me know.
Now, if you'll excuse me, I need to get back to my Hawaiian holiday. :)
The "Note to marketing departments" was definitely not aimed at you. I won't name any names, but I'm positive everyone can make a few educated guesses based on my past rants.
Enjoy the holidays!!!
Thanks for the article, extremely clear and well put. Should I take from your analysis that Juniper's QFX3500 is a sparse-mode, non-FCF-capable switch? That was my idea before taking a look at their docs, and this apparently confirms it.
For Juniper's QFabric strategy this does not seem so problematic (I still don't see how they will separate SAN fabrics at the central nodes, but let's set that aside), but I find it challenging to imagine how Juniper could deploy a multi-hop FCoE network that is not QFabric-based...
The QFX3500 FCoE NPV mode is handy if you are attaching rack servers directly to it.
But what about blade server enclosures with integrated FCoE switches?
Put the blade switch in NPV mode too? Nope. You cannot cascade NPV mode.
Therefore the only option I can see would be to put both the blade switch and the QFX3500 in transparent L2 FIP snooping mode, thus creating the limiting and undesired "sparse-mode FCoE" design. Oops.
If the QFX3500 supported things like bridge port extension (VN-Tag or 802.1Qbh) and FCoE FCF, one could design a solid FCoE solution with blade servers, but unfortunately it does not, and therefore lacks some much-needed architectural flexibility with regards to access-layer convergence.
Guess QFabric really is QEthernet
1.) TRILL is first and foremost meant to be a replacement for STP. Bandwidth benefits are a by-product of that desire/work.
2.) TRILL and ECMP are very complementary, not adversarial.
What I don't understand is why:
Cisco can have TRILL-based products and focus on why that is good.
Brocade can have TRILL-based products and focus on why that is good.
(this list goes on and on)
Juniper seems to spend more time and energy pointing out how using TRILL will damn you to hell and cause the corruption of the morals of society than anything else.
I LOVE TRILL. I also see value in SPB. I don't bash SPB. I don't tell customers that if they run SPB or QFabric that they will turn into a Newt.
Why so angry Juniper?
Personally, I think QFabric will incorporate some great ideas ... but the way they do their marketing is "somewhat extraordinary". Instead of telling us where they excel (and why and how), they bash everything else.
I'm not saying dense-mode FCoE isn't better, but even the above-mentioned design is not bad, and you can easily maintain air-gap separation (if you truly want it; not sure why you'd be so rigorous apart from religious reasons) with proper VLAN assignments.
We also need to discuss someday how dumber boxes (port extenders) add architectural flexibility :-P
However, if the upstream NPV or FCF had port extender capabilities, the blade switch would be a simple port extender and no FIP snooping would be required. Since every ENode in the blade chassis is then directly connected to the upstream NPV or FCF, there is no need to manually configure ENode MAC addresses.
The ability to extend the capabilities of the upstream device down to the servers provides the architectural flexibility and simplicity. :-)
Think about it this way: with a FIP-snooping blade switch, how would you provision ENode-A in VSAN1 and ENode-B in VSAN2? How would you do that without needing to know the MAC address of each ahead of time? Good luck.
On the other hand, with a port extender (FEX) as the blade switch, each ENode has its own virtual Ethernet interface on the upstream FCF/NPV, so you can, for example, provision VSANs based on interfaces without ever needing to care one bit about MAC addresses.
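A quick sketch of that provisioning difference (interface names and MAC addresses are made up for illustration):

```python
# Hypothetical provisioning data; interface names and MACs are invented.

# Port extender (FEX) design: each blade's ENode appears as its own virtual
# Ethernet interface on the upstream FCF/NPV proxy, so VSAN membership is
# keyed on something you already know at cabling time.
vsan_by_interface = {
    "veth101": "VSAN1",     # blade slot 1
    "veth102": "VSAN2",     # blade slot 2
}

# FIP-snooping blade switch design: the upstream device sees one trunk, so
# the same policy would have to be keyed on ENode MAC addresses -- values
# you can't know until the blades are actually deployed.
vsan_by_mac = {
    "00:25:b5:00:00:01": "VSAN1",   # discovered and maintained by hand
    "00:25:b5:00:00:02": "VSAN2",
}

print(vsan_by_interface["veth101"])      # provision by slot: deterministic
print(vsan_by_mac["00:25:b5:00:00:01"])  # provision by MAC: manual hunting
```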
Thanks for the post and the follow-up discussions. QFabric looks nice at different levels and the developers have interesting ideas. From a storage point of view it looks nice as long as you don't want to get rid of your native FC equipment from day one (or two); most deployment scenarios probably won't. Since NPV-only storage won't work on its own, it seems a full-blown FCoE-only storage architecture (without a third-party native SAN) is not possible with QFabric today, but certainly will be in the future.
The article title mentioned TRILL, so from a protocol point of view the MPLS-like DCF apparently does not support the desired TTL field and relies on the loop-free tree only.
In the end I think people will be especially focused on the 40 Gbps (Ethernet??) uplink capabilities :) as all the emerging protocols are promised to be simply a hidden c-plane from a user point of view.
As you increase the number of ports (and the number of VMs behind those ports) this can become quite daunting. The N4k limits the number of server-facing ports to 16, but IIRC some of the new switches from Juniper and others are 48-port FIP snooping switches. With VMs on the other end of those ports, it can be an awful lot of manual hunting for problematic hosts.
BTW, you might want to write "SAN fabric", not just "fabric". Fabric, with or without Q, can mean anything today, including the material from which my 20-year-old Cisco T-shirt is made :-P
The MAC address problem Brad mentioned is even more intriguing. I had to go back to FC-BB-5 and really study it to figure out what the issue is. Definitely yet another post (Brad, you don't need to write them, I will ;) )