Gartner has updated their networking hype cycle. Not surprisingly:
- Ethernet switching fabrics are on the slope of enlightenment (finally – we’ve been educating networking engineers on what they really are for half a decade);
- SDN is well on its way into the trough of disillusionment (shameless plug: I guess not enough people attended real-life SDN workshops) and whitebox switching is going the same way;
- SD-WAN is nearing the peak of inflated expectations;
- FCoE and Long-Distance vMotion will be dead before they reach the plateau.
A while ago Cisco added dynamic FCoE support to Nexus 5000 switches. It sounded interesting and I wanted to talk about it in my Data Center Fabrics update session, but I couldn’t find any documentation at that time.
In the meantime, the Configuring Dynamic FCoE Using FabricPath configuration guide appeared on Cisco’s web site and J Metz wrote a lengthly blog post explaining how it all works, triggering a severe attack of déjà vu.
One of my readers wanted to deploy FCoE on UCS in combination with Nexus 1000v and wondered how the FCoE traffic impacts QoS on Nexus 1000v. He wrote:
Let's say I want 4Gb for FCoE. Should I add bandwidth shares up to 60% in the nexus 1000v CBWFQ config so that 40% are in the default-class as 1kv is not aware of FCoE traffic? Or add up to 100% with the assumption that the 1kv knows there is only 6Gb left for network? Also, will the Nexus 1000v be able to detect contention on the uplink even if it doesn't see the FCoE traffic?
As always, things aren’t as simple as they look.
One of my regular readers sent me a long list of FCoE-related questions:
I wanted to get your thoughts around another topic – iSCSI vs. FCoE? Are there merits and business cases to moving to FCoE? Does FCoE deliver better performance in the end? Does FCoE make things easier or more complex?
He also made a very relevant remark: “Vendors that can support FCoE promote this over iSCSI; those that don’t have an FCoE solution say they aren’t seeing any growth in this area to warrant developing a solution”.
Chris Marget recently asked a really interesting question:
I've encountered an environment where the iSCSI networks are built just like FC networks: Multipathing software in use on servers and storage, switches dedicated to "SAN A" and "SAN B" VLANs, and full isolation of paths (redundant paths) between server and storage. I understand creating a dedicated iSCSI VLAN, but why would you need two? Isn’t the whole thing running on top of TCP? Am I missing something?
Well, it actually makes sense in some mission-critical environments.
Update 2015-12-06: Ethernet checksums are not a workaround for lack of iSCSI-level checksums. If your iSCSI solution doesn't support application-level checksum, your data might be at risk
Matt Thompson provided a really good answer to the “what’s acceptable oversubscription ratio in a ToR switch” when he wrote “I’m expecting a ‘how long is a piece of string’ answer” (note: do watch the BBC video answering that one).
There’s the 3:1 rule-of-thumb recipe, with a more realistic answer being “it depends”. Now let’s see if we can go beyond that without a deep dive into scholastic waters.
Just prior to Networking Field Day, the merry band of geeks sat down with Chip Copper, Brocade’s Solutioneer (a job title almost as good as Packet Herder) to discuss the intricate details of VCS Fabric. The videos are well worth watching – the technical details are interesting, but above all, Chip is a fantastic storyteller.
Should you use FC, FCoE or iSCSI when deploying new gear in your existing data center? What about Greenfield deployments? What are the decision criteria? Should you just skip iSCSI and use NFS if you’re focusing on server virtualization with VMware? Does it still make sense to build separate iSCSI network? Are jumbo frames useful? We’ll try to answer all these questions and a few more in the first Data Center Virtual Symposium sponsored by Cisco Systems.
Anyone serious about high-availability connects servers to the network with more than one uplink, more so when using converged network adapters (CNA) with FCoE. Losing all server connectivity after a single link failure simply doesn’t make sense.
If at all possible, you should use dynamic link aggregation with LACP to bundle the parallel server-to-switch links into a single aggregated link (also called bonded interface in Linux). In theory, it should be simple to combine FCoE with LAG – after all, FCoE runs on top of lossless Ethernet MAC service. In practice, there’s a huge difference between theory and practice.
J Michel Metz brought out an interesting aspect of the dense/sparse mode FCoE design dilemma in a comment to my FCoE over Trill ... this time from Juniper post: FC-focused troubleshooting. I have to mention that he happens to be working for a company that has the only dense-mode FCoE solution, but the comment does stand on its own.
Before reading this post you might want to read the definition of dense- and sparse-mode FCoE and a few more technical details.
A tweet from J Michel Metz has alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed in two short sentences my readers are very familiar about: Bridging does not scale and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems)... but the author couldn’t resist throwing “FCoE over TRILL” bone into the mix.
Intel announced its Open FCoE (software implementation of FCoE stack on top of Intel’s 10GB Ethernet adapters) using the cloudy bullshit bingo including simplifying the Data Center, Free New Technology, Cloud Vision and Green Computing (ok, they used Environmental impact) and lots of positive supporting quotes. The only thing missing was an enthusiastic Gartner quote (or maybe they were too expensive?).
Was anyone trying to sell you the “wonderful” idea of running FCoE between Data Centers instead of FC-over-DWDM or FCIP? Sounds great ... until you figure out it won’t work. Ever ... or at least until switch vendors drastically increase interface buffers on the 10GE ports.
FCoE requires lossless Ethernet between its “routers” (Fiber Channel Forwarders – see Multihop FCoE 101 for more details), which can only be provided with Data Center Bridging (DCB) standards, specifically Priority Flow Control (PFC). However, if you want to have lossless Ethernet between two points, every layer-2 (or higher) device in the path has to support DCB, which probably rules out any existing layer-2+ solution (including Carrier Ethernet, pseudowires, VPLS or OTV). The only option is thus bridging over dark fiber or a DWDM wavelength.
Just when I hoped we were finally getting somewhere with the FCoE/QCN discussion, Brocade managed to muddy the waters with its we-still-don’t-know-what-it-is announcement. Not surprisingly, networking consultants like my friend Greg Ferro of the Etherealmind fame responded to the shenanigan with statements like “FCoE ... is a technology so mindboggingly complicated that marketing people can argue over competing claims and all be correct.” Not true, the whole thing is exceedingly simple once you understand the architecture (and the marketing people always had competing claims).
Pretend for a minute that FC ≈ IP and LAN bridging ≈ Frame Relay, teleport into this parallel universe and allow me to tell you the whole story once again in more familiar terms.
One of the recurring religious FCoE-related debates of the last months is undoubtedly “do you need QCN to run FCoE” with Cisco adamantly claiming you don’t (hint: Nexus doesn’t support it) and HP claiming you do (hint: their switch software lacks FC stack) ... and then there’s this recent announcement from Brocade (more about it in a future post). As is usually the case, Cisco and HP are both right ... depending on how you design your multi-hop FCoE network.