The beauties of dense-mode FCoE
J Michel Metz brought out an interesting aspect of the dense/sparse mode FCoE design dilemma in a comment to my FCoE over Trill ... this time from Juniper post: FC-focused troubleshooting. I have to mention that he happens to be working for a company that has the only dense-mode FCoE solution, but the comment does stand on its own.
Before reading this post you might want to read the definition of dense- and sparse-mode FCoE and a few more technical details.
Let’s try to look at his comment from a router jockey perspective. A traditional SAN is a purely routed network: every FC switch is actually a layer-3 forwarding device. Supposedly there are all sorts of FC monitoring and troubleshooting mechanisms that make the life of a SAN administrator easier (or so I am led to believe by FC fans whenever I mention iSCSI). An approximate equivalent in the IP world would be a purely router-based network with, for example, SONET/SDH links between routers (I wanted to say leased lines, but I guess quite a few people would stop reading, thinking this is another one of my last-millennium rants).
Replacing FC with dense-mode FCoE is like replacing OC-12/STM-4 with DWDM+GE in your router network. The gear gets a bit cheaper, you lose a bit of the functionality (remote alarms, for example), but you still control the whole network. You have hop-by-hop visibility and can do traceroute-like troubleshooting; you can also see all interface counters and are able to evaluate the reliability of every link. In a nutshell, apart from reducing the costs of the L1/L2 technology (and gaining some extra bandwidth in the process), not much has changed and all your troubleshooting tools and processes still work.
The drawbacks of dense-mode FCoE? Nothing has changed. You still have a SAN network we all love to hate, but it’s using 10GE instead of 8G FC. Every LAN switch is a Fibre Channel switch (exploding your FC domain), FSPF is running everywhere, and every topology change causes two SPF runs (one for FSPF, one for IS-IS if you run TRILL/FabricPath or OSPF if you don’t believe in flat-earth myths). Unless your FC gear supports the FC standards to their maximum limits, you will have to implement N_port proxy (with NPIV) on the FCoE/FC edge to protect the poor innocent FC switches that have never seen planets larger than a few switches. You do know the theoretical limit is 239 switches per SAN network (including all your LAN FCoE gear), right?
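To put a number on the 239-switch limit, here is a minimal sketch of the arithmetic. It assumes the usual 24-bit FC address layout (domain, area and port, one byte each) with switch domain IDs taken from the range 0x01 to 0xEF, and it assumes that edge devices running in N_port proxy mode do not consume a domain ID. The function names are made up for illustration.

```python
from typing import Tuple

# Assumption: usable switch domain IDs are 0x01..0xEF, which gives the
# 239-switch theoretical limit mentioned in the text.
MIN_DOMAIN_ID = 0x01
MAX_DOMAIN_ID = 0xEF
MAX_DOMAINS = MAX_DOMAIN_ID - MIN_DOMAIN_ID + 1


def split_fc_id(fc_id: int) -> Tuple[int, int, int]:
    """Split a 24-bit FC address into (domain, area, port) bytes."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("FC address must fit in 24 bits")
    return (fc_id >> 16) & 0xFF, (fc_id >> 8) & 0xFF, fc_id & 0xFF


def remaining_domains(native_fc_switches: int, dense_mode_fcoe_switches: int) -> int:
    """Domain IDs left in one fabric (hypothetical helper).

    Every native FC switch and every dense-mode FCoE switch is counted as
    one domain ID. Devices in N_port proxy mode are assumed to use none,
    so they are simply left out of the count. A negative result means the
    design does not fit in a single fabric.
    """
    return MAX_DOMAINS - (native_fc_switches + dense_mode_fcoe_switches)


if __name__ == "__main__":
    domain, area, port = split_fc_id(0x0A1B2C)
    print(f"domain=0x{domain:02X} area=0x{area:02X} port=0x{port:02X}")
    print("domain IDs left:", remaining_domains(20, 200))
```

A negative result from `remaining_domains` is exactly the case where you would have to fall back to N_port proxy on the edge, or split the fabric.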
Sparse-mode FCoE with FIP-snooping switches is like replacing your SONET/SDH infrastructure with VPLS. You get a misty blob between your routers that you cannot troubleshoot with your existing tools. Interface counters are no longer relevant, because you see only half of the truth on the access link (inbound, but not outbound errors) and have zero visibility into transit links. Hop-by-hop troubleshooting no longer gives exact answers, as the whole blob that sits in the middle of your network (VPLS cloud) behaves like a single hop. Decisions made by routing protocols are somewhat useless, as you never know how many hops there are in the VPLS cloud, what its true bandwidth and latency is, and thus what the cost of the VPLS subnet should be. In fact, routing protocols change from path-computation tools into reachability-distribution tools. Did I mention that you can’t control the convergence speed anymore because most of the failures happen within the blob that you have no influence on?
If you were ever forced to make a migration from a router-only WAN network with point-to-point links to a VPN-based network (be it Frame Relay, L3 MPLS/VPN, VPLS or something-over-Internet), you’re probably acutely aware of the frustrations caused by the loss of control and troubleshooting capabilities. That’s how your SAN administrators might feel if you implement sparse-mode FCoE. The choice is yours.
More information
If you want to learn more about modern data center architectures, including FCoE, Data Center Bridging and TRILL, buy a recording of my Data Center 3.0 for Networking Engineers webinar. To learn more about benefits and drawbacks of individual WAN VPN solutions, consider my Choose the Optimal VPN Service webinar. Both webinars are also part of the yearly subscription package.
Ethernet OAM will... get... there... Eventually... ;)
1: Dense-mode FCoE dramatically increases switch configuration; we'll have to see, but right now it looks like it'll get to the point where 1+1=3 in terms of total complexity.
2: I'd have to replace most of the switches in my network to make it happen. Cat6K probably won't ever do FCoE, I think because it doesn't know how not to kill frames.
3: I can't buy a solution right now that does dense mode FCoE. Only the 5Ks from Cisco support it that I know of, maybe the MDS. 7Ks will at some point soon (4.2 isn't out yet is it?).
4: If I did iSCSI for block-level storage, every switch in my network would potentially be able to pull it off.
I'm fully willing to admit I just may be wrong about FCoE, but the first impression is not so hot. Did I miss something?
If you need FCoE because you need to integrate new servers with existing FC storage (common scenario, we are exactly in the same bind), it makes sense to go into FC world ASAP ... if you already have FC throughout the data center and if you have enough FC ports (buying new FC ports makes as much sense in my opinion as investing in SNA routers).
And then there are those rare scenarios where the FC part of your DC is "far" away from the servers and you really need FC capabilities (don't ask me why, but some gear still works over FC only) and you'd prefer to extend FCoE not FC ... and this is the niche where the multihop FCoE design debates make sense.
As for Cat6K - I wouldn't hold my breath. FCoE is only in the NX-OS code and probably won't be ported to IOS any time soon.
Fibre Channel is switching, which is why, without zoning and LUN masking, every initiator can talk to every target. This is how a single Windows host could write over everyone’s boot record in the old days.
Now, there is also FC Routing. This is done if you want a target or initiator on different fabrics to talk to one another without merging the fabrics.
A single flat switched network with 9,000+ ports is exactly what makes FC impressive =)
With that said, you can decode iSCSI with Wireshark, whereas taking a trace for FC requires a very, very expensive analyzer from someone like Finisar Network Tools (now owned by JDSU). So doing development should be much, much cheaper.
* Are L2 headers rewritten on every hop?
* Does it have TTL that is decremented on each hop?
* Related to the previous one - does traceroute work?
I already know FCF has two router-like functions: it does communicate with the end hosts (bridges don't) and it runs a routing protocol (which makes TRILL almost-routing of MAC frames).
As for what HP is pushing, read this http://searchnetworking.techtarget.com/news/2240037298/HP-Discover-Wheres-the-core-networking-evolution 8-)
I think we are just coming from different definitions of routing. To answer your questions
1.) the switch_id changes, the s_id and d_id do not.
2.) there isn't a TTL per se. However there is a hop count type construct.
3.) No traceroute in FC. However this is unique to FCP/FC and not necessarily FCP/DCB (FCoE)
The reason I don't see it as routing is because without zoning every initiator talks to every target and you just have one big flat mesh network. Then you overlay zones which are similar to VPNs or VLANs in that they ensure that broadcast items such as RSCNs are only seen by the members of that zone.
There IS FC routing, in that the only way to get devices in one fabric to talk to devices in another fabric without merging the fabrics is to route between them using EX ports and usually something like FCIP.
With that said, FCSW uses FSPF, which, while operating at the equivalent of layer 2 (as with IS-IS-L2), is by definition a "routing" protocol, so one could easily argue that, L2/L3 be damned, routing is routing.
I will point out, however, that in Cisco's own book "Storage Networking Protocol Fundamentals" by James Long http://books.google.com/books?id=zEFrqPrcZI8C&lpg=PA368&ots=3iCnK05DRO&dq=does%20fc-sw%20have%20TTLs&pg=PP1#v=onepage&q&f=false
He states "FC is a switching technology, and FC addresses are Layer 2 constructs. Therefore, FC switches do not "route" frames according to the traditional definition of routing."
So I should not have jumped so quickly to say FC is by no means routing. I just find when teaching FC that presenting FCSW as switching, and connecting two fabrics together so that frames can pass from one to the other as routing, makes for clearer understanding, especially since most see anything at L2 as switching.
However, just as many have referred to ATM and TRILL as layer 2.5, and since FC uses FSPF, one can certainly argue that FC has routing elements.
It really just depends on if you define routing by behavior or by layer
Thanks for the comment (writing the reply from an easy chair in an Internet cafe 8-) )
#1 - I assume you're saying the source and destination addresses don't change when the FC frame traverses the network. Same as IP, still routing
#2 - could you point me in the right direction to investigate the hop count thing?
#3 - There's a traceroute-like command defined in FC-SW, but it does not work like IP traceroute (you could say it's part of OAM functionality)
Connectivity does not imply forwarding functionality. In an Internet with no firewalls/packet filters every host can talk to every other host, but it's still a routed network.
The L2/L3 confusion probably arises from a direct comparison of the FC frame format with the Ethernet/IP frame format. Just because the Ethernet/IP combo has MAC+IP addresses does not mean that any protocol that has a single address in its packets automatically becomes layer-2. As I pointed out, the PPP/IP combo has only IP addresses (well, there is a fixed-value byte in the PPP header that's a leftover from SDLC days), but that does not mean IP is an L2 protocol when running over PPP.
Cisco Press books are good, but should not be used as an absolutely error-free authoritative reference. I prefer to rely on the standards that define a particular technology.
Last but definitely not least, will study the inter-fabric routing :-P
A factor working against HP Virtual Connect and for Cisco Nexus is the Sparse/Dense mode capability of each product. The SAN guys, naturally, want Dense Mode support, and I agree with them.
HP continues to insist that Virtual Connect is not a switch and continues to target the server guys in their marketecture push (I currently work in the Server team that manages all our HP infrastructure, but I have a Cisco background).
Every time this has come up in conversation I've told them it isn't the server guys you need to sell this to (we've already bought the benefits to server infrastructure the Virtual I/O features of VC brings). HP needs to sell their product to the Networking guys.
Unfortunately, not supporting Dense Mode FCoE is just one of many oversights that plague the product. Virtual Connect is currently too little, too late. To make matters worse, HP isn't showing a solid roadmap for the product.
They aren't listening, and they aren't showing any leadership.
They need to make Virtual Connect a monster. They need to truly make it a compelling network product and sell it to the network guys.