QFabric Part 2 – Control Plane Overview
Like everyone else, I was pretty impressed with the QFabric hardware architecture when Juniper announced it, but I was way more interested in the control-plane aspects of QFabric. After all, if you want multiple switches to behave like a single device, you can either use a Borg-like architecture with a single control-plane entity or implement some very clever tricks.
Nobody has yet demonstrated a 100-switch network with a single control plane (although the OpenFlow aficionados would have you believe it’s just around the corner), so it must have been something else.
Quick question: IP multicast over an existing IP backbone
Imagine you actually wanted to run VXLAN between two data centers (I wouldn’t, but that’s beside the point at the moment) and the only connectivity between them is plain IP with no multicast support. How would you implement IP multicast across a generic IP backbone? Anything goes, from duct tape (GRE) to more creative solutions ... and don’t forget those pesky RPF checks.
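The duct-tape answer is a GRE tunnel between the two sites with PIM running across it and a static mroute to keep the RPF check happy. A minimal IOS-style sketch (all addresses, interface names and the remote source range are hypothetical):

```
interface Tunnel0
 description Multicast transport across the unicast-only backbone
 ip address 172.16.0.1 255.255.255.252
 ip pim sparse-mode
 tunnel source Loopback0
 tunnel destination 192.0.2.2
!
! RPF fix: without this static mroute the RPF check points at the
! physical backbone interface and multicast traffic from remote
! sources arriving through the tunnel is silently dropped.
ip mroute 10.2.0.0 255.255.0.0 Tunnel0
```

You’d need a mirror-image configuration on the other end, and you still have to decide where the RPs live ... which is where the creative solutions come in.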
QFabric Part 1 – Hardware Architecture
Juniper has finally released the technical documentation for the QFabric virtual switch and its components (QF/Node, QF/Interconnect and QF/Director). As expected, my speculations weren’t too far off – if anything, Juniper didn’t go far enough along those lines, but we’ll get there later.
The generic hardware architecture of the QFabric switching complex has been well known for quite a while (listening to the Juniper QFabric Packet Pushers Podcast is highly recommended) – here’s a brief summary:
NVGRE – because one standard just wouldn’t be enough
Two weeks after VXLAN (backed by VMware, Cisco, Citrix and Red Hat) was launched at VMworld, Microsoft, Intel, HP and Dell published the NVGRE draft (Arista and Broadcom are cleverly backing both proposals), which solves the same problem in a slightly different way.
If you’re still wondering why we need VXLAN and NVGRE, read my VXLAN post (and the one describing how VXLAN, OTV and LISP fit together).
It’s obvious the NVGRE draft was a rushed affair; its only significant original contribution is the idea of using the lower 24 bits of the GRE key field to carry the Tenant Network Identifier (but then, lesser ideas have been patented time and again). As with VXLAN, most of the real problems are handwaved away to other (or future) drafts.
Interesting links (2011-09-18)
The Mixed Feelings award of the week goes to Doug Gourlay and his Why FCoE is Dead, But Not Buried Yet article. While I agree with everything he’s saying about L2 and L3, the FCoE part of the post is shaky enough to generate tons of comments (or maybe that was the goal). For a hilarious perspective on the same topic, read Fibre Channel and Ethernet – the odd couple.
And here are the other great articles I stumbled upon during the last few days:
Responsible Generation of BGP Default Route
Chris sent me the following question a while ago:
I've got a full Internet BGP table and want to responsibly send a default route to a downstream AS. It's the "responsibly" part that's got me frustrated: how can I judge whether the Internet is working and make the origination of the default conditional on that?
He’d already figured out the neighbor default-originate route-map command, but wanted to check for more generic conditions than the presence of one or more prefixes in the IP routing table.
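The part he’d already figured out is the easy one: make the default route conditional on the presence of a few well-known anchor prefixes in the IP routing table. A sketch of that baseline (AS numbers, neighbor address and anchor prefixes are all hypothetical):

```
! Consider the Internet to be working as long as at least one
! anchor prefix is present in the IP routing table
ip prefix-list ANCHORS permit 198.51.100.0/24
ip prefix-list ANCHORS permit 203.0.113.0/24
!
route-map INTERNET-ALIVE permit 10
 match ip address prefix-list ANCHORS
!
router bgp 65000
 neighbor 192.0.2.2 remote-as 65001
 neighbor 192.0.2.2 default-originate route-map INTERNET-ALIVE
```

Picking anchor prefixes from several unrelated upstream regions makes the check slightly more robust, but it’s still a far cry from a generic “is the Internet healthy” test ... which is exactly what frustrated Chris.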
Changing configuration with EEM – yes or no?
Daniel left a very relevant comment to my convoluted BGP session shutdown solution:
What I am currently doing is using EEM to watch my tracked objects and then issuing a neighbor shutdown command. Is there a functional reason I would not want to do it that way, and use the method you prescribe?
As always, the answer is “it depends.” In this case, the question to ask yourself is: “do I track configuration changes and react to them?”
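For reference, here’s a minimal sketch of the approach Daniel describes (the IP SLA probe, tracked-object number, AS number and neighbor address are all hypothetical):

```
ip sla 1
 icmp-echo 192.168.0.10
ip sla schedule 1 life forever start-time now
!
track 1 ip sla 1 reachability
!
! When the tracked object goes down, reconfigure the router
! and shut down the BGP neighbor
event manager applet BGP-NEIGHBOR-SHUTDOWN
 event track 1 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "configure terminal"
 action 3.0 cli command "router bgp 65000"
 action 4.0 cli command "neighbor 192.0.2.2 shutdown"
```

(You’d also need a matching applet triggered on the state up event to remove the shutdown.) The catch: the applet modifies the running configuration, so whatever configuration-management and audit tools you use must be able to cope with a router that reconfigures itself.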
OSPF-over-DMVPN Using Two Hub Routers
One of my readers sent me the following question a few days ago:
Do you have a webinar that covers Dual DMVPN HUB deployment using OSPF? If so, which webinar covers it?
I told him that the DMVPN: From Basics to Scalable Networks webinar covers exactly that scenario (and numerous others), describing both Phase 1 and Phase 2 DMVPN design and implementation guidelines. Interestingly, he replied that publicly available information on this topic seems to be very scant.
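To illustrate the sort of guidance he was looking for: with OSPF running over mGRE tunnels, the hubs must win the DR/BDR election. A rough Phase 2 sketch of a hub tunnel (the second hub would look the same apart from the addresses; all names and numbers are hypothetical):

```
interface Tunnel0
 ip address 10.0.0.1 255.255.255.0
 ip nhrp map multicast dynamic
 ip nhrp network-id 1
 ip ospf network broadcast
 ip ospf priority 255
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
!
! Spokes run with "ip ospf priority 0" so a spoke can never
! become DR/BDR after a hub failure.
```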
You Don’t Need OpenFlow to Solve Every Age-Old Problem
I read two great blog posts on Sunday: the evergreen Fallacies of Distributed Computing from Bob Plankers and the forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they’re well worth your time) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).
Long-distance IRF Fabric: Works Best in PowerPoint
HP has commissioned an IRF network test that came to an absolutely astonishing conclusion: vMotion runs almost twice as fast across two links bundled in a port channel as across a single link (with the other one blocked by STP). The test report contains one other gem, this one a result of the incredible creativity of HP marketing:
For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.
You know my opinion about stretched clusters … and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.
Shut Down BGP Session Based on Tracked Object
In response to my The Road to Complex Designs is Paved With Great Recipes post, Daniel suggested shutting down the EBGP session if your BGP router cannot reach the DMZ firewall, and Cristoph guessed that it might be done without changing the router configuration, using the neighbor fall-over route-map BGP configuration command. He was sort of right, but the solution is slightly more convoluted than he imagined.
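Here’s a sketch of how the pieces might fit together: an IP SLA probe tracks the DMZ firewall, a tracked static host route to the EBGP neighbor disappears when the probe fails, and the fall-over route-map accepts only that host route as proof of reachability (all addresses and numbers are hypothetical):

```
ip sla 42
 icmp-echo 192.168.0.10
ip sla schedule 42 life forever start-time now
!
track 42 ip sla 42 reachability
!
! Host route to the EBGP neighbor that exists only while the
! DMZ firewall responds to the probe
ip route 10.0.1.2 255.255.255.255 GigabitEthernet0/1 track 42
!
ip prefix-list PEER-HOST permit 10.0.1.2/32
!
route-map EBGP-FALLOVER permit 10
 match ip address prefix-list PEER-HOST
!
router bgp 65000
 neighbor 10.0.1.2 remote-as 64999
 neighbor 10.0.1.2 fall-over route-map EBGP-FALLOVER
```

Because the prefix list permits only the tracked /32, the directly connected route toward the neighbor no longer counts as proof of reachability, and the session goes down as soon as the firewall stops responding.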
IPv6 MPLS/VPN (6VPE) with PPPoE and RADIUS
During my visit to South Africa, someone told me he got 6VPE working over an L2TP connection ... and that you should “use the other VRF attribute, not lcp:interface-config” to make it work. A few days ago one of my readers asked me the same question, and although I was able to find several relevant documents, I wanted to see it working in my lab.
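For the record, the “other VRF attribute” refers to the ip:vrf-id AV-pair (usually paired with ip:ip-unnumbered), as opposed to pushing the whole ip vrf forwarding command through lcp:interface-config. A hedged sketch of what the per-user entry on the RADIUS server might look like (FreeRADIUS users-file syntax; the user name, VRF name and loopback interface are hypothetical):

```
# Per-user reply attributes returned to the PE router / LNS
user@isp.example  Cleartext-Password := "secret"
        Cisco-AVPair = "ip:vrf-id=CustomerA",
        Cisco-AVPair += "ip:ip-unnumbered=Loopback100"
```

If I’m reading the documentation correctly, the major benefit of these attributes over lcp:interface-config is that they don’t force the router to build a full virtual-access interface for every session.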
Large-Scale Bridging = Nuked Earth
If you’re not working for a data center fabric vendor, you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:
Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.
... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)
Nexus 1000V LACP offload and the dangers of in-band control
A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.”
If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.
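Before digging in, here’s roughly what the uplink side looks like on the VSM. Assuming my understanding of the release notes is right, the new behavior is enabled with the lacp offload command, which moves the LACP state machine from the VSM down to the VEMs (the port-profile name and VLAN range are hypothetical):

```
feature lacp
lacp offload
!
port-profile type ethernet UPLINK
 vmware port-group
 switchport mode trunk
 switchport trunk allowed vlan 10-20
 channel-group auto mode active
 no shutdown
 state enabled
```

Why does the offload matter? The VSM talks to the VEMs over the very uplinks LACP is negotiating, a classic in-band control-plane chicken-and-egg problem, which is exactly where this post is headed.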
TRILL goes to WAN – the bridging craze continues
Remember how I foretold, when TRILL first appeared, that someone would be “brave” enough to reinvent the WAN bridging and brouters we so loved to hate in the early ’90s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.