Blog Posts in September 2011
The proponents of inter-DC layer-2 connectivity (required by long-distance vMotion) inevitably cite disaster avoidance (along with buzzword-bingo-winning business agility) as one of the primary requirements after they figure out stretched clusters might not be such a good idea (and there’s no way to explain the dangers of split subnets to some people). When faced with the disaster avoidance “requirement”, ask them to do some basic math first.
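To give a feel for the kind of basic math involved, here is a minimal back-of-the-envelope sketch. It is not the calculation from the original post: the VM count, memory size, link speed and efficiency factor are made-up assumptions, and the hypothetical helper `evacuation_time_seconds` ignores the re-copying of dirty memory pages and any storage replication, so real numbers would be worse.

```python
def evacuation_time_seconds(vm_count: int, gb_per_vm: float,
                            link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate how long it takes to move the memory of all VMs across one link.

    All parameters are illustrative assumptions:
    - gb_per_vm: memory per VM in gigabytes
    - link_gbps: raw inter-DC link speed in gigabits per second
    - efficiency: fraction of the link usable for migration traffic
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    total_gigabits = vm_count * gb_per_vm * 8      # bytes -> bits
    usable_gbps = link_gbps * efficiency
    return total_gigabits / usable_gbps


if __name__ == "__main__":
    # Hypothetical data center: 1000 VMs with 4 GB of RAM each, one 10GE link.
    seconds = evacuation_time_seconds(vm_count=1000, gb_per_vm=4, link_gbps=10)
    print(f"Evacuation would take roughly {seconds / 3600:.1f} hours")
```

Plug in your own VM count and link speed, then compare the result with the amount of warning you expect to get before the disaster strikes.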
Got the following question with an invalid return address, so I’m broadcasting the reply ;)
I am running a DMVPN network and recently got a requirement for spoke-to-spoke communication. We currently shape traffic on a per spoke basis on the hub, and have a single shaper at the remote site. However, if a spoke is receiving a large amount of traffic from the hub and another spoke site, how will the sites sending traffic know that the remote port is congested?
Short answer – they won’t. You have a mission-impossible problem (very similar to ADSL QoS), but there might be some slight silver lining:
You won’t find much about the QFabric forwarding architecture and resulting behavior in the documentation; white papers might give you more insight and I’m positive more detailed ones will start appearing on Juniper’s web site now that the product is shipping. In the meantime, let’s see how far we can get based on two simple assumptions: (A) The "one tier architecture" claim is true and (B) Juniper has some very smart engineers.
Just a few hours after VXLAN was launched, I received an e-mail from one of my readers asking (literally) if VXLAN was awesome or braindead. I decided to answer this question (you know the right answer is “it depends”) and a few others in a FastPacket blog post published by SearchNetworking.
I wrote the post before NVGRE was published and missed the “brilliant” idea of using the GRE key as the virtual segment ID.
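For readers who have never looked at a GRE header, the sketch below shows roughly what carrying a segment ID in the GRE key could look like. It rests on assumptions rather than on anything stated in this post: that the 24-bit segment ID sits in the upper 24 bits of the 32-bit key field, that the remaining 8 bits are left for other use, and that the payload is an Ethernet frame (protocol type 0x6558). The function names and the `low8` parameter are hypothetical.

```python
import struct

GRE_KEY_PRESENT = 0x2000   # K bit in the 16-bit GRE flags/version field
PROTO_TEB = 0x6558         # Transparent Ethernet Bridging (Ethernet payload)


def pack_gre_key_header(segment_id: int, low8: int = 0) -> bytes:
    """Build an 8-byte GRE header whose key carries a 24-bit segment ID.

    Assumed layout: segment ID in the upper 24 bits of the key,
    low8 (hypothetical name) in the lower 8 bits.
    """
    if not 0 <= segment_id < 2 ** 24:
        raise ValueError("segment ID must fit in 24 bits")
    if not 0 <= low8 < 2 ** 8:
        raise ValueError("low8 must fit in 8 bits")
    key = (segment_id << 8) | low8
    # Network byte order: flags/version, protocol type, 32-bit key.
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TEB, key)


def unpack_segment_id(header: bytes) -> int:
    """Extract the 24-bit segment ID from a GRE header built as above."""
    flags, _proto, key = struct.unpack("!HHI", header[:8])
    if not flags & GRE_KEY_PRESENT:
        raise ValueError("GRE header carries no key field")
    return key >> 8


if __name__ == "__main__":
    header = pack_gre_key_header(segment_id=5000)
    print(header.hex(), unpack_segment_id(header))
```

The encapsulated Ethernet frame would follow these eight bytes; the outer IP header, checksum and sequence number options are left out to keep the sketch short.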
Occasionally my readers ask me if I would be available for a consulting/design project (or send me questions that are actually design review/second opinion challenges). I usually recommend using our Professional Services team for larger projects (I try to do only a few larger consulting projects per year ... but that shouldn’t stop you from asking ;). Quite often, however, the amount of work involved is so low that it simply doesn’t make sense to go through the whole paperwork nightmare, so I decided to create the ExpertExpress service to address those cases.
I’ll be in Krakow for the PLNOG/EuroNOG conferences Wednesday through Friday. This is not the primary reason I’m arriving on Wednesday (although it does look tempting) – I wanted to have enough time for discussions with fellow networking engineers and Thursday afternoon/Friday will probably be pretty busy. So, if you’d like to chat with me about exciting networking technologies, just find me in the crowd (unfortunately I won’t be wearing this T-shirt) ... and if you’d like to have a more serious (and longer) discussion, get in touch with me or send me a tweet.
Like anyone else, I was pretty impressed with the QFabric hardware architecture when Juniper announced it, but remained way more interested in the control-plane aspects of QFabric. After all, if you want multiple switches to behave like a single device, you could either use a Borg-like architecture with a single control plane entity, or implement some very clever tricks.
Nobody has yet demonstrated a 100-switch network with a single control plane (although the OpenFlow aficionados would make you believe it’s just around the corner), so it must have been something else.
Imagine you’d actually want to run VXLAN between two data centers (I wouldn’t, but that’s beside the point at this moment) and the only connectivity between the two is IP, no multicast. How would you implement IP multicast across a generic IP backbone? Anything goes, from duct tape (GRE) to creative solutions ... and don’t forget those pesky RPF checks.
Comparing promises, deliverables and generic progress seems to be popular in the harvest season, so let’s see how far Cisco pushed the Data Center IPv6 support in the six months since my last status report.
Kudos to the Nexus 7000/NX-OS team for doing the right thing. Not only did they make me happy by implementing full-blown MPLS, MPLS/TE and MPLS/VPN, they included 6PE and 6VPE in the first release of the MPLS code. Great job!
Juniper has finally released the technical documentation for the QFabric virtual switch and its components (QF/Node, QF/Interconnect and QF/Director). As expected, my speculations weren’t too far off – if anything, Juniper didn’t go far enough along those lines, but we’ll get there later.
The generic hardware architecture of the QFabric switching complex has been well known for quite a while (listening to the Juniper QFabric Packet Pushers Podcast is highly recommended) – here’s a brief summary:
Two weeks after VXLAN (backed by VMware, Cisco, Citrix and Red Hat) was launched at VMworld, Microsoft, Intel, HP and Dell published the NVGRE draft (Arista and Broadcom are cleverly sitting on both chairs), which solves the same problem in a slightly different way.
If you’re still wondering why we need VXLAN and NVGRE, read my VXLAN post (and the one describing how VXLAN, OTV and LISP fit together), register for the Introduction to Virtual Networking webinar or read the Introduction section of the NVGRE draft.
The Mixed Feelings award of the week goes to Doug Gourlay and his Why FCoE is Dead, But Not Buried Yet article. While I agree with everything he’s saying about L2 and L3, the FCoE part of the post is shaky enough to generate tons of comments (or maybe that was the goal). For a hilarious perspective on the same topic, read Fiber Channel and Ethernet – the odd couple.
And here are the other great articles I stumbled upon during the last few days:
Chris sent me the following question a while ago:
I've got a full Internet BGP table, and want to responsibly send a default route to a downstream AS. It's the "responsibly" part that's got me frustrated: How can I judge whether the internet is working and make the origination of the default conditional on that?
He’d already figured out the neighbor default-originate route-map command, but wanted to check for more generic conditions than the presence of one or more prefixes in the IP routing table.
Daniel left a very relevant comment to my convoluted BGP session shutdown solution:
What I am currently doing is using EEM to watch my tracked objects and then issuing a neighbor shutdown command. Is there a functional reason I would not want to do it that way, and use the method you prescribe?
As always, the answer is “it depends.” In this case, the question to ask yourself is: “do I track configuration changes and react to them?”
One of my readers sent me the following question a few days ago:
Do you have a webinar that covers Dual DMVPN HUB deployment using OSPF? If so which webinar covers it?
I told him that the DMVPN: From Basics to Scalable Networks webinar covers exactly that scenario (and numerous others), describing both Phase 1 DMVPN and Phase 2 DMVPN design and implementation guidelines. Interestingly, he replied that the information on this topic seems to be very scant:
I read two great blog posts on Sunday: evergreen Fallacies of Distributed Computing from Bob Plankers and forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).
HP has recently commissioned an IRF network test that came to absolutely astonishing conclusions: vMotion runs almost twice as fast across two links bundled in a port channel as across a single link (with the other one being blocked by STP). The test report contains one other gem, this one a result of the incredible creativity of HP marketing:
For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.
You know my opinions about stretched clusters ... and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.
In response to my The Road to Complex Designs is Paved With Great Recipes post, Daniel suggested shutting down the EBGP session if your BGP router cannot reach the DMZ firewall, and Cristoph guessed that it might be done without changing the router configuration with the neighbor fall-over route-map BGP configuration command. He was sort-of right, but the solution is slightly more convoluted than he imagined.
When I started writing about VXLAN, I received a few tweets along the lines of “I have no clue what you’re writing about.” Here’s a chance to fix that: I’ll run an Introduction to Virtualized Networking webinar in early October (register), trying to demystify the acronyms and marketectures. It doesn’t assume you know anything about server virtualization or IaaS; we’ll start from scratch and cover as much ground as possible.
During my visit to South Africa someone told me that he got 6VPE working over an L2TP connection ... and that you should “use the other VRF attribute, not lcp:interface-config” to make it work. A few days ago one of the readers asked me the same question and although I was able to find several relevant documents, I wanted to see it working in my lab.
If you’re not working for a data center fabric vendor (in which case please read today’s other post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:
Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.
... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)
So far my presentation covers Cisco’s Fabric Path, VPC, VSS and port extenders, Brocade’s VCS Fabric based on what’s available in Brocade NOS 2.0 (they still have to decide whether they’ll tell me what’s new in NOS 2.1), Juniper’s Virtual Chassis and XRE, HP’s IRF, and OpenFlow.
A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.”
If you’re not familiar with the Nexus 1000V architecture, read this post first. If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.
Remember how I foretold when TRILL first appeared that someone would be “brave” enough to reinvent WAN bridging and brouters that we so loved to hate in the early 90’s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.
Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculations and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.
VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.