« ipSpace.net blog

Friday, September 30, 2011 06:43 CEST

Long-distance vMotion for Disaster Avoidance? Do the Math First

The proponents of inter-DC layer-2 connectivity (required by long-distance vMotion) inevitably cite disaster avoidance (along with buzzword-bingo-winning business agility) as one of the primary requirements after they figure out stretched clusters might not be such a good idea (and there’s no way to explain the dangers of split subnets to some people). When faced with the disaster avoidance “requirement”, ask them to do some basic math first.

DMVPN: Spoke QoS Challenge

Got the following question with an invalid return address, so I’m broadcasting the reply ;)

I am running a DMVPN network and recently got a requirement for spoke-to-spoke communication. We currently shape traffic on a per spoke basis on the hub, and have a single shaper at the remote site. However, if a spoke is receiving a large amount of traffic from the hub and another spoke site, how will the sites sending traffic know that the remote port is congested?

Short answer – they won’t. You have a mission-impossible problem (very similar to ADSL QoS), but there might be some slight silver lining:

QFabric Part 3 – Forwarding

2021-01-03: Even though QFabric was an interesting architecture (and reverse-engineering it was a fun intellectual exercise), it withered a few years ago. Looks like Juniper tried to bite off too much.

You won’t find much about the QFabric forwarding architecture and resulting behavior in the documentation; white papers might give you more insight and I’m positive more detailed ones will start appearing on Juniper’s web site now that the product is shipping. In the meantime, let’s see how far we can get based on two simple assumptions: (A) The "one tier architecture" claim is true and (B) Juniper has some very smart engineers.

VXLAN: awesome or braindead?

Just a few hours after VXLAN was launched, I received an e-mail from one of my readers asking (literally) if VXLAN was awesome or braindead. I decided to answer this question (you know the right answer is it depends) and a few others in a FastPacket blog post published by SearchNetworking.

I wrote the post before NVGRE was published and missed the “brilliant” idea of using GRE key as virtual segment ID.

ExpertExpress – Online Help When and Where You Need It Most

Occasionally my readers ask me if I would be available for a consulting/design project (or send me questions that are actually design review/second opinion challenges).

TL&DR: No… but I created ExpertExpress service in 2011 to address those cases.

How can you use it? Anything goes. We’ve been doing technology briefings, design reviews, router configurations, troubleshooting… Just make sure your problem is well-defined so we won’t spend time trying to figure out what the problem is.

Net::Beer in Krakow

I’ll be in Krakow for the PLNOG/EuroNOG conferences Wednesday through Friday. This is not the primary reason I’m arriving on Wednesday (although it does look tempting) – I wanted to have enough time for discussions with fellow networking engineers and Thursday afternoon/Friday will probably be pretty busy. So, if you’d like to chat with me about exciting networking technologies, just find me in the crowd (unfortunately I won’t be wearing this T-shirt) ... and if you’d like to have a more serious (and longer) discussion, get in touch with me or send me a tweet.

see 4 comments

Friday, September 23, 2011 06:58 CEST

QFabric Part 2 – Control Plane Overview

2021-01-03: Even though QFabric was an interesting architecture (and reverse-engineering it was a fun intellectual exercise), it withered a few years ago. Looks like Juniper tried to bite off too much.

Like anyone else, I was pretty impressed with the QFabric hardware architecture when Juniper announced it, but remained way more interested in the control-plane aspects of QFabric. After all, if you want multiple switches to behave like a single device, you could either use Borg-like architecture with a single control plane entity, or implement some very clever tricks.

Nobody has yet demonstrated a 100-switch network with a single control plane (although the OpenFlow aficionados would make you believe it’s just around the corner), so it must have been something else.

Quick question: IP multicast over an existing IP backbone

Imagine you’d actually want to run VXLAN between two data centers (I wouldn’t but that’s beyond the point at this moment) and the only connectivity between the two is IP, no multicast. How would you implement IP multicast across a generic IP backbone? Anything goes, from duct tape (GRE) to creative solutions ... and don’t forget those pesky RPF checks.

see 12 comments

Tuesday, September 20, 2011 06:41 CEST

QFabric Part 1 – Hardware Architecture

2021-01-03: Even though QFabric was an interesting architecture (and reverse-engineering it was a fun intellectual exercise), it withered a few years ago. Looks like Juniper tried to bite off too much.

Juniper has finally released the technical documentation for the QFabric virtual switch and its components (QF/Node, QF/Interconnect and QF/Director). As expected, my speculations weren’t too far off – if anything, Juniper didn’t go far enough along those lines, but we’ll get there later.

The generic hardware architecture of the QFabric switching complex has been well known for quite a while (listening to the Juniper QFabric Packet Pushers Podcast is highly recommended) – here’s a brief summary:

NVGRE – because one standard just wouldn’t be enough

2021-01-03: Looks like NVGRE died – even Microsoft walked away. There are tons of VXLAN implementations though. VMware and AWS are also using Geneve.

Two weeks after VXLAN (backed by VMware, Cisco, Citrix and Red Hat) was launched at VMworld, Microsoft, Intel, HP & Dell published NVGRE draft (Arista and Broadcom are cleverly sitting on both chairs) which solves the same problem in a slightly different way.

If you’re still wondering why we need VXLAN and NVGRE, read my VXLAN post (and the one describing how VXLAN, OTV and LISP fit together).

It’s obvious the NVGRE draft was a rushed affair, its only significant and original contribution to knowledge is the idea of using the lower 24 bits of the GRE key field to indicate the Tenant Network Identifier (but then, lesser ideas have been patented time and again). Like with VXLAN, most of the real problems are handwaved to other or future drafts.

Interesting links (2011-09-18)

The Mixed Feelings award of the week goes to Doug Gourlay and his Why FCoE is Dead, But Not Buried Yet article. While I agree with everything he’s saying about L2 and L3, the FCoE part of the post is shaky enough to generate tons of comments (or maybe that was the goal). For a hilarious perspective on the same topic, read Fiber Channel and Ethernet – the odd couple.

And here are the other great articles I stumbled upon during the last few days:

Responsible Generation of BGP Default Route

Chris sent me the following question a while ago:

I've got a full Internet BGP table, and want to [responsibly]{.emphasis} send a default route to a downstream AS. It's the "responsibly" part that's got me frustrated: How can I judge whether the internet is working and make the origination of the default conditional on that?

He’d already figured out the neighbor default-originate route-map command, but wanted to check for more generic conditions than the presence of one or more prefixes in the IP routing table.

Changing configuration with EEM – yes or no?

Daniel left a very relevant comment to my convoluted BGP session shutdown solution:

What I am currently doing is using EEM to watch my tracked objects and then issuing a neighbor shutdown command. Is there a functional reason I would not want to do it that way, and use the method you prescribe?

As always, the answer is “it depends.” In this case, the question to ask yourself is: “do I track configuration changes and react to them?”

OSPF-over-DMVPN Using Two Hub Routers

One of my readers sent me the following question a few days ago:

Do you have a webinar that covers Dual DMVPN HUB deployment using OSPF? If so, which webinar covers it?

I told him that the DMVPN: From Basics to Scalable Networks webinar covers exactly that scenario (and numerous others), describing both Phase 1 DMVPN and Phase 2 DMVPN design and implementation guidelines. Interestingly, he replied that the information on this topic seems to be very scant:

You Don’t Need OpenFlow to Solve Every Age-Old Problem

I read two great blog posts on Sunday: evergreen Fallacies of Distributed Computing from Bob Plankers and forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).

Long-distance IRF Fabric: Works Best in PowerPoint

HP has commissioned an IRF network test that came to absolutely astonishing conclusions: vMotion runs almost twice as fast across two links bundled in a port channel than across a single link (with the other one being blocked by STP). The test report contains one other gem, this one a result of incredible creativity of HP marketing:

For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.

You know my opinions about stretched cluster… and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.

Shut Down BGP Session Based on Tracked Object

In responses to my The Road to Complex Designs is Paved With Great Recipes post Daniel suggested shutting down EBGP session if your BGP router cannot reach the DMZ firewall and Cristoph guessed that it might be done without changing the router configuration with the neighbor fall-over route-map BGP configuration command. He was sort-of right, but the solution is slightly more convoluted than he imagined.

IPv6 MPLS/VPN (6VPE) with PPPoE and RADIUS

During my visit to South Africa someone told me that he got 6VPE working over an L2TP connection ... and that you should “use the other VRF attribute, not lcp:interface-config” to make it work. A few days ago one of the readers asked me the same question and although I was able to find several relevant documents, I wanted to see it working in my lab.

Large-scale bridging = nuked earth

If you’re not working for a data center fabric vendor (in which case please read the other today’s post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:

Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.

... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)

add comment

Monday, September 5, 2011 06:53 CEST

Nexus 1000V LACP offload and the dangers of in-band control

2021-03-01: Nexus 1000v turned into abandonware long time ago, and is now officially a zombie (oops, EOL). However, the challenges they were facing with LACP offload are still worth pointing out to anyone advocating centralized control plane (stupidity formerly known as SDN).

A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.”

If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.

TRILL goes to WAN – the bridging craze continues

Remember how I foretold when TRILL first appeared that someone would be “brave” enough to reinvent WAN bridging and brouters that we so loved to hate in the early 90’s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.

see 4 comments

Thursday, September 1, 2011 06:06 CEST

VXLAN, OTV and LISP

Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculations and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.

VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.

Blog Posts in September 2011