
Sooner or later someone will pay for the complexity of the kludges you use

I loved listening to the OTV/FabricPath/LISP Packet Pushers podcast. Ron Fuller and Russ White did a great job explaining the roles of OTV, FabricPath and LISP in a stretched (inter-DC) subnet deployment scenario and how the three pieces fit together … but I couldn't stop wondering whether there's a better way to solve the underlying business need than throwing three complex new technologies and the associated equipment (or VDC contexts, or line cards) into the mix.

You probably already know the answer. There is a better option - use applications that rely on DNS and can survive an external IP address change when they move from one data center to another. That might sound like an academic argument given the current state of craplications in many enterprise environments, but do step back from the pressing networking problems and take a wider look from the business perspective.
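As a minimal sketch of what such a DNS-friendly application looks like on the client side (the function name and parameters are invented for illustration), the trick is simply to re-resolve the service name on every connection attempt instead of caching its IP address:

```python
import socket

def connect_to_service(fqdn, port, timeout=3.0):
    """Open a TCP connection to a service identified by its FQDN.

    The name is re-resolved on every call, so when the service moves to
    another data center and its DNS record is updated, new connections
    simply follow the new address - no stretched subnet required.
    """
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            fqdn, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:
            last_error = err
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock              # first reachable address wins
        except OSError as err:
            sock.close()
            last_error = err
    raise ConnectionError(f"no reachable address for {fqdn}") from last_error
```

A client written this way keeps working across a DC move as long as the application can tolerate its sessions being re-established and the DNS TTLs are reasonably short.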

Imagine two competitors, both requiring multiple data centers for business continuity: business A, where the application developers do their own thing without considering the impact of their behavior on the IT infrastructure, and business B, where the applications are written to interoperate with the network (BTW, all you have to do in a Windows environment is deploy your services on recent Windows cluster software and you get DNS integration for free).

In the long run, business A will indubitably have higher IT costs - they will inevitably get locked into a single-vendor solution, because no two vendors support the same set of (somewhat) standard protocols and proprietary extensions they'd need. They will also need high-end gear (LISP and OTV tend to run on reassuringly expensive boxes) and pay dearly for the hardware and licenses.

At the same time, business B will be able to build simple layer-3 networks using components from almost any vendor … because all they need to run the network is the same minimal set of well-tested protocols we use in the global Internet.

I totally understand that networking vendors prefer to deal with business A. I also understand why some network architects and consultants prefer business A (they've just found a never-ending bonanza of ongoing challenges), but ask yourself: which approach makes more sense from the business perspective? Which one will result in lower IT costs and higher business agility (had to throw in the word adored by marketing departments)?

Think about this when considering the long-term networking strategy of your company … and good luck!


  1. Not to mention the ever-accumulating operational and security deficiencies of LISP (e.g., see … and …)

  2. Agree completely... I thought their comment in the podcast that 'DNS is fast enough now' for LISP was pretty telling too. Application resiliency and performance are the goal here. Why not go with what's already working in that space?

    Pay more than $0.02 to run your DNS. Protect your DNS from DDoS. Control your own dynamic DNS with a GSLB product that gives you granular control and allows you to group FQDNs together to encompass all the services that make up an application. Include solid IP intelligence (like GeoIP) and reputation-based scoring of LDNS or client resolvers. Now we have some business-continuity knobs.

    Advanced GSLB is a time-proven mechanism for Internet commerce. Turns out it works on the intranet too. And … look ma, no new switches needed.
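    To make the GSLB argument concrete, here's a minimal sketch of the per-query decision a GSLB makes (DC names, VIPs and regions below are invented for illustration): prefer a healthy data center serving the client's GeoIP region, and fail over to any healthy one otherwise:

```python
# Hypothetical topology: two data centers, each with a VIP and the
# GeoIP regions it normally serves.
DATACENTERS = {
    "dc-east": {"vip": "192.0.2.10",    "regions": {"EU", "NA-east"}},
    "dc-west": {"vip": "198.51.100.10", "regions": {"NA-west", "APAC"}},
}

def gslb_answer(client_region, health):
    """Return the VIP a GSLB would hand out for one DNS query.

    Prefer a healthy DC that serves the client's region; otherwise
    fail over to any healthy DC; if no DC is healthy, return None
    (i.e., no answer).
    """
    healthy = [dc for dc, up in health.items() if up]
    for dc in healthy:
        if client_region in DATACENTERS[dc]["regions"]:
            return DATACENTERS[dc]["vip"]
    return DATACENTERS[healthy[0]]["vip"] if healthy else None
```

    Real GSLB products layer persistence, weighting and richer health checks on top, but the core mechanism is exactly this kind of per-query decision - no stretched L2 required.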

  3. Your final paragraph hits the nail on the head though - everyone wants the latest toys, and too few CxO-level folks are technical enough to call them out. There are too many specialists and too few architects these days, where by my definition architects are the people with a sufficiently high-altitude view to dismiss all the vendor marketing.

  4. One of the problems we have in the networking industry is just layering protocol on protocol, technology on technology, to solve a problem. It's like realizing your brakes are too hard to push, so you add a contraption that sits over the brake pedal to make it easier. Then you realize this device interferes with steering, so you add a steering extension to move the control to the other seat. Then you realize you can't roll the driver's side window down, so you add a long stick that allows you to manipulate the little switch.

    Each contraption you add, your friends ooh and ah over the beauty and complexity of the new idea. And all the while, your car is actually becoming undrivable. So you add a car management system...

    Okay, so I probably shouldn't be saying these things after just recording a show on LISP, but... There it is. I've said it!

    (And no, I'm not fond of LISP.)

  5. Hear, hear! Good post. Too many overlays/abstractions and too much tagging/tunneling.

  6. If you engineer your application so that you have multiple machines in multiple data centers... you don't need to migrate anything anywhere anymore. You shut down one machine and bring up another. DNS may or may not be the solution; it depends on the app.

    The tricky part, of course, is that designing with a distributed architecture in mind from the start is way harder than taking a single-box architecture and bolting on migration later.

    It's a question of different trade-offs at different times.

    When (or if) frameworks emerge that allow easy development of distributed applications, and programmers grow up who grok them (this is not too far away - 256-core CPUs pose very similar problems!), this problem will be solved in a cleaner way. Till then we need a few hacks.


    ps. Completely agree with Russ about layering.

    1. Completely agree. Unfortunately, we were more than happy (and oh-so-proud of ourselves) to provide too many hacks for too long instead of forcing the programmers (and framework developers) to learn what needs to be done.

    2. [repasting with hopefully correct parent comment :-)]

      It's all a collection of tradeoffs. Much like time/space in programming, you have control/cost and many other axes. This is the "engineering" portion of our job - and I don't think there is a single silver bullet.

      Today is the "interesting times" between the generations of different paradigms, where we'd like to already be in the new world - but the stuff that pays the bills today is all this cruft covered with warts, which creates a market for maintaining the relative health of the warts, which results in interesting interactions between the "new" and "old".

      And the speed with which the "new" becomes "old" keeps increasing.

      Think Puppet is cool? Wrong. Docker is the new black, apparently. And, interestingly, these "newest" concepts add the "distributed" nature back in (cf. welcome back to the peer-to-peer world :-)

  7. Ivan, there's one point you could be missing.

    Applications can be extremely expensive and extremely crappy at the same time. There's not always an option to make them better; you just have to live with them, purchase expensive high-end servers instead of loads of cheap machines like Google does, etc. These are mission-critical applications. You can choose others from the competitors, but they would be the same crap from the admin's perspective.

    And so, one day you may realize that implementing OTV with a stretched inter-DC HA cluster at the application level could be an option that might one day save the business itself. Yes, I know you hate this, but if a meteorite hits one of the DCs and all the data inside it is gone forever (that's probably unlikely, but it has the potential to be fatal for the company), the other DC would continue working almost as if nothing had happened. And there may be no other option to replicate data in near-real time except low-level SAN replication, which is expensive, kludgy and also dangerous - consistency is not guaranteed with such techniques.

    Issues with the stretched clustering itself (split brain caused by loss of the inter-DC link, and so on) can be painful, but not fatal. They happen rarely and can be tolerated as side effects of the ability to survive the destruction of any DC without having to roll back to nightly backups (which could also be fatal - imagine financial companies).

    1. Hi Dmitriy,

      If I understand it correctly, you're saying there's an application out there that

      A) Uses some **** that cannot be routed with today's gear (example: unicorns over LAT)
      B) Uses application-level data replication which is better than storage replication.

      I'm positive you can find something like that out there in the wild; if you can, do share what this monstrosity is.

      However, in most cases it all comes down to building your network based on information gleaned from vendors' whitepapers (because paying for a proper design and architecture obviously doesn't make sense). No wonder you get the network you deserve ;))

      Finally, I haven't heard of a data center being hit by a meteorite, but I do know of several organizations that experienced total meltdown of multiple data centers due to a bridging loop. Now they know better ...

      Kind regards,

    2. Hi Ivan,

      "B" is closer to reality. These days almost everything is IP, and even broadcasts are used just for ARPs, so no rainbows and unicorns.

      Those are mostly monstrous financial applications that already cost millions to support. They usually do support clustering, but it may be impossible to run multiple clusters sharing common resources, with failover between them.

      And storage replication is as low-level as it gets. You always say that problems shouldn't be pushed a couple of layers down the stack. If you're unlucky, you can end up with corrupt file systems when doing storage replication in the event of an outage, because it works at the block level - it doesn't even know file systems exist. But link loss during application-level replication usually causes the loss of only a negligible amount of data; it doesn't cause corruption.

      And as I already said, a bridging loop between DCs is painful (although less likely with modern kludges like OTV, if implemented properly) and it costs money, but if it happens rarely enough, it doesn't pose a threat to the business itself. A meteorite (fire and malfunctioning extinguishers, a loaded truck crashing into the data center building, a thermonuclear explosion, a zombie outbreak, or anything else) does.

      The tradeoff is "more small incidents" vs. "fewer small incidents, but a risk of going out of business after an unlikely but still plausible disaster".
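      The block-level vs. application-level replication argument above can be sketched in a few lines (a toy log-shipping model; the record format is invented for illustration). The standby applies only complete, committed records received before the link dropped, so a link loss merely truncates the replica cleanly:

```python
def standby_state(primary_log, records_received):
    """Toy model of application-level (log-shipping) replication.

    The standby applies only complete, committed records it received
    before the link dropped. Losing the link mid-stream therefore loses
    the tail of the log (a bounded amount of recent data) but never
    leaves the standby in a corrupt, half-written state - unlike
    block-level replication, which has no notion of transactions and
    may ship half of a multi-block write.
    """
    received = primary_log[:records_received]   # link died mid-stream
    return [rec["txn"] for rec in received if rec["committed"]]
```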

    3. ... and you're saying that these applications cannot work across L3 subnets?

    4. Imagine VRRP or something analogous, or L2 multicast session replication (some fancy tech for those dinosaurs). An SLB using ACE/F5/whatever wouldn't allow that session replication between nodes. And even if the nodes can communicate via routed L3, you still have a single cluster with nodes in different DCs, with the usual potential caveats like split brain. It just wouldn't force you to have all the nodes L2-adjacent (but you'd have to purchase an SLB device).

      You have to face it: a cluster with nodes spread across DCs, even with L2 DCI, may sometimes be the neatest and safest architecture if you're considering business survival. And yes, you should probably try to avoid such applications. Life is simpler without them. Most businesses don't use them, which allows them to avoid complexity and run routing everywhere.

    5. Now I guess we're close to being in agreement ;)

      BTW, there are other hacks out there that can create a floating IP address for clustering needs without an L2 interconnect.

    6. Which brings us back to the initial point: those are the kludges you have to pay for. Sure, you can even do proxy ARP to simulate L2 connectivity (you lose some features, such as L2 multicast, and you can't forward packets with TTL 1). It's probably safer than interconnecting DCs with VPLS, but it's still an extremely dirty hack. Personally, I would always prefer an encapsulation mechanism carrying L2 traffic over proxy ARP.

