Temper Your MacGyver Streak

Microseconds after VXLAN was launched at VMworld 2011, someone started promoting it as a data center extension solution. Even though layer-2 DCI doesn’t make much sense (even to server people) and VXLAN is really not a DCI solution, the lure of misusing a technology was irresistible.

Seeing marketing people promoting ideas that work best in PowerPoint is funny (particularly when they start defending them); doing the same in a production network is dangerous.

Do keep in mind that it’s impossible to predict every creative idea a networking engineer reading the white papers or browsing through the manuals might come up with. Your particular combination of features that were never supposed to be used together might work in a lab, but as TAC engineers told me (off the record) when I proposed using 20 EIGRP processes on a single router over 100+ DLCIs: it will probably work, but do remember that you’ll be the only one in the world doing that. You’ll hit all the bugs we never discovered because no one has done it before, and every time you open a TAC case you’ll have a lengthy dialogue with the escalation engineer, explaining why you’re doing things the way you are (first-line TAC support will just get confused by your arcane configurations, so you’ll have to escalate the case). On the other hand, there might be other solutions you could use that would be more in line with the usual use cases seen in the field, the vendors’ documentation, or validated designs.

In my case, I had to go with 20 EIGRP processes because the customer needed an MPLS/VPN-like solution years before MPLS/VPN was invented (and yes, we migrated them to MPLS/VPN in the meantime). The customer had a well-coordinated IP address space, so I might have used an alternative design using BGP communities and selective route advertisements, but they insisted on using Frame Relay instead of a routed network (where we’d use the core routers as PE-routers), leaving the EIGRP mess as the only viable option.

In a similar vein: being young and reckless, I started promoting a hierarchy of BGP route reflectors years before it became an accepted practice, and the only feedback I got when I asked whether such a design would work (apart from “you know nobody is using it so we never tested it”) was “looking at the code I couldn’t see why it wouldn’t work.” Highly reassuring if you’re building a large-scale mission-critical network.

Do keep in mind that if you want to be creative and live on the bleeding edge, you will get cut, and the network might collapse and die a horrible death. That might be fine if you’re working for a hotshot startup, or it might be necessary if you’re trying to repair an Ethernet network on the South Pole or deal with an incompetent PHB who cannot get the budget for the gear you need (hint: run away as soon as possible), but in most cases you’re paid to deliver a stable and well-running network that supports the business of the company paying you, not to improve your resume and boost your ego with an implementation nobody dreamed of before. You’re not doing a great job if you cut too many corners, supposedly saving money while doing so, but at the same time exposing your company to unpredictable risks. Also, if you’re told to do the impossible (stretched firewalls come to mind), make sure you document the risks and make the business stakeholders (not the server or app team) aware of them.

Ah, and there’s the minor inconvenience of supportability – while it might feel good to be the only guy to understand the concoction you came up with, you might get tired of midnight calls or interrupted vacations when your unique solution fails in yet another unpredictably obscure way. Finally, do remember that you cannot be promoted till someone can replace you.
