Category: design
Using BGP in Data Center Fabrics
While large data centers increasingly use BGP as the routing protocol within their fabrics, enterprise engineers tend to shy away from that idea because they think BGP is too complex/scary/hard-to-configure/obsolete/unknown/whatever.
It’s time to fix that.
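To show just how little there is to fear, here’s a minimal sketch (mine, not from any particular vendor’s documentation) of the usual eBGP numbering in a leaf-and-spine fabric: every leaf gets its own private ASN and runs one eBGP session to every spine. The ASNs, addresses and FRR-like output syntax are made-up examples.

```python
# Minimal sketch of eBGP numbering in a leaf-and-spine fabric: every leaf
# gets its own private ASN and one eBGP session to every spine.
# All ASNs, addresses and the FRR-like output syntax are made-up examples.

SPINE_ASN = 65000                                  # spines commonly share one ASN
LEAF_ASNS = {"leaf1": 65001, "leaf2": 65002, "leaf3": 65003}
SPINE_PEERS = {"spine1": "10.0.0.1", "spine2": "10.0.0.3"}

def leaf_bgp_config(leaf: str) -> str:
    """Render the BGP stanza for one leaf switch."""
    lines = [f"router bgp {LEAF_ASNS[leaf]}"]
    for spine, addr in sorted(SPINE_PEERS.items()):
        lines.append(f"  neighbor {addr} remote-as {SPINE_ASN}   ! uplink to {spine}")
    lines.append("  address-family ipv4 unicast")
    lines.append("    redistribute connected   ! advertise server-facing subnets")
    return "\n".join(lines)

if __name__ == "__main__":
    for leaf in LEAF_ASNS:
        print(leaf_bgp_config(leaf), end="\n\n")
```

Run it and you get one short, completely predictable BGP stanza per leaf – hardly the scary protocol of enterprise lore.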
The Grumpy Old Network Architects and Facebook
Nuno wrote an interesting comment to my Stretched Firewalls across L3 DCI blog post:
You're an old school, disciplined networking leader that architects networks based on rock-solid, time-tested designs. But it seems that the prevailing fashion in network design and availability goes against your traditional design principles: inter-site firewall clustering, inter-site vMotion, DCI, etc.
Not so fast, my young padawan.
Let’s define prevailing fashion first. You might define it as Kool-Aid peddled by snake oil salesmen, or as cool network designs by people who know what they’re doing. If we stick with the first definition, you’re absolutely right.
Now let’s look at the second camp: how people who know what they’re doing build their network (Amazon VPC, Microsoft Azure or Bing, Google, Facebook, a number of other large-scale networks). You’ll find L3 down to ToR switch (or even virtual switch), and absolutely no inter-site vMotion or clustering – because they don’t want to bet their service, ads or likes on the whims of technology that was designed to emulate thick yellow cable.
This isn't the first time that readers have asked you about these technologies, and it won't be the last. Vendors will continue to market them despite their shortcomings, and customers will continue to eat them up.
As long as there are people willing to believe in fairy tales and Santa Claus, there will be someone dressed in a red coat and a fake beard yelling “Ho, Ho, Ho!”
Enterprise IT managers sometimes act like small kids. They don’t want to hear that they have people and process problems, and love to believe that the next magical bit of technology will solve whatever it is that bothers them. Vendors obviously love to exploit these cravings and sell them ever-more-complex solutions.
I'd like to think that vendors will also continue to work out the kinks and over time the technology will become rock solid and time-tested.
I am positive you can make any technology almost-rock-solid. You can also make pigs fly (see RFC 1925, section 2, truth 3). However, have you included the fuel costs in your TCO?
Also, the more complex a technology is, the likelier it is to crash down like a house of cards, and you’ll be left with an incomprehensible mix of bits and pieces that will be impossible to put back together (see also: You can’t reformat your data center).
Nuno concluded his comment with a question:
Are you too stuck on past, traditional designs and not being open to new ways of building IT? I get that IT is very cyclical, and these new trends may die in the future...or thrive, and the customers may either fail...or succeed.
I am very open to new ways of building IT. I preach the need for meaningful SDN (not the centralized control plane crap), network automation, and proper application architecture. I just refuse to believe in fairy tales, and solving non-technical problems with technology.
Finally…
Looking for more red pills? Explore my SDN webinars, Designing Active/Active Data Centers webinar, and vMotion-related blog posts.
Presentation: All You Need Are Two Switches
I was asked to present a data-center-related talk last week and decided to focus on one of my favorite topics: because most people don’t have more than a few hundred servers in their data center, they don’t need more than two switches (or a rack of servers).
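To put numbers behind that claim, here’s a back-of-the-envelope calculation; the port counts are my hypothetical examples, not figures from the talk:

```python
# Back-of-the-envelope port math: how many dual-homed servers fit behind a
# single pair of ToR switches. All port counts are made-up examples.

ports_per_switch = 64        # e.g. a dense 64-port switch
breakout = 4                 # each port split into 4 lower-speed server ports
uplink_ports = 4             # ports kept for uplinks / inter-switch link

server_ports = (ports_per_switch - uplink_ports) * breakout
# A dual-homed server uses one port on each switch, so the pair supports
# as many servers as a single switch has server-facing ports.
print(f"One pair of switches connects up to {server_ports} dual-homed servers")
# -> 240 servers, more than enough for "a few hundred servers" environments
```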
Not surprisingly, an equipment reseller sitting in the room was not amused.
The video and the slide deck are already online, but there’s a minor challenge: the whole event was in Slovenian ;) However, I plan to record the same topic in English once my SDN travels stop.
Building Carrier-Grade Cloud Infrastructure
During one of my SDN workshops, an attendee asked me “How do you build carrier-grade (5 nines) cloud infrastructure with VMware NSX?”
Short answer: You don’t… and it’s the wrong question anyway.
Designing Active-Active and Disaster Recovery Data Centers
A year ago I was a firm believer in the unlimited powers of Software-Defined Data Centers and their ability to simplify workload migrations. After all, if you can use an API to create any data center object, what’s stopping you from moving a workload running in one data center to another location?
As always, there’s a huge difference between theory and reality.
How Complex Is Your Data Center?
Sometimes it seems like the networking vendors try to (A) create solutions in search of problems, (B) boil the ocean, (C) solve the scalability problems of Google or Amazon instead of focusing on real-life scenarios or (D) all of the above.
Bryan Stiekes from HP decided to take a step in the right direction: let’s ask the customers how complex their data centers really are. He created a data center complexity survey and promised to share the results with me (and you), so please do spend a few minutes of your time filling it in. Thank you!
Private and Public Clouds, and the Mistakes You Can Make
A few days ago I had a nice chat with Christoph Jaggi about private and public clouds, and the mistakes you can make when building a private cloud – the topics we’ll be discussing in the Designing Infrastructure for Private Clouds workshop @ Data Center Day in Berne in mid-September.
The German version of our talk has been published on Inside-IT; those of you not fluent in German will find the English version below.
Cumulus Linux Data Center Architectures
After introducing the concepts of Cumulus Linux in the Data Center Fabrics update session, Dinesh Dutt described the typical data center architectures implemented with Cumulus Linux and the lessons everyone should learn from large-scale web properties.
Can You Avoid Networking Software Bugs?
One of my readers sent me an interesting reliability design question. It all started with a catastrophic WAN failure:
Once a particular volume of encrypted traffic was reached, the data center WAN edge router crashed, and the backup router that took over crashed as well. The traffic then failed over to the second DC, and you can guess what happened then...
Obviously they’re now trying to redesign the network to avoid such failures.
Save the Date: Designing Infrastructure for Private Clouds Workshop in Switzerland
Gabi Gerber (the wonderful mastermind behind the Data Center Day event) is helping me bring my Designing Infrastructure for Private Clouds workshop (one of the best Interop 2015 workshops) to Switzerland.
This is the only cloud design workshop I’m running in Europe in 2015. If you’d like to attend it, this is your only chance – register NOW.
So You Need ISSU on Your ToR Switch? Really?
During Dinesh Dutt’s Cumulus Linux presentation in the Data Center Fabrics webinar, someone asked an unexpected question: “Do you have In-Service Software Upgrade (ISSU) on Cumulus Linux?” and we both went like “What? Why?”
Dinesh is an honest engineer and answered: “No, we don’t do it” with absolutely no hesitation, but we both kept wondering, “Why exactly would you want to do that?”
Case Study: Scale-Out Cloud Infrastructure
I helped several customers design scale-out private or public cloud infrastructure. In every case, I tried to start with a reasonably small pod (sized around what they’d consider an acceptable loss unit – another great term I inherited from Chris Young), connect it to a shared L3 backbone (either within a data center or across multiple data centers), and then address the inevitable desire for stretched layer-2 connectivity.
You’ll find a summary of these designs in my next ExpertExpress case study: Scale-Out Private Cloud Infrastructure, and if you need more details, I’m usually available for online consulting.
How Do I Start My IPv6 Addressing Plan?
One of my readers was reading the Preparing an IPv6 Addressing Plan document on the RIPE web site and found that it proposes two approaches to IPv6 addressing: encode location in the high-order bits and subnet type in the low-order bits (the traditional approach), or encode subnet type in the high-order bits and location in the low-order bits (totally counterintuitive to most networking engineers). His obvious question was: “Is anyone using type-first addressing in a production network?”
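To make the difference tangible, here’s a small sketch (mine, not from the RIPE document) that builds per-site /48 prefixes both ways; the example /32 allocation and the 8-bit location and type fields are assumptions:

```python
# Sketch of the two IPv6 addressing-plan approaches: encode (location, type)
# or (type, location) in the bits between the allocation and the subnets.
# The /32 allocation and the 8-bit location/type fields are made-up examples.
import ipaddress

ALLOCATION = ipaddress.IPv6Network("2001:db8::/32")  # example allocation
LOC_BITS = TYPE_BITS = 8                             # 8 + 8 bits -> /48 per (location, type)

def subnet_prefix(location: int, subnet_type: int, type_first: bool) -> ipaddress.IPv6Network:
    """Build the /48 prefix for a (location, type) pair, either location-first
    (traditional) or type-first."""
    if type_first:
        bits = (subnet_type << LOC_BITS) | location
    else:
        bits = (location << TYPE_BITS) | subnet_type
    shift = 128 - ALLOCATION.prefixlen - LOC_BITS - TYPE_BITS
    base = int(ALLOCATION.network_address) | (bits << shift)
    return ipaddress.IPv6Network((base, ALLOCATION.prefixlen + LOC_BITS + TYPE_BITS))

# Location 3, subnet type 5 (say, "server LAN"), both ways:
print(subnet_prefix(3, 5, type_first=False))  # 2001:db8:305::/48  (location first)
print(subnet_prefix(3, 5, type_first=True))   # 2001:db8:503::/48  (type first)
```

The appeal of type-first addressing is that all subnets of a given type aggregate into a single prefix, so one ACL or routing-policy entry can cover that type across every location.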
The Terastream project seems to be using the service-first format; if you’re doing something similar, please leave a comment!
Design Challenge: Multiple Data Centers Connected with Slow Links
One of my readers sent me this question:
What is best practice to get a copy of the VM image from DC1 to DC2 for DR when you have subrate (155 Mbps in my case) Metro Ethernet services between DC1 and DC2?
The slow link between the data centers effectively rules out any ideas of live VM migration; to figure out what you should be doing, you have to focus on business needs.
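Some back-of-the-envelope math (my assumptions, not the reader’s numbers) shows what such a link can realistically do:

```python
# Rough transfer-time math for copying a VM image across a 155 Mbps link.
# The image size and usable-bandwidth fraction are made-up assumptions.

link_mbps = 155                 # Metro Ethernet service between DC1 and DC2
usable_fraction = 0.7           # assume ~70% goodput after overhead and other traffic
image_gb = 100                  # example VM image size

usable_bps = link_mbps * 1e6 * usable_fraction
transfer_seconds = image_gb * 8e9 / usable_bps
print(f"Copying a {image_gb} GB image takes roughly {transfer_seconds / 3600:.1f} hours")
# -> roughly 2 hours per 100 GB image
```

Bulk copies measured in hours are fine for scheduled replication; anything that has to move gigabytes in seconds is not going to happen over that link.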
Last Chapter of Data Center Design Case Studies Is Published
A few days ago I completed the last chapter in the Data Center Design Case Studies book: building disaster recovery and active-active data centers. It focuses on application behavior and business needs, not on the underlying technologies; the networking technology part tends to be way easier to solve than the oft-ignored application-level challenges.