Make Rip-and-Replace a bit less Creepy

It was a dark and stormy Halloween night and a networking engineer was stuck in a data center facing a Mission Impossible project: replace a failing Cat6500 with a brand-new Nexus 7000. Shouldn’t have been a problem, if only the cables were labeled.

Overlay-to-Underlay Network Interactions: Document Your Hidden Assumptions

If you listen to the marketing departments of overlay virtual networking vendors, it looks like the world is a simple place: you deploy their solution on top of any IP fabric, and it all works.

You’ll hear a totally different story from the physical hardware vendors: they’ll happily serve you a healthy portion of FUD, hoping you swallow it whole, and describe in gory detail all the mishaps you might encounter on your virtualization quest.

The funny thing is they’re all right (not to mention the really fun part when FUDders change sides ;).

Bad Ideas and Abominations

This post SHOULD have been published on April 1st, but I need to define the terminology for another upcoming post, so here it is ;)

RFC 2119 defines polite words to use when something really shouldn’t be done. Some network designs I see deserve more colorful terminology.

2014-11-02: Updated with reference to RFC 6919 (/HT to @LapTop006)

New Webinar: Scaling Overlay Virtual Networks

You can get an overlay virtual networking solution from almost every major hypervisor- and data center networking vendor. Do you ever wonder which one to choose for your large-scale environment? I’m positive you’d get all of them up and running in a one-rack environment, but what if you happen to be larger than that?

We’ll try to address scalability hiccups and roadblocks you might encounter on your growth path in the Scaling Overlay Virtual Networks webinar (get your free ticket here).

Cumulus Linux in Real Life on Software Gone Wild

A year ago Matthew Stone first heard about Cumulus Linux when I ranted about it on a Packet Pushers podcast (which only proves that any publicity is good publicity even though some people thought otherwise at that time), and when his cloud service provider company started selecting ToR switches he considered Cumulus together with Cisco and Arista… and chose Cumulus.

Tech Talks: Introduction to Label Distribution Protocol (LDP)

In the third part of MPLS Tech Talks we focused on the role of label distribution protocol (LDP) and its operation in frame-mode MPLS. You can watch the video on the ipSpace.net Tech Talks web page.

IPv6 in a Global Company – a Real-World Example

More than a year ago I wrote a response to a comment Pascal wrote on my Predicting the IPv6 BGP table size blog post. I recently rediscovered it and figured out that it’s (unfortunately) as relevant as it was almost 18 months ago.

Other people have realized we have this problem in the meantime, and are still being told to stop yammering because the problem is not real. Let’s see what happens in a few years.

All You Need Are Two Top-of-Rack Switches

Every time I’m running a classroom version of my Designing the Cloud Infrastructure workshop, I start with a simple question: “Who has more than 2000 VMs or bare-metal servers in the data center?”

I might see three hands on a good day; 90-95% of the audience have smaller data centers… and some of them get disappointed when I tell them they don’t need more than two ToR switches in their data center.
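
A back-of-the-envelope sketch of why two switches can be enough. Every number below (VMs per server, uplinks per server) is an illustrative assumption, not a figure from the workshop; plug in your own.

```python
import math

def tor_ports_needed(vms, vms_per_server, links_per_server):
    """Return (servers, server-facing ports) for a given VM count."""
    servers = math.ceil(vms / vms_per_server)
    return servers, servers * links_per_server

# Assumed values: 50 VMs per server, 2 redundant uplinks per server
servers, ports = tor_ports_needed(vms=2000, vms_per_server=50, links_per_server=2)

# Each server connects one uplink to each of the two ToR switches
ports_per_switch = ports // 2
print(servers, ports, ports_per_switch)
```

Under these assumptions the whole data center fits into a few dozen server-facing ports per switch; the answer obviously changes with lower VM density or more links per server.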

Network Programmability Phase 1: the Configured Network

During his Network Programmability 101 webinar Matt Oswalt described three phases of network programmability. The first level in the pyramid of programmable awesomeness (his words, not mine) is described in today’s video.

Micro-BFD: BFD over LAG (Port Channel)

The discussion in the comments to my LAG versus ECMP post took a totally unexpected turn when someone mentioned BFD failure detection over port channels (link aggregation groups – LAGs).

What’s the big deal?

Just Published: Juniper Data Center Switches

Want to know what the difference between Virtual Chassis and Virtual Chassis Fabric is? How Local Link Bias works? How ISSU on QFX 5100 works even though the box doesn’t have two supervisor boards? You’ll find answers to all these questions in new videos describing Juniper data center switches.

Workload Mobility and Reality: Bandwidth Constraints

People talking about long-distance workload mobility and cloudbursting often forget the physical reality documented in the fallacies of distributed computing. Today we’ll focus on bandwidth; in a follow-up blog post we’ll deal with its ugly cousin, latency.

TL&DR summary: If you plan to spread application components across the network without understanding their network requirements, you’ll get the results you deserve.
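
To get a feeling for the bandwidth constraint, here is a rough transfer-time estimate. The data size, link speeds and the 80% usable-throughput factor are hypothetical values chosen for illustration, and the calculation ignores latency and protocol behavior.

```python
def transfer_seconds(size_gbytes, link_mbps, efficiency=0.8):
    """Seconds needed to move size_gbytes over a link_mbps link.

    efficiency is the assumed fraction of link bandwidth that is usable.
    """
    bits = size_gbytes * 8 * 1000**3
    usable_bps = link_mbps * 1000**2 * efficiency
    return bits / usable_bps

# Hypothetical example: 16 GB of data over a 100 Mbps WAN link
# versus a 10 Gbps link inside the data center
wan = transfer_seconds(16, 100)
lan = transfer_seconds(16, 10_000)
print(f"WAN: {wan / 60:.1f} min, LAN: {lan:.0f} s")
```

The two results differ by the same factor as the link speeds, which is the whole point: moving a workload that takes seconds inside a data center can take the better part of an hour across a slow WAN link.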

Border6 Non-Stop Internet: a Commercial BGP-Based SDN

Several SDN solutions that coexist with the traditional control- and data planes instead of ripping them out and replacing them with the new awesomesauce use BGP to modify the network’s forwarding behavior.

Border6 decided to turn that concept into a commercial product that we dissected in Episode 12 of the Software Gone Wild podcast.

Enjoy the show (this time in video format).

Networking Is Not as Special as We Think It Is

I was listening to the Packet Pushers show #203 – an interesting high-level discussion of policies (if you happen to be interested in those things) – and unavoidably someone had to mention how networking is all broken because different devices implement the same functionality in different ways and use different CLI/API syntax.

Last Call: Free Version of SDN and OpenFlow – The Hype and the Harsh Reality

If you want to get a free copy of my SDN and OpenFlow – The Hype and the Harsh Reality book, download it now. The offer will expire by October 20th.

Packet Reordering and Service Providers

My “Was it bufferbloat?” blog post generated an unexpected amount of responses, most of them focusing on a side note saying “it looks like there really are service providers out there that are clueless enough to reorder packets within a TCP session”. Let’s walk through them.

How to Get into the Top N%

Michael Church wrote an interesting answer on Quora, describing a logarithmic scale of programming skills and (even more importantly) hints to follow to get from n00b into the top N% (for some small value of N):

  • Budget 7–14 years;
  • Study voraciously;
  • Build things when you don’t know that you’ll succeed;
  • Network to get new ideas;
  • Job hop when you stop learning.

Replace “programmer” with “networking engineer” and read the whole answer ;)

IPv6 High Availability Strategies on NIL TV

I had a shorter version of my IPv6 High Availability talk @ Slovenian IPv6 summit this spring. The video is online, but wouldn’t be of much use to anyone but both Slovenian readers of this blog.

The English version of that same talk is now available on NIL TV (or you could decide to go for the full webinar or whole IPv6 track).

VXLAN and OTV: The Saga Continues

Randall Greer left a comment on my Revisited: Layer-2 DCI over VXLAN post saying:

Could you please elaborate on how VXLAN is a better option than OTV? As far as I can see, OTV doesn't suffer from the traffic tromboning you get from VXLAN. Sure you have to stretch your VLANs, but you're protected from bridging failures going over your DCI. OTV is also able to have multiple edge devices per site, so there's no single failure domain. It's even integrated with LISP to mitigate any sub-optimal traffic flows.

Before going through the individual points, let’s focus on the big picture: the failure domains.

Data Center Design Case Studies on Amazon – Take 2

In July I wrote about an Amazon Kindle version of my Data Center Design Case Studies book and complained about their royalties model. Someone quickly pointed out how to adapt to their system: split the book into multiple volumes and charge $9.99 for each.

It took me months to get there, but the first two volumes are finally on Amazon:

We Need Consistency more than Controllers

I was listening to the I2RS Packet Pushers podcast a while ago and was more than glad that when Greg Ferro yet again mentioned the complexity of OSPF, someone simply pointed out that controllers would not reduce the complexity; if anything they would increase it.

LAG versus ECMP

Bryan sent me an interesting question:

When you have the opportunity to use LAG or ECMP, what are some things you should consider?

He already gathered some ideas (thank you!) and I expanded his list and added a few comments.
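
One thing LAG and ECMP usually have in common is per-flow load distribution: a hash of packet header fields selects the outgoing link. The sketch below illustrates that idea only; the CRC32 hash, the 5-tuple layout and the `pick_link` name are assumptions made for the example, and real switches use vendor-specific hash functions and field selections.

```python
import zlib

def pick_link(flow, num_links):
    """Map a flow 5-tuple to a link index using a deterministic hash."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links

# Hypothetical flow: (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)

# Every packet of the same flow lands on the same link, so packets are not
# reordered, but a single flow can never use more than one link's bandwidth.
print(pick_link(flow, 4))
```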

Interop New York: It Was Great Fun

Last week’s Interop New York was hard work (three workshops in two days), but also lots of nerdy fun. I love doing workshops with smart participants who bring their real-life problems to the room and challenge my assumptions and conclusions, and I had plenty of these interactions during the week. Thank you all (you know who you are)!

Network Automation Tools with Jason Edelman on Software Gone Wild

The stars have finally aligned, and after months of scheduling Jason and I found time to chat about network automation tools and all the other exciting things Jason is doing (and blogging about).

We started with easy topics:

Bufferbloat Killed my HTTP Session… or not?

Every now and then I get an email from a subscriber having video download problems. Most of the time the problem auto-magically disappears (and there’s no indication of packet loss or ridiculous latency in traceroute printout), but a few days ago Henry Moats managed to consistently reproduce the problem and sent me exactly what I needed: a pcap file.

TL&DR summary: you have to know a lot about application-level protocols, application servers and operating systems to troubleshoot networking problems.

Tech Talks: MPLS Traffic Engineering Basics

After covering the basics of MPLS, the discussion I had with Seamus Gilchrist turned to the basics of MPLS Traffic Engineering.

The video of that discussion is available online on the ipSpace.net Tech Talks web page.