All the Best in 2016!

The number of visits to my web site is slowly going down – you’re giving me a very clear signal that it’s time to stop blogging.

I hope you’ll manage to catch at least a few quiet days with your loved ones and I wish you all the best in 2016!

More in 3 weeks or so ;)

Broadcom Tomahawk 101

Juniper recently launched their Tomahawk-based switch (QFX5200) and included a lot of information on the switching hardware in one of their public presentations (similar to what Cisco did with Nexus 9300), so I got a non-NDA glimpse into the latest Broadcom chipset.

You’ll get more information on QFX5200 as well as other Tomahawk-based switches in the Data Center Fabrics Update webinar in spring 2016.

Here’s what I understood the presentation said:

Leftover Training Budget? Let Me Help You

If you have some leftover training budget for 2015, there’s no better way to spend it than to invest it in a workgroup subscription ;)

You can choose between two standard packages (6 or 21 users) which include online consulting sessions, or create your own customized package.

Finally, if you plan to buy one of the standard packages, hurry up – the Dec15 promotional code gives you 10% discount till the end of the year.

Running Open Daylight in Production Network on Software Gone Wild

Nick Buraglio used OpenDaylight and OpenFlow-enabled switches to build a part of the exhibition network of a large international supercomputing conference and was kind enough to talk about his real-life experience in Episode 47 of Software Gone Wild.

We covered:

CPLANE Networks on Software Gone Wild

When I wrote a blog post explaining the difference between centralized control and centralized control plane, John Casey, CEO of CPLANE Networks, wrote a comment saying “yeah, that’s exactly what we’re doing.”

It took us a while to get the stars aligned, but finally we managed to sit down and chat about what they’re doing, resulting in Episode 46 of Software Gone Wild.

The Grumpy Old Network Architects and Facebook

Nuno wrote an interesting comment to my Stretched Firewalls across L3 DCI blog post:

You're an old school, disciplined networking leader that architects networks based on rock-solid, time-tested designs. But it seems that the prevailing fashion in network design and availability go against your traditional design principles: inter-site firewall clustering, inter-site vMotion, DCI, etc.

Not so fast, my young padawan.

Let’s define prevailing fashion first. You might define it as Kool-Aid peddled by snake oil salesmen or cool network designs by people who know what they’re doing. If we stick with the first definition, you’re absolutely right.

Now let’s look at the second camp: how people who know what they’re doing build their network (Amazon VPC, Microsoft Azure or Bing, Google, Facebook, a number of other large-scale networks). You’ll find L3 down to ToR switch (or even virtual switch), and absolutely no inter-site vMotion or clustering – because they don’t want to bet their service, ads or likes on the whims of technology that was designed to emulate thick yellow cable.

Want to know how to design an application to work over a stable network? Watch my Designing Active-Active and Disaster Recovery Data Centers webinar.

This isn't the first time that readers have asked you about these technologies, and it won't be the last. Vendors will continue to market them despite their shortcomings, and customers will continue to eat them up.

As long as there is someone willing to believe in fairy tales and Santa Claus, there will be someone dressed in a red coat and fake beard yelling “Ho, Ho, Ho!”

Enterprise IT managers sometimes act like small kids. They don’t want to hear that they have people and process problems, and love to believe that the next magical bit of technology will solve whatever it is that bothers them. Vendors obviously love to explore these cravings and sell them ever-more-complex solutions.

I'd like to think that vendors will also continue to work out the kinks and over time the technology will become rock solid and time-tested.

I am positive you can make any technology almost-rock-solid. You can also make pigs fly (see RFC 1925 sect. 2.3). However, have you included the fuel costs in your TCO?

Also, the more complex a technology is, the likelier it is to crash down like a house of cards, and you’ll be left with an incomprehensible mix of bits and pieces that will be impossible to put back together (see also: You can’t reformat your data center).

Nuno concluded his comment with a question:

Are you too stuck on past, traditional designs and not being open to new ways of building IT? I get that IT is very cyclical, and these new trends may die in the future...or thrive, and the customers may either fail...or succeed.

I am very open to new ways of building IT. I preach the need for meaningful SDN (not the centralized control plane crap), network automation, and proper application architecture. I just refuse to believe in fairy tales, and solving non-technical problems with technology.


Looking for more red pills? Explore my SDN webinars, Designing Active/Active Data Centers webinar, and vMotion-related blog posts.

Featured Webinar: vSphere 6 Networking Deep Dive

The featured webinar of December 2015 is vSphere 6 Networking, a 6-hour deep dive into vSphere 6 networking features covering almost every single vSphere network-related feature.

What does this mean?

Trial subscribers get free access to select videos from this webinar (those marked with a yellow star in this listing) and can purchase it at significant discount.

Should We Use OpenFlow for Load Balancing?

Yesterday I described the theoretical limitations of using OpenFlow for load balancing purposes. Today let’s focus on the practical part and answer another question:

I wrote about the same topic years ago here and here. I know it’s hard to dig through old blog posts, so I collected them in a book.

Could We Use OpenFlow for Load Balancing?

It all started with a tweet Kristian Larsson sent me after I published my flow-based forwarding blog post:

Why Do We Need VXLAN (and What Is It)?

Do you need VXLAN in your data center or could you continue using traditional bridging? Do layer-2 fabrics make sense or are they a dead end in the evolution of virtual networking?

I tried to provide a few high-level answers in the Introduction to VXLAN video which starts the VXLAN Technical Deep Dive webinar. The public version of the video is now available on the Free Content web site.

Sometimes It’s Not the Network

Marek Majkowski published an awesome real-life story on the CloudFlare blog: users experienced occasional short-term sluggish performance, and while everything pointed to a network problem, it turned out to be a garbage collection problem in the Linux kernel.

Takeaway: It might not be the network's fault.

Also: How many people would be able to troubleshoot that problem and fix it? Technology is becoming way too complex, and I don’t think software-defined-whatever is the answer.

Is Flow-Based Forwarding Just Marketing Fluff?

When writing the Packet- and Flow-Based Forwarding blog post, I tried to find a good definition of flow-based forwarding (and I was not the only one being confused), and the one from Junos SRX documentation is as good as anything else I found, so let’s use it.

TL&DR: Flow-based forwarding is a valid technical concept. However, when mentioned together with OpenFlow, it’s mostly marketing fluff.

Detecting NAT64 Prefix

If you’re a host running on an IPv6-only network, you might want to detect the IPv6 prefix used for NAT64 (for example, to transform IPv4 literals a clueless idiot embedded into a URL into IPv6 addresses).

Apple has a wonderful developer-focused page describing NAT64 and DNS64, including the way they synthesize IPv6 addresses from IPv4 literals. You (RFC 6919) MUST read it.
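To make the address synthesis a bit more tangible, here is a minimal Python sketch. It assumes a /96 NAT64 prefix (the RFC 6052 well-known prefix 64:ff9b::/96 by default) and the RFC 7050 trick of looking at the AAAA answer a DNS64 server returns for ipv4only.arpa. The function names are made up for this illustration, no DNS query is performed, and it is not meant to mirror what Apple's stack actually does.

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix; real networks may use their own prefix.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4_literal, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch handles /96 prefixes only")
    v4 = ipaddress.IPv4Address(ipv4_literal)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def detect_prefix(aaaa_answer, known_v4="192.0.0.170"):
    """Recover a /96 prefix from a synthesized AAAA record for ipv4only.arpa.

    192.0.0.170 is one of the two well-known IPv4 addresses of ipv4only.arpa.
    Returns None when the answer does not end in that address.
    """
    v6 = ipaddress.IPv6Address(aaaa_answer)
    if (int(v6) & 0xFFFFFFFF) != int(ipaddress.IPv4Address(known_v4)):
        return None
    # Clear the low 32 bits to get the network address of the /96.
    return ipaddress.IPv6Network(((int(v6) >> 32) << 32, 96))


# Hypothetical AAAA answer, using a documentation prefix for illustration.
prefix = detect_prefix("2001:db8:64::c000:aa")
mapped = synthesize("192.0.2.33", prefix)
```

A host would feed the detected prefix into `synthesize()` whenever it stumbles upon an IPv4 literal. Prefix lengths other than /96 place the IPv4 bits differently and are deliberately left out of the sketch.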

Packet- and Flow-Based Forwarding

I don’t like to correct my friends in public, but if someone says “I still believe that flow-based technologies will exceed the capabilities of packet-based technologies” (see Network Break 53), it’s time to revisit the networking fundamentals.

What is a flow?

According to Wikipedia (but what do they know…):

Fibbing: OSPF-Based Traffic Engineering with Laurent Vanbever

You might be familiar with the idea of using BGP as an SDN tool that pushes forwarding entries into routing and forwarding tables of individual devices, allowing you to build hop-by-hop paths across the network (more details in Packet Pushers podcast with Petr Lapukhov).

Researchers from University of Louvain, ETH Zürich and Princeton figured out how to use OSPF to get the same job done and called their approach Fibbing. For more details, listen to Episode 45 of Software Gone Wild podcast with Laurent Vanbever (one of the authors), visit the project web site, or download the source code.

Thank you for your trust!

Wow, another year swooshed by. I can’t believe it’s almost gone. Maybe it’s all the travels I had throughout the year, and I MUST start with a huge THANK YOU to whoever is watching after me – there wasn’t a single major SNAFU.

Next, I’d like to thank the people who caused all that travel: attendees of my workshops.

Ethernet Checksums Are Not Good Enough for Storage (Updated)

A while ago I described why some storage vendors require end-to-end layer-2 connectivity for iSCSI replication.

TL&DR version: they were too lazy to implement iSCSI checksums and rely on Ethernet checksums because TCP/IP checksums are not good enough.

It turns out even Ethernet checksums fail every now and then.

2015-12-06: I misunderstood the main technical argument in Evan’s post. The real problem is that switches recalculate CRC, so the Ethernet CRC is no longer an end-to-end protection mechanism.
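The recalculation problem is easy to show in a few lines of Python. This is only a toy model under stated assumptions: the frame is reduced to payload plus a CRC-32 trailer, `zlib.crc32` stands in for both the Ethernet FCS and the sender's end-to-end digest (iSCSI digests actually use CRC32C), and the "switch" is a hypothetical function that can flip a bit while the payload sits in its buffer.

```python
import zlib


def make_frame(payload):
    """Append a CRC-32 trailer to the payload (simplified frame format)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame):
    """Check the trailer the way a receiving port would."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs


def switch_forward(frame, corrupt=False):
    """Verify the incoming CRC, optionally corrupt the buffered payload,
    then recalculate the CRC on egress."""
    if not fcs_ok(frame):
        raise ValueError("frame dropped: bad FCS on ingress")
    payload = bytearray(frame[:-4])
    if corrupt:
        payload[0] ^= 0x01  # single bit flip inside the switch
    return make_frame(bytes(payload))


data = b"iSCSI block 42"
digest = zlib.crc32(data)  # end-to-end digest computed by the sender

received = switch_forward(make_frame(data), corrupt=True)
link_check = fcs_ok(received)                            # hop-by-hop check
end_to_end_check = zlib.crc32(received[:-4]) == digest   # sender's digest
```

The receiver sees a perfectly valid frame (`link_check` is true) because the switch recalculated the CRC over the damaged payload. Only the digest carried end-to-end catches the corruption.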

Blogging Rule#1: Be Useful

I love stumbling upon new networking-focused blogs. Many of my old friends switched to the dark side (vendors) and stopped blogging, others simply gave up, and it seems like there aren’t that many engineers that would like to start this experiment.

One of the obvious first questions is always “what should I write about” and my reply is always “it doesn’t really matter – make sure it’s useful.”

Can You Afford to Reformat Your Data Center?

I love listening to the Datanauts podcast (Ethan and Chris are fantastic hosts), starting from the very first episode (hyper-converged infrastructure) in which Chris made a very valid comment along the lines of “with the hyper-converged infrastructure it’s possible to get so many things done without knowing too much about any individual thing…” and I immediately thought “… and what happens when it fails?”

Video: Cumulus Linux Architecture

Do you want to know more about Cumulus Linux after learning what data center architectures it supports, what base technologies it uses, and how you can use it to simplify network configurations? It’s time to explore Cumulus Linux architecture (part 5 of the presentation Dinesh Dutt had during the Data Center Fabrics webinar).

DMVPN Split Default Routing and Internet Access

One of the engineers listening to my DMVPN webinars sent me a follow-up question (yes, I always try to reply to them) asking how to implement direct Internet access from the spoke sites (aka local exit) in combination with split default routing you have to use in DMVPN Phase 2 or Phase 3 networks.

It’s really simple: either you have a design requirement that requires split default routing, or you don’t.

IP over Ultrasound

One of my readers read the Ars Technica article on ads communicating with other devices via ultrasound and wondered whether something similar could be done for IP.

Not surprisingly, someone already did it. A quick Google search found this tutorial which explains how to run an IP stack over GNU Radio (at speeds that were last experienced with dial-up modems 30 years ago).

Presentation: All You Need Are Two Switches

I was asked to present a data-center-related talk last week and decided to focus on one of my favorite topics: because most people don’t have more than a few hundred servers in their data center, they don’t need more than two switches (or a rack of servers).

Not surprisingly, an equipment reseller sitting in the room was not amused.

The video and the slide deck are already online, but there’s a minor challenge: the whole event was in Slovenian ;) However, I plan to record the same topic in English once my SDN travels stop.

Junos Fusion: the First Steps (updated)

I was really excited when Juniper announced Junos Fusion. I hoped for QFabric Done Right, but after watching the NFD10 video describing the architecture, I was disappointed: they reinvented Fabric Extenders.

The blog post was slightly updated on November 14th 2015 based on feedback received from Juniper engineers.

Test-Driven Network Development with Michael Kashin on Software Gone Wild

Imagine you’d design your network by documenting the desired traffic flow across the network under all failure conditions, and only then do a low-level design, create configurations, and deploy the network… while being able to use the desired traffic flows as a testing tool to verify that the network still behaves as expected, both in a test lab as well as in the live network.

Control-Plane Protocols for Overlay Virtual Networking – the Madness Continues

You might remember all the fuss about various encapsulations used in overlay virtual networking… just because one wouldn’t be good enough (according to Andrew Lerner “we provide users with choice” actually means “we can’t decide which product to offer you”).

Andrew Lerner on Vendorspeak

Andrew Lerner, my favorite Gartner analyst, recently published a hilarious blog post describing what vendors mean when they say “our product is software-defined” or “we’ll make it work”. Enjoy!

Need more vendorspeak? Try eight levels of vendor acceptance (carefully documented during a particularly stressful on-site test in Poland).

The Numerous Levels of SDN Reality

A newbie exploring the mythical lands of SDN might decide to start at the ONF definition of SDN, which currently (November 2015) starts with a battle cry:

The physical separation of the network control plane from the forwarding plane, and where a control plane controls several devices.

The rest of that same page is what I’d call the marketing definition of SDN: directly programmable, agile, centrally managed, programmatically configured, open standards based and vendor-neutral.

Video: Control Plane Protocols in OpenFlow-Based Networks

One of the typical questions I get in my SDN workshops is “how do you run control-plane protocols like LACP or OSPF in OpenFlow networks?”.

I wrote a blog post describing the process two years ago and we discussed the details of this challenge in the OpenFlow Deep Dive webinar. That part of the webinar is now public: you’ll find the OpenFlow Use Cases: Control-Plane Protocols video on the Free Content web site.

There’s a Problem with IPv6 Multihoming

In an amazing turn of events, at least one IETF working group recognized we have serious problems with IPv6 multihoming. According to the email Fred Baker sent to a number of relevant IETF working groups:

PI multihoming demonstrably works, but PA multihoming when the upstreams implement BCP 38 filtering requires the deployment of some form of egress routing - source/destination routing in which the traffic using a stated PA source prefix and directed to a remote destination is routed to the provider that allocated the prefix. The IETF currently has no such recommendation, or consensus that it should have.
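The source/destination routing Fred describes boils down to picking the upstream based on the source prefix of the packet. A minimal sketch of that decision, assuming two made-up providers that allocated documentation prefixes to the site (the table, the names and the function are illustrative and not taken from any IETF document):

```python
import ipaddress

# Hypothetical PA prefixes and the providers that allocated them.
UPSTREAMS = {
    ipaddress.ip_network("2001:db8:a::/48"): "ISP-A",
    ipaddress.ip_network("2001:db8:b::/48"): "ISP-B",
}


def egress_for(source):
    """Return the upstream whose PA prefix contains the source address.

    Sending the packet anywhere else would get it dropped by the other
    provider's BCP 38 ingress filter.
    """
    src = ipaddress.ip_address(source)
    for prefix, upstream in UPSTREAMS.items():
        if src in prefix:
            return upstream
    return None  # source is not from any PA prefix we know about


choice_a = egress_for("2001:db8:a::10")
choice_b = egress_for("2001:db8:b:1::20")
```

Plain destination-based routing would send both packets to whichever upstream has the best route toward the destination, which is exactly where the BCP 38 filters start biting.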

Here are a few really old blog posts just in case you don’t know what I’m talking about (and make sure you read the comments as well):

1000 VM per Rack Is Perfectly Realistic

Last year I claimed that you don’t need more than two switches in your data center (I’ll run a presentation on the same topic in a few days), but focused exclusively on the networking side of the equation.

Iwan Rahabok recently published a great blog post describing the compute- and storage parts of it. His conclusion: 1000 VM per rack is perfectly realistic.

Learn to Speak Your Peer’s Language with Webinars

One of the reasons I started creating webinars was to help networking engineers grasp the basics of adjacent technologies like virtualization and storage. Based on feedback from an attendee of my Introduction to Virtual Networking webinar it works:

I am completely on the Network side of the house and understand what I need to build for Storage/Data replication, but I really never thoroughly understood why. This allowed me to have a coherent discussion with my counterparts in DB and Storage and some of the pitfalls that can occur if we try to cowboy the network design.

Recommendation: if you have a similar problem, start with Introduction to Virtual Networking and continue with Data Center 3.0 webinar.

Stretched Firewalls across Layer-3 DCI? Will the Madness Ever Stop?

I got this question from one of my readers (and based on these comments he’s not the only one facing this challenge):

I was wondering if you can do a blog post on Cisco's new ASA 5585-X clustering. My company recently purchased a few of these with the intent to run their cross data center active/active firewalls but found out we cannot do this without OTV or a layer 2 DCI.

A while ago I expressed my opinion about these ideas, but it seems some people still don’t get it. However, a picture is worth a thousand words, so maybe this will work:

Optimizing Traffic Engineering with NorthStar Controller on Software Gone Wild

Content providers have been using centralized traffic flow optimization together with MPLS TE for at least 15 years (some of them immediately after Cisco launched the early MPLS-TE implementation in their 12.0(5)T release), but it was always hard to push the results into the network devices.

PCEP and BGP-LS changed all that – they give you a standard mechanism to extract network topology and install end-to-end paths across the network, as Julian Lucek of Juniper Networks explained in Episode 43 of Software Gone Wild.

Survey: Vendor NETCONF and REST API Support

Time for another fill-in-the-blanks survey: how many vendors support NETCONF and/or REST API in their data center switches, routers, firewalls and load balancers?

Please help me complete the tables by writing a comment – and do keep in mind that it only counts if it’s documented in a public configuration guide on vendor’s web site.

Also, I’m not aware of any vendor using standard NETMOD YANG models. If someone does, please let me know.

Is Anyone Using Long-Distance VM Mobility in Production?

I had fun times participating in a discussion focused on whether it makes sense to deploy OTV+LISP in a new data center deployment. Someone quickly pointed out the elephant in the room:

How many LISP VM mobility installs has anyone on this list been involved with or heard of being successfully deployed? How many VM mobility installs in general, where the VMs go at least 1,000 miles? I'm curious as to what the success rate for that stuff is.

I think we got one semi-qualifying response, so I made it even simpler ;)

Video: Simplify Network Configurations with Cumulus Linux

Many vendors talk about network automation these days, and almost all of them gloss over an important detail: automation works best when you manage to simplify things to the bare minimum needed to get the job done.

One of the vendors that focus on simplifying the network device configuration is Cumulus Linux.

Was CLNP Really Broken?

One of my readers sent me this question after listening to the podcast with Douglas Comer:

Professor Comer mentioned that IP choose a network attachment address model over an endpoint model because of scalability. He said if you did endpoint addressing it wouldn’t scale. I remember reading a bunch of your blog posts about CNLP (I hope I’m remembering the right acronym) and I believe you liked endpoint addressing better than network attachment point addressing.

As always, the answer is “it depends” (aka “we’re both right” ;).

Ever Heard of Role-Based Access Control?

During my recent SDN workshops I encountered several networking engineers who use Nexus 1000V in their data center environment, and some of them claimed their organization decided to do so to ensure the separation of responsibilities between networking and virtualization teams.

There are many good reasons one would use Nexus 1000V, but the one above is definitely not one of them.

Why Would You Want to Attend a Classroom Workshop?

One of my regular subscribers wondered whether it makes sense to attend a live workshop (like the one we’re running in Miami in a few weeks) instead of listening to my webinars:

I am following your blog posts quite regularly, I’ve been a yearly subscriber for more than 3 years now and I’m even trying to attend as many webinars as I can in real time. Is there a real benefit to participate in this classroom event if we are almost aware of all your slide decks and videos?

Absolutely. Here’s what one of the attendees of a recent SDN workshop wrote when asking me whether I would be willing to do an on-site event for his company:

We Need Product Documentation, not just Glitzy Demos

Whenever a vendor approaches me touting the benefits of their new gizmo, they want to give me a product demo, or offer me access to online labs… and I always tell them I’m not interested until I see their design and configuration guides.

Here’s why I think you should take the same approach:

More Features, Improved Lock-In

Found an interesting article on the High Scalability blog (another must-read web site) on how PostgreSQL improves locking behavior in a high-volume transaction environment.

Needless to say, the feature is rather unique (I originally wrote “totally proprietary”) and not available in most other database products. Improved locking behavior ⇒ improved lock-in.

Moral of the story: Stop yammering. Networking is no different from any other field of IT.

Update: Yep, I goofed up on the proprietary bit (it was one of those “I don’t think this word means what you think it means” gotchas). However, if you think an open-source product can’t have proprietary features or you can’t get locked into an open-source product, I congratulate you on your rosy perspective. Reality smudged mine years ago.

SDN Internet Router Is in Production on Software Gone Wild

You might remember the great idea David Barroso had last autumn – turn an Arista switch into an Internet edge router (SDN Internet Router – SIR). In the meantime, he implemented that solution in a production environment serving high-speed links at multiple Internet exchange points. It was obviously time for another podcast on the same topic.

The Lack of Historic Knowledge Is so Frustrating

Every time I’m explaining the intricacies of new technologies to networking engineers, I try to use analogies with older well-known technologies, trying to make it simpler to grasp the architectural constraints of the shiny new stuff.

Unfortunately, most engineers younger than ~35 years have no idea what I’m talking about – all they know are Ethernet, IP and MPLS.

Just to give you an example – here’s a slide from my SDN workshop.

Get Digital Content with the SDN Workshop

Last week I ran two SDN workshops, and in both of them the participants were busy taking notes as I explained the intricacies of concepts like SDN, NFV and network automation, and tools like OpenFlow or BGP.

However, how often have you revisited notes taken at a presentation and kept wondering “what exactly was he trying to say?” … or felt like the training you attended was like drinking from a fire hose and you missed most of the good stuff?

You won’t have that problem during the Miami SDN/NFV/SDDC retreat.

Sometimes You Have to Decide How Badly You Want to Fail

Another week, another ExpertExpress session, as is often the case focusing on two data centers with stretched VLANs spanning both of them. However, this one was particularly irksome, as the customer ran a firewall cluster stretched across two locations.

I gave the customer engineers my usual recommendations:

DMVPN Split Default Routing

SD-WAN is all the rage these days (at least according to software-defined pundits), but networking engineers still build DMVPN networks, even though they are supposedly impossibly-hard-to-configure Rube Goldberg machinery.

To be honest, DMVPN is not the easiest technology Cisco ever developed, and there are plenty of gotchas, including the problem of default routing in Phase 2/3 DMVPN networks.

Winston Churchill on IPv6

While researching for another blog post, I stumbled upon this speech by Winston Churchill:

When the situation was manageable it was neglected, and now that it is thoroughly out of hand we apply too late the remedies which then might have effected a cure. There is nothing new in the story. It is as old as the Sibylline Books. It falls into that long, dismal catalogue of the fruitlessness of experience and the confirmed unteachability of mankind. Want of foresight, unwillingness to act when action would be simple and effective, lack of clear thinking, confusion of counsel until the emergency comes, until self-preservation strikes its jarring gong – these are the features which constitute the endless repetition of history.

Obviously Mr. Churchill wasn't talking about IPv6 but about way more serious matters… but it's also obvious he was right about the unteachability of mankind.

Enterprise Content-over-IPv6 Deployment Scenarios

After ARIN ran out of IPv4 address space (in a totally uncontrolled “let’s party till it’s over” way) US enterprise IT shops (RFC 6919) OUGHT TO learn how to spell IPv6 (US service providers are already ahead of the pack).

You may also decide to ignore IPv6 indefinitely, but do keep in mind that consultants love panicking clients.

Douglas Comer on the Future of Networking

Jim Small asked me what I thought about the Future of Networking Packet Pushers podcast with Douglas Comer. I decided to listen to it while driving toward one of my recent hikes, and it was a great decision – it was the best Packet Pushers podcast I listened to in a long while.

Get Subscription while Attending the Rome SDN/NFV Event

Reiss Romoli, the fantastic organizers of my SDN/NFV event in Rome, Italy in late October, are offering you a free personal subscription – a saving of $299 or approximately EUR 270.

All you have to do to qualify is (A) download and fill in the registration form, (B) send it to Reiss Romoli and (C) pay before attending the webinar.

Yeah, I know the PDF form says “fax it back” – everyone has to use the tools that work best in their environment.

Hope we'll meet in warm and sunny Rome in a few weeks!

Building Carrier-Grade Cloud Infrastructure

During my recent SDN workshop one of the attendees asked me “How do you build carrier-grade (5 nines) cloud infrastructure with VMware NSX?”

Short answer: You don’t… and it’s a wrong question anyway.

Software-Defined IXP with Laurent Vanbever on Software Gone Wild

A while ago I started discussing the intricate technical details of fibbing (an ingenious way of implementing traffic engineering with traditional OSPF) with Laurent Vanbever and other members of his group, and we decided to record a podcast on this topic.

Things never go as planned in a live chat, and we finished talking about another one of his projects – software defined Internet exchange point (SDX), the topic of Episode 41 of Software Gone Wild.

Designing Active-Active and Disaster Recovery Data Centers

A year ago I was a firm believer in the unlimited powers of Software-Defined Data Centers and their ability to simplify workload migrations. After all, if you can use an API to create any data center object, what’s stopping you from moving the workload running in a data center to another location.

As always, there’s a huge difference between theory and reality.

What Happens When a Data Center Fabric Switch Fails?

I got into an interesting discussion with a fellow networking engineer trying to understand the impact of a switch failure in an L2/L3 data center fabric (anything from Avaya’s fabric or Brocade’s VCS Fabric to Cisco’s FabricPath, ACI or Juniper’s QFabric) on MAC and ARP tables.

Here’s my take on the problem – have I missed anything?

Response: Firefighters and Fire Marshals

In a recent blog post Tom Hollingsworth made a great point: we should refocus from fighting one fire at a time to preventing fires.

I completely agree with him. However…

Learn SDN with Virtual Routers and Switches

Bryan would love to get hands-on SDN experience and sent me this question:

I was recently playing around with Arista vEOS to learn some Arista CLI as well as how it operates with an SDN controller. I was wondering if you know of other free products that are available to help people learn.

Let’s try to do another what-is-out-there survey.

Cumulus Linux Base Technologies

Dinesh Dutt started his part of the Data Center Fabrics Update webinar with “what is Cumulus Linux all about” and “what data center architectures does it support” and then quickly jumped into details about the base technologies used by Cumulus Linux: MLAG and IP routing.

Not surprisingly, the MLAG part generated tons of questions, and Dinesh answered all of them, even when he had to say “we don’t do that”.

DHCP Details You Didn’t Know

If you’ve been a networking engineer (or a sysadmin) for a few years, you must be pretty familiar with DHCP and might think you know everything there is to know about this venerable protocol. So did I… until I read the article by Chris Marget in which he answers two interesting questions:

  • How does the DHCP server (or relay) send DHCP offer to the client that doesn’t have an IP address (and doesn’t respond to ARP)?
  • How does the DHCP client receive the DHCP responses if it doesn’t have an IP address?

VSAN: As Always, Latency Is the Real Killer

When I wrote my stretched VSAN post, I thought VSAN uses asynchronous replication across WAN. Duncan Epping quickly pointed out that it uses synchronous replication, and I fixed the blog post.

The “What about latency?” question immediately arose somewhere in my subconscious, but before I could add that thought to the blog post (because travel), Anders Henke wrote a lengthy comment that totally captured what I was thinking, so I’m including it in its entirety:

Renewing Subscription before It Expires

One of my subscribers asked me: “My subscription is valid till early December. How could I renew it now (due to budgetary reasons)?”

While I already had the process to do just that, there was no link that one could use (you had to know the correct URL). I’ve fixed that – you’ll find the renewal link on the first page of

Response: SDN is eating vendors’ lunch

Another week, another story from the SDN land, this time The Register reporting on AT&T plans. Even though there are almost no details in the story, the headline boasts that “SDN is eating vendors’ lunch”, prompting SDN hopefuls on LinkedIn groups to claim that “the promise of SDN is fast coming to fruition.”

Not so fast.

DLSP – QoS-Aware Routing Protocol on Software Gone Wild

When I asked “Are there any truly QoS-aware routing protocols out there?” in one of my SD-WAN posts, Marcelo Spohn from ADARA Networks quickly pointed out that they have one – Dynamic Link-State Routing Protocol.

He also claimed that DLSP has no scalability concerns – more than enough reasons to schedule an online chat, resulting in Episode 40 of Software Gone Wild. We didn’t go too deep this time, but you should get a nice overview of what DLSP is and how it works.

VMware VSAN Can Stretch – Should It?

Pirmin Sidler read the stretched VSAN blog posts by Duncan Epping (intro, HA/DRS considerations, demo) and asked me what I think about stretched VSAN considering my opinions on long-distance vMotion.

TL&DR answer: it makes way more sense than long-distance vMotion. However…

Why It's Hard to Deploy SDN-Like Functionality Today

Whenever I talk about the various definitions of SDN (ending with “SDN provides an abstraction layer”), the old-timers quickly realize that the SDN products you can deploy in real life aren’t that different from what we did in the past – an SDN controller is often just an overhyped, glorified network services orchestration system.

OK, so why didn’t we have that same functionality for the last 20 years?

The Autumn Cloud/SDN Roadtrip

One of my kids recently asked me whether I plan to travel somewhere during the autumn. The answer was “a bit” surprising: Boston (just got back), Zurich, Bern, Stockholm, Ljubljana, Heidelberg, Nuremberg, Rome, Miami, Ljubljana, Helsinki, and maybe Munich and/or another trip to Zurich… so I might not be able to blog as frequently as usual.

Most of those trips are public events (hyperlinked). If you’re anywhere close to one of those cities, check them out and drop by.

VXLAN Hardware Gateway Overview

One of my readers stumbled upon a 4-year-old blog post explaining the potential implementations of VXLAN hardware gateways, and asked me if that information is still relevant.

I knew that I’d included tons of information in the Data Center Fabrics and VXLAN Deep Dive webinars, but couldn’t find anything on the web, so let’s fix that.

Basics of IPv6 Addressing

Another Friday, another short IPv6 video (didn’t have time to create anything more substantial this week). This one describes the basics of IPv6 addressing – I know most of you don’t need it, but do forward the link to friends who are still struggling with IPv6 basics.

Lego Bricks and Network Operating Systems

One of the comments I got on my Lego Bricks & BFT blog post was “well, how small should those modular Lego bricks be?”

The only correct answer is “It should be Lego bricks all the way down” or (more formally) “Modularity is a concept that should be applied at every level of the architecture.”

Today let’s focus on how much easier life would be if we could take apart the network operating systems instead of just watching them as glued-together Death Stars.

Blessed by Gartner: Stretched VLANs Make Little Sense

One of my readers recently pointed me to a blog post written by Andrew Lerner from Gartner describing the drawbacks of stretched VLANs.

TL&DR: He’s saying more-or-less the same things I’ve been preaching for years. Now I can put a Blessed by Gartner logo on my blog posts ;), and you can use the report to sway your CIO.

See You in Bern on September 16th

TL;DR: Gabi Gerber from Data Center Interest Group Switzerland (DIGS) is organizing a day-long Data Center event on September 16th, and invited me (again) as the keynote speaker. Do drop by to discuss data center design and automation challenges.

Use nProbe and ELK Stack to Build a Netflow Solution on Software Gone Wild

How do you capture all the flows entering or exiting a data center if your core Nexus 7000 switch cannot do it in hardware? You take an x86 server, load nProbe on it, and connect nProbe to an analysis system built with the ELK stack… at least that’s what Clay Curtis did (and documented in a blog post).

Obviously I wanted to know more about his solution and invited him to the Software Gone Wild podcast. In Episode 39 we discussed:

How Complex Is Your Data Center?

Sometimes it seems like the networking vendors try to (A) create solutions in search of problems, (B) boil the ocean, (C) solve the scalability problems of Google or Amazon instead of focusing on real-life scenarios or (D) all of the above.

Bryan Stiekes from HP decided to take a step in the right direction: let’s ask the customers how complex their data centers really are. He created a data center complexity survey and promised to share the results with me (and you), so please do spend a few minutes of your time filling it in. Thank you!

Private and Public Clouds, and the Mistakes You Can Make

A few days ago I had a nice chat with Christoph Jaggi about private and public clouds, and the mistakes you can make when building a private cloud – the topics we’ll be discussing in the Designing Infrastructure for Private Clouds workshop @ Data Center Day in Berne in mid-September.

The German version of our talk has been published on Inside-IT; those of you not fluent in German will find the English version below.

Path MTU Discovery Doesn’t Work with IP Multicast

A friend of mine sent me an interesting problem:

I noticed recently that my IOS routers aren't sending ICMP (unreachable; frag needed) messages in response to too-big IPv4 multicast packets with DF-bit set. They're just dropping these packets silently, breaking PMTUD.

Unfortunately, that’s not a bug but a FAD (Functions-as-Designed).
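The behavior follows from the IPv4 host requirements (RFC 1122), which forbid ICMP error messages in response to datagrams sent to a multicast address. Here's a minimal sketch of that decision, assuming a toy router model; the function name and the returned strings are made up for illustration and don't correspond to any IOS internals.

```python
import ipaddress


def router_action(dst: str, packet_len: int, egress_mtu: int, df_bit: bool) -> str:
    """Toy model of what an IPv4 router does with a single packet."""
    if packet_len <= egress_mtu:
        return "forward"
    if not df_bit:
        return "fragment"
    # Oversized packet with the DF bit set: it has to be dropped.
    if ipaddress.ip_address(dst).is_multicast:
        # No ICMP errors for multicast destinations, so the sender
        # never learns the path MTU and PMTUD cannot work.
        return "drop silently"
    return "drop and send ICMP fragmentation-needed"
```

Calling `router_action("239.1.1.1", 1500, 1400, True)` takes the silent-drop branch, while the same packet sent to a unicast address such as `192.0.2.1` triggers the ICMP message that PMTUD relies on.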

This Is How You Handle Customer Support Issues

My first ride with Uber was love at first sight – the amount of friction they managed to remove from the using-a-taxi process is unbelievable.

However, every love story eventually faces real-life issues, and what really matters is how you handle them at that point.

Don’t Optimize the Last 5%

Robin Harris described an interesting problem in his latest blog post: while you can reduce the storage access time from milliseconds to microseconds, the whole software stack riding on top still takes over 100 milliseconds to respond. Sometimes we’re optimizing the wrong part of the stack.
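A quick back-of-the-envelope calculation shows how little the faster storage buys you. The numbers are assumptions picked for illustration (1 ms storage access reduced to 1 µs, and a software stack that takes exactly 100 ms); only the orders of magnitude come from the description above.

```python
def response_time_ms(storage_ms: float, software_ms: float = 100.0) -> float:
    """Total response time: storage access plus everything riding on top of it."""
    return storage_ms + software_ms


before = response_time_ms(1.0)      # assumed 1 ms storage access
after = response_time_ms(0.001)     # assumed 1 microsecond storage access
improvement = (before - after) / before

print(f"Storage got 1000x faster, total response time improved by {improvement:.1%}")
```

With these assumptions the thousandfold storage speedup improves the end-to-end response time by roughly 1%.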

Any resemblance to SDN in enterprises or the magical cost-reduction properties of multi-vendor data center fabrics is obviously purely coincidental.

Must Read: James Mickens on Security

A link on Bruce Schneier’s blog pointed me to the latest article by the truly awesome James Mickens, this time making great fun of security researchers. Exactly what you need with your coffee on a Saturday morning. Enjoy!

Cumulus Linux Data Center Architectures

After introducing the concepts of Cumulus Linux in the Data Center Fabrics update session, Dinesh Dutt described the typical data center architectures implemented with Cumulus Linux and the lessons everyone should learn from large-scale web properties.

Musing on Nerd Knobs

Henk left a wonderful comment on my SDN will not solve real-life enterprise problems blog post. He started with a bit of sarcasm:

SDN will give more control and flexibility over the network to the customer/user/network-admin. They will be able to program their equipment themselves, they will be able to tweak routing algorithms in the central controller. They get APIs to hook into the heart of the intelligence. They get more config-knobs. It's gonna be awesome.

However, he thinks (and I agree) that this vision doesn’t make sense:

SDN: ONF Is Moving to “Logically Centralized Control Plane”

Open Networking Foundation has this nice and crisp definition of SDN:

[SDN is] The physical separation of the network control plane from the forwarding plane, and where a control plane controls several devices.

Using this definition it was easy to figure out whether a certain architecture complies with the ONF definition of SDN. It was also easy to point out why it was ridiculous.

SSL Termination on Virtual Appliances: Another Myth Busted

In the Can Virtual Routers Compete with Physical Hardware blog post I mentioned that SSL termination remains one of the few bastions of hardware acceleration.

Based on the comment made by RPM, it looks like I was wrong.

Here’s his reasoning:

How Long Will That Webinar Take?

One of my readers wondered how long my NFV webinar is supposed to take (and I forgot to add that information to my web site), so he sent me this question: “How long is this webinar? An hour? Two hours? If it says "webinar" does that imply a 60 minute duration, so I shouldn't ask?”

Short answer: live webinar sessions usually take between 90 minutes and 2 hours depending on the breadth of the topic, however…

Layer-3-Only Data Center Networks with Cumulus Linux on Software Gone Wild

With the advent of layer-3 leaf-and-spine data center fabrics, it became (almost) possible to build pure layer-3-only data center networks… if only the networking vendors would do the very last step and make every server-to-ToR interface a layer-3 interface. Cumulus decided to do just that.

How Did You Learn So Much About Networking?

One of my readers sent me a heartfelt email that teleported me 35 years down memory lane. He wrote:

I only recently stumbled upon your blog and, well, it hurt. It's incredible the amount of topics you are able to talk about extensively and how you can dissect and find interesting stuff in even the most basic concepts.
May I humble ask how on earth can you know all of the things you know, with such attention to detail? Have you been gifted with an excellent memory, magical diet, or is it just magic?

Short answer: hard work and compound interest.

The Biggest Problem of SDN

A few weeks ago I decided to join the SDN group on LinkedIn and quickly discovered the biggest problem of SDN – many people who try to talk about it authoritatively have no idea what they’re talking about. Here’s a gem (coming from a “network architect”) I found in one of the discussions:

The SDN local controller can punt across to remote datacenters using not only IP, but even UDP over MPLS

Do I have to explain how misguided that statement is?

A Burst of SDN Webinars (and a Slew of SDN Workshops)

You might have noticed that I’m running three SDN-related webinars in the next three weeks, which is the highest density of live webinar sessions I’ve ever had. What’s going on?

Before moving on, I’d like to point out that the early bird pricing for our November SDN/SDDC retreat in Miami, Florida, ends on September 1st, and there are only a few tickets left. Time to register ;)

Does Your Favorite Startup Support IPv6?

Whenever you talk to a new startup evaluating whether you’d consider including their products in your network, don’t forget to ask them a fundamental question: “does your product support IPv6?”

If they reply “Nobody has ever asked for it”, it’s time to turn around and run away.

Enterprise IPv6 Adoption Explained

The OSCON keynote by Simon Wardley explains a number of things, including (lack of) enterprise IPv6 adoption. Enjoy!

Video: What Is Cumulus Linux All About?

In May 2015 I invited Dinesh Dutt to talk about Cumulus Linux and its typical use cases on an update session of the Data Center Fabrics Architecture webinar.

As expected, he started with the big picture: what is Cumulus Networks and Cumulus Linux all about?

SDN Will Not Solve Real-Life Enterprise Problems

It’s hard to visit an IT journal web site without stumbling upon an SDN fairy tale. Here’s another one:

The idea is to cut away the manual process of setting up new firewalls, load balancers and other network appliances, and instead open the door to provisioning a new network infrastructure within a few minutes.

And why exactly is it that you can’t do that today?

IPv6 and the Swinging Technology Pendulum

35 years ago, mainframes, single-protocol networks (be it SNA or DECnet), and centralized architectures that would make hard-core SDN evangelists gloat with unbridled pride were all the rage. If you’re old enough to remember IBM SNA, you know what I’m talking about.

A few years later, everything changed.

Published: Designing Scalable Web Applications (Part 2)

I published the second part of my Designing Scalable Web Applications course on my free content web site.

These presentations focus more on the application-level technologies (client- and server-side), but I’m positive you’ll find some useful content in the caching and scale-out applications with load balancing sections.

SDN, SD-WAN and FCoE on Gartner Networking Hype Cycle

Gartner has updated their networking hype cycle. Not surprisingly:

Gartner won’t give you free access to the graph, but you’ll find it in an article published on The Register.

Can Virtual Routers Compete with Physical Hardware?

One of the participants of the Carrier Ethernet LinkedIn group asked a great question:

When we install a virtual-router of any vendor over an ordinary sever (having general-purpose microprocessor), can it really compete with a physical-router having ASICs, Network Processors…?

Short answer: No … and here’s my longer answer (cross-posted to my blog because not all of my readers participate in that group).

Video: Overview of IPv6 First-Hop Security Challenges

Like all other webinars, the IPv6 Microsegmentation webinar starts with a brief description of the problem we’re trying to solve: the IPv6 first-hop security challenges.

For an overview of this problem, watch this free video from the IPv6 Microsegmentation webinar; for more details, watch the IPv6 Security webinar.

Big Flowering Things and Lego Bricks

Matt Oswalt wrote a great blog post complaining about vendors launching ocean-boiling solutions instead of focused reusable components, and one of the comments his opinion generated was along the lines of “I thought one of the reasons people wanted SDN, is because they wanted to deal with The Network – think about The Network's Performance, Robustness and Services instead of dealing with 100s or 1000s of individual boxes.”

The comment is obviously totally valid, so let me try to reiterate what Matt wrote using Lego bricks ;)

Published: New SDN and NFV Materials

Last week I published slide decks for Network Function Virtualization, BGP-Based SDN Solutions and SDN Use Cases webinars – they’re available to subscribers and attendees registered for individual webinars.

Content from all three webinars is part of my SDN workshop – if you’d like to hear a live explanation, register for one of them.

Video: What Is IPv6?

One of the topics I’m addressing in the Enterprise IPv6 101 webinar (after all, it’s an introductory-level webinar) is the question of “what exactly is IPv6”. After all the promises and myths, it turns out all we got were bigger addresses (and a ton of additional complexity).

Fat Fingers and Other Campfire Stories

A recent well-publicized network outage prompted someone to start collecting fat-finger horror stories, and dozens of networking engineers were quick to chime in. Enjoy!

Reliability of SD-WAN and Hybrid WAN Solutions

My Business Case for SD-WAN blog post received numerous comments pointing out the potential pitfalls of hybrid WAN, including reduced security, unreliable Internet services and denial-of-service attacks.

While all those comments are perfectly valid, I still think hybrid WAN (whether implemented with traditional technologies or SD-WAN products) makes perfect sense.

Explaining the Pervasive Kludgeitis

I found a great explanation for the hodgepodge of kludges found in "organically grown" solutions (legacy precursors to SD-WAN come to mind):

In a long-lived project, components are being replaced. Nice reusable components are easy to replace and so they are. Ugly non-reusable components are pain to replace and each replacement means both a considerable risk and considerable cost. Thus, more often then not, they are not replaced. As the years go by, reusable components pass away and only the hairy ones remain. In the end the project turns into a monolithic cluster of ugly components melted one into another.

Note: You really should read the whole blog post.

Is Linux TCP/IP Stack Really That Slow?

Most people casually involved with virtual appliances and network function virtualization (NFV) believe that replacing the Linux TCP/IP stack with user-mode packet forwarding (example: Intel’s DPDK) boosts performance from a meager 1 Gbps to tens of gigabits (and thus makes hardware forwarding obsolete).

Having data points is always better than having opinions; today let’s look at the Receiving 1 Mpps with Linux TCP/IP Stack blog post.

2015-07-18: The blog post was updated based on feedback by Kristian Larsson.

Must read: Big Flowering Thing

It's so nice to see someone saying eloquently what you've been trying to say for a while – you (RFC 2119) MUST read what my friend Matt Oswalt wrote about Big Flowering Things.

Explaining the Mysteries of WiFi and Internet ;)

“Daddy, why is Internet not working even though I have good signal?”
“You really want to know?”
“OK, let me draw a diagram or two ;)”

… and now my 8-year-old knows how DHCP and DNS work (the root cause was a broken DNS proxy running on the upstream $0.99 WAN router).

Can You Avoid Networking Software Bugs?

One of my readers sent me an interesting reliability design question. It all started with a catastrophic WAN failure:

Once a particular volume of encrypted traffic was reached the data center WAN edge router crashed, and then the backup router took over, which also crashed. The traffic then failed over to the second DC, and you can guess what happened then...

Obviously they’re now trying to redesign the network to avoid such failures.

Interested in Mesh Wireless Networks?

Engineers developing open-source wireless mesh network protocols and solutions get together every now and then to test the performance of competing mesh network ideas.

The next conference is organized in August 2015 in Maribor, Slovenia, so if you ever needed a good excuse to drop by Slovenia, now you have one ;)

Why Should I Care About Networking?

A month ago I was asked to deliver a short presentation on “something interesting about networking” at my local university. The temptation to talk about network automation and SDN was huge, but I quickly figured out that would make no sense (the audience were students in their freshman year) and decided to talk about a fundamental question: why should a programmer care about networking.

Unfortunately the presentation wasn’t recorded, but you can browse the slide deck on the public content web site.

Some Ridiculous SD-WAN Claims

SDx Central is usually a pretty good web site that I love to read, but even they occasionally manage to publish a gem like this one:

The problem with MPLS and similar technologies is that they weren’t designed with today’s business challenges in mind. Today, a company may need to launch an overseas R&D office overnight, or it may acquire a startup and want to immediately network with offices in distant regions and countries. Older technologies just don’t have the flexibility to do this on the fly.

Not surprisingly, the above paragraph triggered a severe case of Deja-Moo.

Must Read: IPv6 at Swisscom

While some people lament the lack of an IPv6 business case, others are busy rolling it out – you (RFC 2119) SHOULD check out the Status of Swisscom’s IPv6 Activities presentation from the recent Swiss IPv6 summit.

Routing Protocols and SD-WAN: Apples and Furbies

Ethan Banks recently wrote a nice blog post detailing the benefits and drawbacks of traditional routing protocols and comparing them with their SD-WAN counterparts.

While I agree with everything he wrote, the comparison between the two isn’t exactly fair – it’s a bit like trying to cut the cheese with a chainsaw and complaining about the resulting waste.

Video: ISP IPv6 Transition Strategies

The responses of Internet Service Providers (ISPs) to the lack of IPv4 address space range from outright denial (sometimes coupled with reassuringly-expensive large-scale carrier-grade NAT) to all-in IPv6-only designs using 464XLAT for residual IPv4 connectivity.

To understand the implications of these extremes and a few data points between them, watch the ISP IPv6 Transition Strategies video from the Enterprise IPv6 – the First Steps webinar.

Business Case for SD-WAN

An anonymous commenter wrote this comment to my initial SD-WAN post:

I can still hardly imagine the business case behind SD-WAN. Any thoughts?

This question is really easy to answer. There’s a huge business case that SD-WAN products are aiming to solve: replacing traditional MPLS/VPN networks with encrypted transport over the public Internet. However…

Save the Date: Designing Infrastructure for Private Clouds Workshop in Switzerland

Gabi Gerber (the wonderful mastermind behind the Data Center Day event) is helping me bring my Designing Infrastructure for Private Clouds workshop (one of the best Interop 2015 workshops) to Switzerland.

This is the only cloud design workshop I’m running in Europe in 2015. If you’d like to attend it, this is your only chance – register NOW.

Project Calico: Is It Any Good?

At least a dozen engineers sent me emails or tweets mentioning Project Calico in the last few weeks – obviously the project is getting some real traction, so it was high time to look at what it’s all about.

TL&DR: Project Calico is yet another virtual networking implementation that’s a perfect fit for a particular use case, but falters when encountering the morass of edge cases.

LDP Label Allocation Revisited

One of my readers was having an LDP argument with his colleague:

Yesterday I was arguing with someone who works for a large MPLS provider about LDP label allocation. He kept saying that LDP assigns a label to each next-hop, not to each prefix. Reading your blog, I believe this is the default behavior on Juniper but on Cisco LDP assigns a unique label for each IGP (non-BGP) prefix.

He’s absolutely right; Cisco and Juniper use different rules when allocating MPLS labels.
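To see how the two rules from the argument differ, here's a toy model of per-prefix versus per-next-hop label allocation. It's a sketch of the two positions in the reader's question, not of either vendor's actual implementation; the function names, prefixes and next hops are made up, and the only hard fact is that MPLS labels 0 through 15 are reserved, so allocation starts at 16.

```python
from itertools import count


def allocate_per_prefix(routes: dict[str, str], first_label: int = 16) -> dict[str, int]:
    """Every prefix gets its own label, regardless of its next hop."""
    labels = count(first_label)
    return {prefix: next(labels) for prefix in routes}


def allocate_per_next_hop(routes: dict[str, str], first_label: int = 16) -> dict[str, int]:
    """Prefixes that share a next hop share a single label."""
    labels = count(first_label)
    label_for_next_hop: dict[str, int] = {}
    result: dict[str, int] = {}
    for prefix, next_hop in routes.items():
        if next_hop not in label_for_next_hop:
            label_for_next_hop[next_hop] = next(labels)
        result[prefix] = label_for_next_hop[next_hop]
    return result


# Hypothetical routing table: prefix -> next hop
routes = {
    "10.0.1.0/24": "192.0.2.1",
    "10.0.2.0/24": "192.0.2.1",
    "10.0.3.0/24": "192.0.2.2",
}
per_prefix = allocate_per_prefix(routes)
per_next_hop = allocate_per_next_hop(routes)
```

In this example the per-prefix rule uses three labels for three prefixes, while the per-next-hop rule needs only two because the first two prefixes share a next hop.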

Software-Defined Hardware Forwarding Pipeline on HP Switches

Writing OpenFlow controllers that interact with physical hardware is harder than most people think. Apart from developing a distributed system (which is hard in itself), you have to deal with limitations of hardware forwarding pipelines, differences in forwarding hardware, imprecise abstractions (most vendors still support a single OpenFlow table per switch), and resulting bloated flow tables.

More on Centralized Control and SDN

After I wrote a comment on a LinkedIn discussion in the Carrier Ethernet group (more details here), Vishal Sharma wrote an interesting response, going into more detail on the distinction between centralized control and centralized control plane.

Webinars in 1H2015, and a Look Forward

The first half of 2015 was extremely productive – seven brand new webinars (or 22 hours of new content) were added to the webinar library.

Most of the development focus was on SDN and network automation: OpenFlow, NETCONF and YANG, Ansible, Jinja and YAML, and Monitoring SDN networks. There was also the traditional Data Center Fabrics Update session in May, IPv6 Microsegmentation webinar in March, and (finally!) vSphere 6 Networking Deep Dive in April.

Do I have to mention that you get all of them (and dozens of other webinars) with the subscription?

Software-Defined WAN: Well-Orchestrated Duct Tape?

One of the Software Defined Evangelists has declared 2015 as the Year of SD-WAN, and my podcast feeds are full of startups explaining how wonderful their product is compared to the mess made by legacy routers, so one has to wonder: is SD-WAN really something fundamentally new, or is it just another old idea in new clothes?

IPv6 Is Here, Get Used to It

Geoff Huston published an interesting number-crunching exercise in his latest IPv6-focused blog post: 8% of the value of the global Internet (GDP-adjusted number of eyeballs) is already on IPv6, and a third of the top-30 providers (which control 43% of the Internet value) have deployed large-scale IPv6.

The message is clear: The big players have moved on. Who cares about the long tail?

Just Out: Metro- and Carrier Ethernet Encryptors Market Overview

Christoph Jaggi has just published the third part of his Metro- and Carrier Ethernet Encryptor trilogy: the 2015 market overview. Public versions of all three documents are available for download on his web site:

Open-Source Network Engineer Toolbox

Elisa Jasinska, Bob McCouch and I were scheduled to record a NetOps podcast with a major vendor, but unfortunately their technical director cancelled at the last minute. Like good network engineers, we immediately found plan B and focused on Elisa’s specialty: open-source tools.

Another Spectacular Layer-2 Failure

Matjaž Straus started the SINOG 2 meeting I attended last week with a great story: during the RIPE70 meeting (just as I was flying home), Amsterdam Internet Exchange (AMS-IX) crashed.

Here’s how the AMS-IX failure impacted ATLAS probes (world-wide monitoring system run by RIPE) – no wonder, as RIPE uses AMS-IX for their connectivity.

SDN/OpenFlow/NFV Workshop: Frequent Questions

One of the potential attendees of my SDN workshop sent me a long list of questions. Almost every networking engineer, team leader or CIO asks the first one:

What will happen, if we don’t follow the SDN hype (in the short term, in the medium term and in the long term)?

Answering this question is the whole idea of the workshop.

The up-to-date list of scheduled SDN workshops is available on my web site.

Centralized Control Is Not Centralized Control Plane

Every other week I stumble upon a high-level SDN article that repeats the misleading SDN is centralized control plane mantra (often copied verbatim from the Wikipedia article on SDN, sometimes forgetting to quote the source).

Yesterday, I had enough and decided to respond.

Industry Thoughts in 30 seconds

A while ago someone working for an IT-focused media site approached me with a short list of high-level questions. Not sure when they’ll publish the answers, so here they are in case you might find them interesting:

What can enterprises do to ensure that their infrastructure is ready for next-gen networking technology implementations emerging in the next decade?

Next-generation networks will probably rely on existing architectures and forwarding mechanisms, while being significantly more uniform and heavily automated.

This Blog Post Wasn’t Properly Scheduled

A few days ago I stumbled upon an interesting blog post by my friend J Metz in my RSS feeds. As with all blog posts published on Cisco’s web site, all I got in the feed was a teaser (I know, I shouldn’t complain, I’m doing the same ;), but when I wanted to read more, I was greeted with a cryptic 404 (not even a fancy page full of images saying “we can’t find what you’re looking for”).

NAPALM: Integrating Ansible with Network Devices on Software Gone Wild

What happens when network engineers with a strong programming background and a focus on open-source tools have to implement network automation in a multi-vendor network? Instead of complaining or ranting about the stupidities of traditional networking vendors and CLI, they write an abstraction layer that allows them to treat all their devices in the same way and immediately open-source it.

Should I Use a Traditional Firewall in Microsegmented Environment?

One of my readers wondered whether one still needs traditional firewalls in microsegmented environments like VMware NSX.

As always, it depends.

Do We Still Need Subnets in Virtualized Networks?

The proponents of microsegmentation are quick to explain how the per-VM-NIC traffic filtering functionality replaces the traditional role of subnets as security zones, often concluding that “you can deploy as many tenants as you wish in a flat network, and use VM NIC firewall to isolate them.”

Published: Designing Scalable Web Applications

The first batch of the latest materials for my Designing Scalable Web Applications course has been published on my free content web site.

So You Need ISSU on Your ToR switch? Really?

During the Cumulus Linux presentation Dinesh Dutt gave at the Data Center Fabrics webinar, someone asked an unexpected question: “Do you have In-Service Software Upgrade (ISSU) on Cumulus Linux?” and we both went like “What? Why?”

Dinesh is an honest engineer and answered: “No, we don’t do it” with absolutely no hesitation, but we both kept wondering, “Why exactly would you want to do that?”

Video: Scale-Out NAT

Network Address Translation (NAT) is one of those stateful services that’s almost impossible to scale out, because you have to distribute the state of the service (NAT mappings) across all potential ingress and egress points.

Midokura implemented a distributed stateful services architecture in their Midonet product, but faced severe scalability challenges, which they claim to have solved with more intelligent state distribution.

Vertically Integrated Musings

Packet Pushers podcast is a constant source of inspiration for my blog posts. Recently I stumbled upon Rob Sherwood’s explanation of how they package Big Cloud Fabric:

It’s a vertically integrated solution, from Switch Light OS to our SDN controller and Big Cloud Fabric application.

Really? What happened to openness and disaggregation?

Video: Implementing VLAN-aware Bridge with OpenFlow

Reinventing the wheel makes little sense. Implementing old solutions with new tools might be in the same category, but at least it shows you the power and shortcomings of the new tools.

Building a VLAN-aware bridge in OpenFlow is thus a mandatory case study, and as you’ll see in the video from the OpenFlow Deep Dive webinar, it’s not as easy as it looks. For more details, watch the whole OpenFlow webinar (6 hours of in-depth videos), which you also get by buying Advanced SDN Training or subscription.

Turn Your Training or Presentation into a Story

If you’re a regular reader of this blog, you know I always prefer knowledge over recipes. Unfortunately, it’s pretty hard to build that knowledge using the widely available training materials, which often just blast you with a barrage of facts that you’re supposed to memorize and deliver at the certification exam.

How about turning your training into a South Park episode?

Case Study: Scale-Out Cloud Infrastructure

I helped several customers design scale-out private or public cloud infrastructure. In every case, I tried to start with reasonably small pods (based on what they’d consider an acceptable loss unit – another great term I inherited from Chris Young), connected them to a shared L3 backbone (either within a data center or across multiple data centers), and then tried to address the inevitable desire for stretched layer-2 connectivity.

You’ll find a summary of these designs in my next ExpertExpress case study: Scale-Out Private Cloud Infrastructure, and if you need more details, I’m usually available for online consulting.

Network Monitoring in SDN Era on Software Gone Wild

A while ago Chris Young sent me a few questions about network management in the brave new SDN world. I never focused on network management, but I know a few people who do, including Terry Slattery and Matt Oswalt. Interop brought us all together, and we sat down one evening after the presentations to chat about the challenges of monitoring and managing SDN networks.

We started with easy things like comparing monitoring results from virtual and physical switches (why they’ll never match, and whether we even care), and quickly diverted into all sorts of potential oscillations caused by overly dynamic load balancing based on flow labels (ECMP) and flowlets.

Don’t Be Overly Enthusiastic about Vendor Claims (This Time It's Brocade)

I was running the first part of the Data Center Fabrics Update webinar last week, mentioned that Brocade VDX 6740 supports Flex ports (a port you can use as Fibre Channel or 10GE port), and someone immediately wrote a comment saying “so does VDX 6940”. I was almost sure Flex ports aren’t available on VDX 6940 yet, and as always turned to vendor documentation to figure it out.

As expected, the data sheet is a bit vague, somewhat reflecting reality, but also veering into the realm of futures instead of features. Here’s what they say:

Open vSwitch Database Management Protocol (OVSDB) 101

Open vSwitch Database Management Protocol (OVSDB, RFC 7047) is often mentioned together with other semi-magic SDN tools that will bring everlasting peace to the chaotic world of networking. In reality, it’s just a database access/update protocol (think SQL with JSON encoding) with an interesting twist: a client can request notifications about table or row updates, replacing periodic database polling with a pub-sub solution.
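To make the pub-sub twist more tangible, here's a minimal sketch that builds the JSON-RPC `monitor` request described in RFC 7047. The helper function and the `bridge-watch` tag are made up for illustration; the `Open_vSwitch` database and its `Bridge` table come from the standard Open vSwitch schema, but treat the exact table and column choice as an assumption. The sketch only builds the message and doesn't open a connection to an OVSDB server.

```python
import json


def build_monitor_request(db: str, table: str, columns: list[str],
                          tag: str = "bridge-watch", request_id: int = 1) -> str:
    """Build an OVSDB 'monitor' request (hypothetical helper, message only)."""
    request = {
        "method": "monitor",
        "params": [
            db,    # database name
            tag,   # opaque value echoed back in later 'update' notifications
            {
                table: [{
                    "columns": columns,
                    # Ask for the initial contents plus all subsequent changes
                    "select": {"initial": True, "insert": True,
                               "delete": True, "modify": True},
                }]
            },
        ],
        "id": request_id,
    }
    return json.dumps(request)


message = build_monitor_request("Open_vSwitch", "Bridge", ["name"])
```

After a request like this, the server replies with the current table contents and then pushes `update` notifications whenever a monitored row changes, which is what replaces periodic polling.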

Link Aggregation in OpenFlow Environment

One of my readers couldn’t figure out how to combine Link Aggregation Groups (LAG, aka Port Channel) with OpenFlow:

I believe that in LAG, every traditional switch would know how to forward the packet from its FIB. Now with OpenFlow, does the controller communicate with every single switch and populate their tables with one group ID for each switch? Or how does the controller figure out the information for multiple switches in the LAG?

As always, the answer is “it depends”, and this time we’re dealing with a pretty complex issue.

vSphere 6 Networking Deep Dive Webinar Is Complete

Last week we finished the last session of the vSphere 6 Networking Deep Dive webinar: 6 hours of downloadable videos covering every single vSphere 6 networking topic are waiting for you.

As always, you get access to the webinar with your subscription, or you can buy just this webinar, or one of the bundles that include it: Data Center track or Data Center Trilogy.

Segment Routing 101 on Software Gone Wild

With all the hype around Segment Routing we said: “let’s chat about it, what could possibly go wrong”. The result: Episode 33 of Software Gone Wild. We didn’t get very far into the technical details, but you might still find the overview useful (or not – do tell me how good or useless it is).

Stupidities of Switch Programming (written in June 2013)

In June 2013 I wrote a rant that got stuck in my Evernote Blog Posts notebook for almost two years. Sadly, not much has changed since I wrote it, so I decided to publish it as-is.

In the meantime, the only vendor that’s working on making generic network deployments simpler seems to be Cumulus Networks (most other vendors went down the path of building proprietary fabrics, be it ACI, DFA, IRF, QFabric, Virtual Chassis or proprietary OpenFlow extensions).

Arista used to be in the same camp (I loved all the nifty little features they were rolling out to make ops simpler), but it seems they lost their mojo after the IPO.

Do We Need NAC and 802.1x?

Another question I got in my Inbox:

What is your opinion on NAC and 802.1x for wired networks? Is there a better way to solve user access control at layer 2? Or is this a poor man's way to avoid network segmentation and internal network firewalls?

Unless you can trust all users (fat chance) or run a network with no access control (unlikely, unless you’re a coffee shop), you need to authenticate the users anyway.

Scaling OpenStack Security Groups

Security groups (or Endpoint Groups if you’re a Cisco ACI fan) are a nice traffic policy abstraction: instead of dealing with subnets and ACLs, define groups of hosts and the rules of traffic control between them… and let the orchestration system deal with IP addresses and TCP/UDP port numbers.
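A toy illustration of that abstraction, and of the work the orchestration system has to do behind the scenes: the policy is written in terms of groups, and something has to expand it into per-address rules. The group names, IP addresses and rule format below are invented for this sketch and have nothing to do with the actual OpenStack data model.

```python
from itertools import product

# Hypothetical group membership, as the orchestration system might know it.
groups = {
    "web": ["10.0.1.11", "10.0.1.12"],
    "db": ["10.0.2.21"],
}

# Policy expressed between groups: (source group, destination group, protocol, port).
policy = [("web", "db", "tcp", 3306)]


def expand(policy, groups):
    """Expand group-to-group rules into (src IP, dst IP, protocol, port) rules."""
    rules = []
    for src_group, dst_group, proto, port in policy:
        for src, dst in product(groups[src_group], groups[dst_group]):
            rules.append((src, dst, proto, port))
    return rules


for rule in expand(policy, groups):
    print(rule)
```

In this naive expansion a single group-level rule turns into as many address-level rules as there are source-destination pairs, so the rule count grows with the product of the group sizes.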

On I-Shaped and T-Shaped Skills

Several of the conversations I had at the recent RIPE70 meeting were focused on career advice (usually along the lines of “which technology should I focus on next”) and inevitably we ended up discussing the benefits of T-shaped skills versus I-shaped skills… and I couldn’t resist drawing a few graphs illustrating them.

Build Your Development or Lab Environment with Ravello Systems

When preparing for my Simplifying Application Workload Migration workshop (coming in webinar format in autumn) I tried to find a solution that would allow me to recreate existing enterprise virtual network infrastructure in a cloud environment. Soon I stumbled upon Ravello Systems, remembered hearing about them on a podcast, and got in touch with them to figure out whether they could help me solve that challenge.

It turned out you might use Ravello Systems’ solution to implement disaster recovery, but I got way more excited about the possibility of using their solution for labs or testing. To learn more about that, listen to Episode 32 of Software Gone Wild.

Reinventing CLNS with L3-only Forwarding

Hank left a lovely comment on my Rearchitecting L3-Only Networks blog post:

What you describe is literally intra-area routing in CLNS.

He’s absolutely right (and I admitted as much during my IPv6 Microsegmentation presentations @ Troopers 15).

Presentation & Video: Quo Vadis, SDN?

From the automation perspective, the RIPE conference is a dream come true – 30 seconds after you upload your presentation, it appears on the RIPE web site, it’s automatically updated on the podium computer, and the video recording of your talk is published before you even manage to get off the podium – so you can already watch my “SDN - 4 years later (aka Quo Vadis, SDN?)” presentation if you missed it yesterday.

Do We Need Network Programmability?

Jsicuran left this comment on my You Must Understand the Fundamentals to Be Successful blog post:

I just went through some Cisco webinar where they were showcasing the use of NX-OS API and Python to add a VLAN. I do some Python myself and have used that API for some simple DevOps-like uses, but for the most part if you are an enterprise and use Prime DCIM to add VLANs, why should you go through the coding process?

It obviously depends on where you are in your IT automation journey.

OpenStack Got Full IPv6 Support

Great news for everyone trying to deploy IPv6 in OpenStack: the Kilo release has full support for IPv6 in the tenant networks, including SLAAC, stateless and stateful DHCPv6. For more details, read an extensive blog post by Shannon McFarland.

OpenFlow in HP Campus Solutions on Software Gone Wild

When I finished my SDN workshop @ Interop Las Vegas (including a chapter on OpenFlow limitations), some attendees started wondering whether they should even consider OpenFlow in their SDN deployments. My answer: don’t blame the tool if people use it incorrectly.

Two days later, I discovered HP is one of those companies that knows how to use that tool.