Your browser failed to load CSS style sheets. Your browser or web proxy might not support elliptic-curve TLS

Building network automation solutions

6 week online course

reserve a seat

That’s it for 2011

2011 was a fantastic year for a networking geek, and you were awesome – helping me figure out the intricacies of new technologies, fixing my errors, and asking so many great questions that prompted me to dive deeper into the rabbit holes. I owe you a huge Thank you!

I hope you’ll be able to shut down your smartphones and pagers in the next few days and spend a few relaxing moments with your families … and I wish you great networking in 2012!


To keep the geeky spirit: snow angel as seen by Hubble
see 7 comments

Which virtual networking technology should I use?

After I published the Decouple virtual networking from the physical world article, @paulgear1 sent me a very valid tweet: “You seemed a little short on suggestions about the path forward. What should customers do right now?” Apart from the obvious “it depends”, these are the typical use cases (as I understand them today – please feel free to correct me).

read more see 5 comments

Is NAT a security feature?

15 years after NAT was invented, I’m still getting questions along the lines of “is NAT a security feature?” Short answer: NO!

Longer answer: NAT has some side effects that resemble security mechanisms commonly used at the network edge. That does NOT make it a security feature, more so as there are so many variants of NAT.

read more see 12 comments

Help me plan new webinars in 2012

I’m planning a series of shorter (~ 1 hour) update-type webinars in 2012. Some of them will cover new features and technologies that have been introduced since the time I last updated some of the most popular webinars (Data Center, VMware networking), others will focus on emerging technologies.

I would appreciate if you could help me plan them by taking a short survey, telling me which of the topics I identified are most important for you, and adding your favorite topics to the list. The survey won’t take more than a few minutes of your time.

Start the survey ...

see 2 comments

Interesting links (2011-12-18)

The coolest tool of the week: mxtommy/Cisco-SSH-Client. Thomas St. Pierre did a fantastic job - he modified SSH client to colorize the printouts generated by Cisco routers (similar to what VIM is doing to source code). Download his SSH client patches, recompile your SSH client, and enjoy!

And here are the other links accumulated in my Inbox, this time in somewhat more structured format ... and a (hopefully interesting) surprise at the end.

read more see 1 comments

VXLAN, IP multicast, OpenFlow and control planes

A few days ago I had the privilege of being part of an VXLAN-related tweetfest with @bradhedlund, @scott_lowe, @cloudtoad, @JuanLage, @trumanboyes (and probably a few others) and decided to write a blog post explaining the problems VXLAN faces due to lack of control plane, how it uses IP multicast to solve that shortcoming, and how OpenFlow could be used in an alternate architecture to solve those same problems.

read more see 7 comments

FCoE and LAG – industry-wide violation of FC-BB-5?

Anyone serious about high-availability connects servers to the network with more than one uplink, more so when using converged network adapters (CNA) with FCoE. Losing all server connectivity after a single link failure simply doesn’t make sense.

If at all possible, you should use dynamic link aggregation with LACP to bundle the parallel server-to-switch links into a single aggregated link (also called bonded interface in Linux). In theory, it should be simple to combine FCoE with LAG – after all, FCoE runs on top of lossless Ethernet MAC service. In practice, there’s a huge difference between theory and practice.

read more see 24 comments

IPv6 multihoming without NAT: the problem

Every time I write about IPv6 multihoming issues and the need for NPT66, I get a comment or two saying “but I thought this is already part of IPv6 stack – can’t you have two or more IPv6 addresses on the same interface?” The commentators are right, you can have multiple IPv6 addresses on the same interface; the problem is: which one do you choose for outgoing sessions.

The source address selection rules are specified in RFC 3484 (Greg translated that RFC into an easy-to-consume format a while ago), but they are not very helpful as they cannot be influenced by the CPE router. Let’s look at the details.

read more see 14 comments

Embrane heleos: Scale-out Virtual Appliances

It’s getting harder and harder to decide whether to choose physical devices to do L4-7 processing (stateful- and web application firewalling, load balancing, VPN termination, WAN optimization) in your virtualized data center, or whether to deploy VM version of the same appliances.

Physical devices usually perform better. Virtual appliances are more flexible, but don’t scale well ... and Embrane just complicated your decision-making process: they launched scale-out distributed virtual appliance architecture and products that combine the best of both worlds.

read more see 7 comments

Decouple virtual networking from the physical world

Isn’t it amazing that we can build the Internet, run the same web-based application on thousands of servers, give millions of people access to cloud services … and stumble badly every time we’re designing virtual networks. No surprise, by trying to keep vSwitches simple (and their R&D and support costs low), the virtualization vendors violate one of the basic scalability principles: complexity belongs to the network edge.

read more see 11 comments

DHCPv6 server on Cisco IOS: making progress

DHCPv6 server on Cisco IOS got several highly useful enhancements since the first time I started looking into its behavior. Seems like most of them are implemented only in 15.xS trains (where they are most badly needed one would assume), but there’s hope those changes will eventually trickle down into mainstream IOS.

read more see 4 comments

Nexus 1000V and vMotion

I thought Nexus 1000V is like Aspirin compared to VMware’s vSwitch, providing tons of additional functionality (including LACP and BPDU filter) and familiar NX-OS CLI. It turns out I was right in more ways than I imagined; Nexus 1000V solves a lot of headaches, but can also cause heartburn due to a particular combination of its distributed architecture and reliance on vDS object model in vCenter.

read more see 3 comments

We just might need NAT66

My friend Tom Hollingsworth has written another NAT66-is-evil blog post. While I agree with him in principle, and most everyone agrees NAT as we know it from IPv4 world is plain stupid in IPv6 world (NAPT more so than NAT), we just might need NPT66 (Network Prefix Translation; RFC 6296) to support small-site multihoming ... and yet again, it seems that many leading IPv6 experts grudgingly agree with me.

read more see 24 comments

VM-aware Networking Improves IaaS Cloud Scalability

In the VMware vSwitch – the baseline of simplicity post I described simple layer-2 switches offered by most hypervisor vendors and the scalability challenges you face when trying to build large-scale solutions with them. You can solve at least one of the scalability issues pretty easily: VM-aware networking solutions available from most data center networking vendors dynamically adjust the list of VLANs on server-to-switch links.

read more see 8 comments

Interesting links (2011-12-04)

Let’s start with people who are trying to fix real-life problems. Browser and OS vendors are working around the lack of session layer – happy eyeballs approach solves dual-stack problem in either OS stack (Apple) or in the browsers (Chrome, Firefox). The results are ... interesting ;) ... but it seems most implementations are on the right track.

read more see 4 comments

VMware vSwitch – the baseline of simplicity

If you’re looking for a simple virtual switch, look no further than VMware’s venerable vSwitch. It runs very few control protocols (just CDP or LLDP, no STP or LACP), has no dynamic MAC learning, and only a few knobs and moving parts – ideal for simple deployments. Of course you have to pay for all that ease-of-use: designing a scalable vSwitch-based solution is tough (but then it all depends on what kind of environment you’re building).

read more see 6 comments

Junos Day One: IS-IS for dummies

For whatever reason I decided to start my Junos experience with a very simple IS-IS network – four core routers from my Building IPv6 Service Provider Core webinar. As Junosphere doesn’t support serial or POS interfaces, I migrated all links to Gigabit Ethernet and added a point-to-point GE link between PE-A and PE-B.

read more see 4 comments

Virtual Switches – from Simple to Scalable

Dan sent me an interesting comment after watching a recording of my Data Center 3.0 webinar:

I have a different view regarding VMware vSwitch. For me its the best thing happened in my network in years. The vSwitch is so simple, and its so hard to break something in it, that I let the server team to do what ever they want (with one small rule, only one vNIC per guest). I never have to configure a server port again :).

As always, the right answer is “it depends” – what kind of vSwitch you need depends primarily on your requirements.

read more see 7 comments

Sending Wake-on-LAN (WOL) packet with IOS Tcl

Jónatan Þór Jónasson took the time to implement Wake-on-LAN functionality using UDP support introduced in Cisco IOS Tcl in release 15.1(1)T. He found a TCL/TK example of a magic packet being sent, used that as a base, and with small modifications got it to work on his router. Here‘s his code (it’s obviously a proof-of-concept, but you need just a few more lines to get a working Tclsh script):

read more see 6 comments

Multi-level IS-IS in a single area? Think again!

Many service providers choosing IS-IS as their IGP use it within a single area (or at least run all routers as L1L2 routers). Multi-level IS-IS design is a royal pain, more so in MPLS environments where every PE-router needs a distinct route for every BGP next hop (see also Disable L1 Default Route in IS-IS). Moreover, MPLS TE is reasonably simple only within a single level (L1 or L2).

I’m positive at least some service providers do something as stupid as I usually did – deploy IS-IS with default settings using a configuration similar to this one:

read more see 16 comments

Junos Interfaces and Protocols: Now I get it

My Junos versus Cisco IOS: Explicit versus Implicit received a huge amount of helpful comments, some of them slightly philosophical, others highly practical – from using interfaces all combined with interface disable in routing protocol configuration, to using configuration groups (more about that fantastic concept in another post).

However, understanding what’s going on is not the same as being able to explain it in one sentence ... and Dan (@jonahsfo) Backman beautifully nailed that one.

read more see 2 comments

Log the source ports of HTTP sessions

You’re probably tired of this story by now: public IPv4 addresses are running out, lots of content is available only over IPv4, and so the service providers use NAT to give new clients (with no public IPv4 address) access to old content. It doesn’t matter which NAT variant the service provider is using, be it Carrier Grade Nat (CGN), NAT64, DS-Lite or A+P, the crucial problem is always the same: multiple users are hidden behind a single source IP address.

read more see 3 comments

Junos versus Cisco IOS: Explicit versus Implicit

My first Junosphere project was an IPv6 backbone; I wanted to create a simple single-area IS-IS/BGP-free backbone running LDP and MPLS, and using 6PE for IPv6 connectivity. Needless to say, even though I read the excellent Day One books (highly recommended: Exploring IPv6, Advanced IPv6 configuration and Deploying MPLS), I stumbled on almost every step.

read more see 18 comments

Data Center Fabrics Webinar – A Huge Thank You!

The Data Center Fabric Architectures webinar was very well received, took way longer than I initially predicted (3 hours instead of 2, but that’s usual with my webinars), and contained almost no factual errors (this is the part that I’m most happy about). Obviously I’m not that good; a lot of people were helping.

I would like to express a huge thank you to Raj Toor (Alcatel Lucent), Doug Gourlay (Arista), Lisa Caywood, Jon Hudson and Brook Reams (Brocade), Ron Fuller and Omar Sultan (Cisco), Brad Hedlund and Peter Wohlers (Force 10), Abner Germanow, Nadia Walker and product management teams from Juniper, Khalid Raza (HP) and Samrat Ganguly and Su-Hun Yun (NEC America). They were fixing my errors, pointing out my omissions, and helped me fine-tune the grading system, making the webinar way more accurate.

If you’ve missed the webinar, don’t worry – the recording is already available, and I’ll run another session in May, updating the scorecards with the improvements vendors will make in the next six months.

Add comment

Nexus vPC and Consistency Checker

Michel sent me a detailed e-mail describing both his enthusiasm with vPC and the headaches consistency checker is causing him. Here’s the good part:

Nexus vPC seems like a perfect solution for real multi-chassis etherchannel. At work we're using it extensively on a few pairs of Nexus 7000's.

... and then it turns sour:

However, there is one MAJOR drawback with vPC at this time, it's the way the consistency checker works (or rather, does not work). We've come across two specific situations where consistency checker will bring down your beautiful and redundant vPC link, and we've found no way around.

Here are his problems:

read more see 8 comments

Junos Versus Cisco IOS: MPLS and LDP

The comments igp2bgp and Tiziano Tofoni made to my LDP-IGP Synchronization in MPLS Networks post prompted me to look deeper into basic Junos MPLS configuration and LDP behavior. As expected, there are some significant differences between Cisco’s and Juniper’s LDP implementations (and, as is usually the case, they’re both strictly conformant with RFC 5036).

read more see 22 comments

OpenFlow: Enterprise Use Cases

One of the comments I usually get about OpenFlow is “sounds great and I’m positive Yahoo! and Google will eventually use it, but I see no enterprise use case.” (see also this blog post). Obviously nobody would go for a full-blown native OpenFlow deployment and we’ll probably see hybrid (ships-in-the-night) approach more often in research labs than in enterprise networks, but there’s always the integrated mode that allows you to add OpenFlow-based functionality on top of existing networking infrastructure.

read more see 1 comments

LDP-IGP synchronization in MPLS networks

A reader of my blog planning to migrate his network from a traditional BGP-everywhere design to a BGP-over-MPLS one wondered about potential unexpected consequences. The MTU implications of introducing MPLS in a running network are usually well understood (even though you could get some very interesting behavior); if you can, increase the MTU size by at least 16 bytes (4 labels) and check whether MTU includes L2 header. Another somewhat more mysterious beast is the interaction between IGP and LDP that can cause traffic disruptions after the physical connectivity has been reestablished.

read more see 24 comments

Data Center Fabric Architectures – Almost There

The last week has been “interesting” – I created the draft slide deck for the Data Center Fabric Architectures webinar on November 16th (register here) and sent the relevant slides to all vendors mentioned in the presentation to give them a chance to fix my errors – every vendor got at least the scorecard describing my understanding of their solution.

read more see 9 comments

Junosphere: the first impressions

Abner (@abnerg) Germanow and Dan (@jonahsfo) Backman are as good as their word: this week I got access to Junosphere, a great network-in-the-Clouds solution from Juniper. You might be familiar with Olive, the “non-existent” way of running Junos on an x86 machine (including a VM); Junosphere is the supported version of the same concept, including a real forwarding plane (it’s my understanding Olive lacks that, which makes certain protocols behave in unexpected ways).

read more see 11 comments

IPv6 Security: Getting Bored @ BRU Airport

Yesterday’s 6th Slovenian IPv6 Summit was (as always) full of awesome presentations, this time coming straight from some of the IPv6 legends: check the ones from Eric Vyncke (and make sure you read his IPv6 Security book), Randy Bush and Mark Townsley. The epic moment, however, was the “I was getting bored” part of Eric’s presentation (starts around 0:50:00). This is (in a nutshell) what he did:

read more see 14 comments

Junos Day One: Translating Configurations The Geeky Way

Abner (@abnerg) Germanov surprised us all at the end of Juniper’s presentation at Networking Tech Field Day when he announced Junosphere access for all the delegates – after a year of nagging, I would finally be able to touch Junos. However, instead of taking it easy and studying the excellent Junos Day One books (which I also did – if you’re new to Junos you should definitely start there; they are well worth reading), I decided to take a more geeky approach.

read more see 7 comments

Big Switch Networks might actually make sense

Big Switch Networks is one of those semi-stealthy startups that like to hint at what they’re doing without actually telling you anything, so I was very keen to meet Kyle Forster and Guido Appenzeller during the OpenFlow Symposium and asked them a simple question: “can you explain in 3 minutes what it is you’re doing?”

read more see 4 comments

Interesting links (2011-11-06)

The “discovery of the week” award goes to Terry Slattery for pointing out the dangers of bufferbloat while investigating TCP retransmissions (part 1 and part 2). BTW, in the end, he figured out it was just an overloaded Gigabit Ethernet linecard.

Two other interesting discoveries: PA /48 IPv6 prefixes are still filtered and BGP is more stable than we thought it would be.

read more see 6 comments

RFC Tidbit: IPv6 in 3GPP mobile networks

Did you ever want to have a high-level overview of how 3G/4G mobile networks work? Where GGSN and SGSN fit in? What the PDP contexts are ... and why you need two for dual-stack connectivity? All that (and a lot more) is explained in very well written IETF draft IPv6 in 3GPP Evolved Packet System. Reading highly recommended.

see 1 comments

Juniper’s Virtual Gateway – a virtual firewall done right

I stumbled upon VMsafe Network API (the API formerly known as dvFilter) while developing my VMware Networking Deep Dive webinar, set up the vShield App 4.1 in a lab, figured out how it works (including a few caveats), and assumed that’s how most virtual firewalls using dvFilter work. Boy was I wrong!

read more see 2 comments

Virtual switches need BPDU guard

An engineer attending my VMware Networking Deep Dive webinar has asked me a tough question that I was unable to answer:

What happens if a VM running within a vSphere host sends a BPDU? Will it get dropped by the vSwitch or will it be sent to the physical switch (potentially triggering BPDU guard)?

I got the answer from visibly harassed Kurt (@networkjanitor) Bales during the Networking Tech Field Day; one of his customers has managed to do just that.

Update 2011-11-04: The post was rewritten based on extensive feedback from Cisco, VMware and numerous readers.

read more see 34 comments

RFC Tidbit: IPv6 Flow Label

Finally someone decided to make IPv6 flow label useful. First they had to justify why they want to change it, and then modify the definition (way too much work for a field nobody ever used). Planned use is to enhance ECMP load balancing, both in native IPv6 environments (where using the flow label is faster than digging deep into variable-length IPv6 extension headers) and (even more importantly) in tunneled environments, where the flow label propagates the entropy from the tunnel payload into the envelope header.

Add comment

OpenFlow Deployment Models

I hope you never believed the “OpenFlow networking nirvana” hype in which smart open-source programmable controllers control dumb low-cost switches, busting the “networking = mainframes” model and bringing the Linux-like golden age to every network. As the debates during the OpenFlow symposium clearly illustrated, the OpenFlow reality is way more complex than it appears at a first glance.

To make it even more interesting, at least four different models for OpenFlow deployment have already emerged:

read more see 3 comments

Busting Layer-2 Data Center Interconnect Myths

A few weeks ago I delivered a short L2 DCI WebEx presentation to CCIE Club Poland. I took the L2 part of my Data Center Interconnect webinar and added 15 minutes of L2 DCI mythbusting (yes, it took me that long to explain what @reillyusa summarized in a single tweet). Here’s that part of the presentation; for the rest, buy a recording of my Data Center Interconnect webinar (it contains much more than just L2 DCI).

read more Add comment

L2 or L3 switching in campus networks?

Michael sent me an interesting question:

I work in a rather large enterprise facing a campus network redesign. I am in favor of using a routed access for floor LANs, and make Ethernet segments rather small (L3 switching on access devices). My colleagues seem to like L2 switching to VSS (distribution layer for the floor LANs). OSPF is in use currently in the backbone as the sole routing protocol. So basically I need some additional pros and cons for VSS vs Routed Access. :-)

The follow-up questions confirmed he has L3-capable switches in the access layer connected with redundant links to a pair of Cat6500s:

read more see 42 comments

I apologize, but I’m excited

The last few days were exquisite fun: it was great meeting so many people focusing on a single technology (OpenFlow) and concept (Software-Defined Networking, whatever that means) that just might overcome some of the old obstacles (and introduce new ones). You should be at least a bit curious what this is all about, and even if you don’t see yourself ever using OpenFlow or any other incarnation of SDN in your network, it never hurts to enhance your resume with another technology (as long as it’s relevant; don’t put CICS programmer at the top of it).

read more see 4 comments

Network Field Day – first impressions

We finished a fantastic Network Field Day (second edition) yesterday. While it will take me a while (and 20+ blog posts) to recover from the information blast I received during the last two days, here are the first impressions:

Explosion of innovation – and it’s not just OpenFlow and/or SDN. Last year we’ve seen some great products and a few good ideas (earning me the “grumpy old man that’s hard to make smile” fame), this year almost every vendor had something that excited me.

read more Add comment

ExpertExpress – just what you need for a tough MPLS/VPN RFP

A while ago I got a set of MPLS/VPN-related questions from one of my long-time readers furiously working on a response to a large RFP. I answered the questions and (more as an afterthought) mentioned the ExpertExpress service I had been starting to consider. His response amazed me:

ExpertExpress is definitely a very very good idea!!! You know what? I think I will push the company to try to use it to get your advice on the current engagement. The company needs this "yesterday" so I would be able to verify my design and will feel safer with it and will deliver it on time and of course you will receive a fair payment for this.

Next question – when could we do it? Response: how about tomorrow? Sure, no problem (note: it doesn’t always work out that way).

read more Add comment

Generic VLAN Design

Like every other blogger, I get occasional e-mails from people fishing for free consulting or second opinion (note: asking a serious technical question is a totally different story; as many people know, I always try to reply and help) and as I’m totally overloaded with OpenFlow symposium and Net Field Day these days, I decided to share one of the better ones.

read more see 39 comments

Cloud Computing Networking Under the Hood

The second presentation I had @ EuroNOG 2011 described networking requirements of various cloud services – a perfect combination of virtual networking and scalability, two of the topics I’m currently obsessed with. If you’re looking for more details, register for the Cloud Computing Networking: Under the Hood webinar on December 14th.

read more see 1 comments

QFabric Part 4 – Spanning Tree Protocol

Initial release of QFabric Junos can run STP only within the network node (see QFabric Control Plane post for more details), triggering an obvious question: “what happens if a server multihomed to a server node starts bridging between its ports and starts sending BPDUs?”. Some fabric solutions try to ignore STP (the diplomats would say “they are transparent to STP”) but fortunately Juniper decided to do the right thing.

read more see 4 comments

OpenFlow and the State Explosion

While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mindboggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them that obviously impressed James Hamilton is the solution to load balancing that requires no load balancers.

Before clicking Read more, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.

read more see 11 comments

Net Field Day, here I come

After more than a year, I’m back in California, anxiously waiting to meet my fellow bloggers and ask some tough questions to a fantastic lineup of vendors presenting at Net Field Day 2011. Stephen Foskett’s well-oiled organizing machinery is already in full gear; I’m typing this post from a WiFi-equiped car that picked me up @ SFO airport (you see, dear vendors, it’s so easy to make my inner geek happy ... all I need are some fantastic features that are actually usable and work as well as this WiFi connection).

read more Add comment

Data Center Fabric Architecture webinar

I spent a lot of time during the last year analyzing various data center architectures proposed by networking vendors – from simple solutions like MLAG to complex architectures like QFabric – and presented the summary of my findings in a short presentation @ EuroNOG 2011 that received positive feedback from Cisco, Juniper and Brocade (and a post from Brook Reams @ EthernetFabric). That presentation (you can view it online) was pretty short due to the 45-minute slot I got and I decided to expand it into a proper 2-to-3 hour webinar.

read more Add comment

BGP and route maps

This is a nice email I got from an engineer struggling with multi-homing BGP setup:

We faced a problem with our internet routers a few days back. The engineer who configured them earlier used the syntax: network x.x.x.x mask y.y.y.y route-map PREPEND to influence the incoming traffic over two service-providers.

... and of course it didn’t work.

read more see 19 comments

What is Nicira really up to?

Yesterday New York Times published an article covering Nicira, a semi-stealthy startup working on an open source soft switch (Open vSwitch) and associated OpenFlow-based controller, triggering immediate responses from GigaOm and Twilight in the Valley of the Nerds. While everyone got entangled in the buzzwords (or lack of them), not a single article answered the question “what is Nicira really doing?” Let’s fix that.

read more see 2 comments

IPv6 End User Authentication on Metro Ethernet

One of the areas where IPv6 sorely lacks feature parity with IPv4 is user authentication and source IP spoofing prevention in large-scale Carrier Ethernet networks. Metro Ethernet switches from numerous vendors offer all the IPv4 features a service provider needs to build a secure and reliable access network where the users can’t intercept other users’ traffic or spoof source IP addresses, and where it’s always possible to identify the end customer from an IPv4 address – a mandatory requirement in many countries. Unfortunately, you won’t find most of these features in those few Metro Ethernet switches that support IPv6.

read more see 14 comments

Follow-the-sun workload mobility? Get lost!

A week ago I was writing about the latency and bandwidth challenges of long-distance vMotion and why it rarely makes sense to use it in disaster avoidance scenarios (for a real disaster avoidance story, read this post ... and note no vMotion was used to get out of harm’s way). The article I wrote for SearchNetworking tackles an idea that is an order of magnitude more ridiculous: using vMotion to migrate virtual machines around the world to bring them close to the users (follow-the-sun workload mobility). I wonder which islands you’d have to use to cross the Pacific in 10ms RTT hops supported by vMotion?

As described in my Data Center Interconnects webinar (and the Scalability Rules book), the only solution that really works is a scale-out application architecture combined with load balancers.

Read and enjoy the article ...

see 6 comments

MPLS is not tunneling

Greg (@etherealmind) Ferro started an interesting discussion on Google+, claiming MPLS is just tunneling and a duct tape like NAT. I would be the first one to admit MPLS has its complexities (not many ;) and shortcomings (a few ;), but calling it a tunnel just confuses the innocents. MPLS is not tunneling, it’s a virtual-circuits-based technology, and the difference between the two is a major one.

read more see 13 comments

What Is OpenFlow (Part 2)?

Got this set of questions from a CCIE pondering emerging technologies that could be of potential use in his data center:

I don’t think OpenFlow is clearly defined yet. Is it a protocol? A model for Control plane – Forwarding plane FP interaction? An abstraction of the forwarding-plane? An automation technology? Is it a virtualization technology? I don’t think there is consensus on these things yet.

As the OpenFlow Symposium is just a few weeks away, let’s try to position OpenFlow in the big picture.

read more see 1 comments

VXLAN termination on physical devices

Every time I’m discussing the VXLAN technology with a fellow networking engineer, I inevitably get the question “how will I connect this to the outside world?” Let’s assume you want to build pretty typical 3-tier application architecture (next diagram) using VXLAN-based virtual subnets and you already have firewalls and load balancers – can you use them?

The product information in this blog post is outdated - Arista, Brocade, Cisco, Dell, F5, HP and Juniper are all shipping hardware VXLAN gateways (this post has more up-to-date information). The concepts explained in the following text are still valid; however, I would encourage you to read other VXLAN-related posts on this web site or watch the VXLAN webinar to get a more recent picture.

read more see 15 comments

Interesting links (2011-10-09)

I was overloaded during the last few weekends and my Inbox is yet again overflowing with links to excellent content. For a warm-up, look at the eight levels of vendor acceptance (a side effect of a really tough lab test during the EuroNOG 2011 conference).

On a more serious note, the most useful article of this week is probably the BGPmon Web Services API that describes how you can query the global BGP table through whois or SOAP.

read more see 2 comments

Do I need IPv6 in my Enterprise (again)

Ethan Banks, one of the masterminds behind the Packet Pushers podcast, wrote a spot-on blog describing why enterprises don’t deploy IPv6. Unfortunately, most of the enterprise networking engineers follow the same line of reasoning, and a few of them might feel like the proverbial deer caught in the headlights once something totally unexpected happen ... like their CEO vacationing in China, getting only IPv6 address on the iPhone, and thus not being able to access a mission-critical craplication. For a longer-term perspective, read an excellent reply written by Tom Hollingsworth.

read more see 7 comments

CloudSwitch – VLAN extension done right

I’ve first heard about CloudSwitch when writing about vCider. It seemed like an interesting idea and I wanted to explore the networking aspects of cloud VLAN extension for my EuroNOG presentation. My usual approach (read the documentation) failed – the documentation is not available on their web site – but I got something better: a briefing from Damon Miller, their Director of Technical Field Operations. So, this is how I understood CloudSwitch works (did I get it wrong? Write a comment!):

read more Add comment

Data Center Fabric Architectures (EuroNOG 2011)

As you might know, I did a Data Center Fabric Architectures presentation at the EuroNOG 2011 conference last week. The slides are available online; for a more in-depth version, please register for my Data Center Fabrics webinar.

As always, if I got anything wrong, please write a comment.

see 5 comments

Reliable or unreliable cloud services?

The question of high-availability cloud services (let’s agree reliable in this context really means highly available) pops up every time I discuss cloud networking requirements with enterprise-focused experts. While it’s obvious the software- and platform services must be highly available (as their users have few mechanisms to increase their availability), Infrastructure-as-a-Service (IaaS) remains a grey area.

However, once you look at the question from the business perspective, it seems Amazon probably made a pretty good choice: offer reasonably-available service at a low price. You’ll find more in-depth arguments in the Cloud infrastructure services: Balancing high availability and cost article I wrote for SearchCloudProvider.com.

see 3 comments

Long-distance vMotion for disaster avoidance? Do the math first

The proponents of inter-DC layer-2 connectivity (required by long-distance vMotion) inevitably cite disaster avoidance (along with buzzword-bingo-winning business agility) as one of the primary requirements after they figure out stretched clusters might not be such a good idea (and there’s no way to explain the dangers of split subnets to some people). When faced with the disaster avoidance “requirement”, ask them to do some basic math first.

read more see 14 comments

DMVPN: Spoke QoS Challenge

Got the following question with an invalid return address, so I’m broadcasting the reply ;)

I am running a DMVPN network and recently got a requirement for spoke-to-spoke communication. We currently shape traffic on a per spoke basis on the hub, and have a single shaper at the remote site. However, if a spoke is receiving a large amount of traffic from the hub and another spoke site, how will the sites sending traffic know that the remote port is congested?

Short answer – they won’t. You have a mission-impossible problem (very similar to ADSL QoS), but there might be some slight silver lining:

read more see 5 comments

QFabric Part 3 – Forwarding

You won’t find much about the QFabric forwarding architecture and resulting behavior in the documentation; white papers might give you more insight and I’m positive more detailed ones will start appearing on Juniper’s web site now that the product is shipping. In the meantime, let’s see how far we can get based on two simple assumptions: (A) The "one tier architecture" claim is true and (B) Juniper has some very smart engineers.

read more see 11 comments

VXLAN: awesome or braindead?

Just a few hours after VXLAN was launched, I received an e-mail from one of my readers asking (literally) if VXLAN was awesome or braindead. I decided to answer this question (you know the right answer is it depends) and a few others in a FastPacket blog post published by SearchNetworking.

I wrote the post before NVGRE was published and missed the “brilliant” idea of using GRE key as virtual segment ID.

Read more @ SearchNetworking

see 6 comments

ExpertExpress – online help when and where you need it most

Occasionally my readers ask me if I would be available for a consulting/design project (or send me questions that are actually design review/second opinion challenges). I usually recommend using our Professional Services team for larger projects (I try to do only a few larger consulting projects per year ... but that shouldn’t stop you from asking ;), but quite often the amount of work involved is so low that it simply doesn’t make sense to go through all the paperwork nightmare and I decided to create ExpertExpress service to address those cases.

read more see 4 comments

Net::Beer in Krakow

I’ll be in Krakow for the PLNOG/EuroNOG conferences Wednesday through Friday. This is not the primary reason I’m arriving on Wednesday (although it does look tempting) – I wanted to have enough time for discussions with fellow networking engineers and Thursday afternoon/Friday will probably be pretty busy. So, if you’d like to chat with me about exciting networking technologies, just find me in the crowd (unfortunately I won’t be wearing this T-shirt) ... and if you’d like to have a more serious (and longer) discussion, get in touch with me or send me a tweet.

see 4 comments

QFabric Part 2 – Control Plane Overview

Like anyone else, I was pretty impressed with the QFabric hardware architecture when Juniper announced it, but remained way more interested in the control-plane aspects of QFabric. After all, if you want multiple switches to behave like a single device, you could either use Borg-like architecture with a single control plane entity, or implement some very clever tricks.

Nobody has yet demonstrated a 100-switch network with a single control plane (although the OpenFlow aficionados would make you believe it’s just around the corner), so it must have been something else.

read more see 1 comments

Quick question: IP multicast over an existing IP backbone

Imagine you’d actually want to run VXLAN between two data centers (I wouldn’t but that’s beyond the point at this moment) and the only connectivity between the two is IP, no multicast. How would you implement IP multicast across a generic IP backbone? Anything goes, from duct tape (GRE) to creative solutions ... and don’t forget those pesky RPF checks.

see 12 comments

IPv6 support in Cisco’s Data Center gear – Fall’11

Comparing promises, deliverables and generic progress seems to be popular in the harvest season, so let’s see how far Cisco pushed the Data Center IPv6 support in the six months since my last status report.

Kudos to the Nexus 7000/NX-OS team for doing the right thing. Not only did they make me happy by implementing full-blown MPLS, MPLS/TE and MPLS/VPN, they included 6PE and 6VPE in the first release of the MPLS code. Great job!

read more see 13 comments

QFabric Part 1 – Hardware Architecture

Juniper has finally released the technical documentation for the QFabric virtual switch and its components (QF/Node, QF/Interconnect and QF/Director). As expected, my speculations weren’t too far off – if anything, Juniper didn’t go far enough along those lines, but we’ll get there later.

The generic hardware architecture of the QFabric switching complex has been well known for quite a while (listening to the Juniper QFabric Packet Pushers Podcast is highly recommended) – here’s a brief summary:

read more see 28 comments

NVGRE – because one standard just wouldn’t be enough

Two weeks after VXLAN (backed by VMware, Cisco, Citrix and Red Hat) was launched at VMworld, Microsoft, Intel, HP & Dell published NVGRE draft (Arista and Broadcom are cleverly sitting on both chairs) which solves the same problem in a slightly different way.

If you’re still wondering why we need VXLAN and NVGRE, read my VXLAN post (and the one describing how VXLAN, OTV and LISP fit together), register for the Introduction to Virtual Networking webinar or read the Introduction section of the NVGRE draft.

read more Add comment

Interesting links (2011-09-18)

The Mixed Feelings award of the week goes to Doug Gourlay and his Why FCoE is Dead, But Not Buried Yet article. While I agree with everything he’s saying about L2 and L3, the FCoE part of the post is shaky enough to generate tons of comments (or maybe that was the goal). For a hilarious perspective on the same topic, read Fiber Channel and Ethernet – the odd couple.

And here are the other great articles I stumbled upon during the last few days:

read more Add comment

Responsible generation of BGP default route

Chris sent me the following question a while ago:

I've got a full Internet BGP table, and want to responsibly send a default route to a downstream AS. It's the "responsibly" part that's got me frustrated: How can I judge whether the internet is working and make the origination of the default conditional on that?

He’d already figured out the neighbor default-originate route-map command, but wanted to check for more generic conditions than the presence of one or more prefixes in the IP routing table.

read more see 10 comments

Changing configuration with EEM – yes or no?

Daniel left a very relevant comment to my convoluted BGP session shutdown solution:

What I am currently doing is using EEM to watch my tracked objects and then issuing a neighbor shutdown command. Is there a functional reason I would not want to do it that way, and use the method you prescribe?

As always, the answer is “it depends.” In this case, the question to ask yourself is: “do I track configuration changes and react to them?

read more see 7 comments

OSPF-over-DMVPN using two hub routers

One of my readers sent me the following question a few days ago:

Do you have a webinar that covers Dual DMVPN HUB deployment using OSPF? If so which webinar covers it?

I told him that the DMVPN: From Basics to Scalable Networks webinar covers exactly that scenario (and numerous others), describing both Phase 1 DMVPN and Phase 2 DMVPN design and implementation guidelines. Interestingly, he replied that the information on this topic seems to be very scant:

read more Add comment

You don’t need OpenFlow to solve every age-old problem

I read two great blog posts on Sunday: evergreen Fallacies of Distributed Computing from Bob Plankers and forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).

read more see 10 comments

Long-distance IRF fabric: works best in PowerPoint

HP has recently commissioned an IRF network test that came to absolutely astonishing conclusions: vMotion runs almost twice as fast across two links bundled in a port channel than across a single link (with the other one being blocked by STP). The test report contains one other gem, this one a result of incredible creativity of HP marketing:

For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.

You know my opinions about stretched cluster ... and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.

read more see 9 comments

Shut down BGP session based on tracked object

In responses to my The Road to Complex Designs is Paved With Great Recipes post Daniel suggested shutting down EBGP session if your BGP router cannot reach the DMZ firewall and Cristoph guessed that it might be done without changing the router configuration with the neighbor fall-over route-map BGP configuration command. He was sort-of right, but the solution is slightly more convoluted than he imagined.

read more see 10 comments

Introduction to Virtualized Networking webinar

When I started writing about VXLAN, I received a few tweets along the lines of “I have no clue what you’re writing about.” Here’s a chance to fix that: I’ll run an Introduction to Virtualized Networking webinar in early October (register), trying to demystify the acronyms and marketectures. It doesn’t assume you know anything about server virtualization or IaaS; we’ll start from scratch and cover as much ground as possible.

If you’ve attended the Data Center 3.0 or VMware Networking Deep Dive webinars, you won’t learn anything new from this one; it’s a prequel to the other two.

see 1 comments

IPv6 MPLS/VPN (6VPE) with PPPoE and RADIUS

During my visit to South Africa someone told me that he got 6VPE working over an L2TP connection ... and that you should “use the other VRF attribute, not lcp:interface-config” to make it work. A few days ago one of the readers asked me the same question and although I was able to find several relevant documents, I wanted to see it working in my lab.

read more see 1 comments

Large-scale bridging = nuked earth

If you’re not working for a data center fabric vendor (in which case please read the other today’s post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:

Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.

... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)

Add comment

Data Center Fabrics – did I miss anything?

During the EuroNOG conference in Krakow (if you’re in Europe, make sure to be there ;) I’ll talk about data center fabrics, trying to explain the realities behind some of the marketectures.

So far my presentation covers Cisco’s Fabric Path, VPC, VSS and port extenders, Brocade’s VCS Fabric based on what’s available in Brocade NOS 2.0 (they still have to decide whether they’ll tell me what’s new in NOS 2.1), Juniper’s Virtual Chassis and XRE, HP’s IRF, and OpenFlow.

read more see 5 comments

Nexus 1000V LACP offload and the dangers of in-band control

A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.

If you’re not familiar with the Nexus 1000V architecture, read this post first. If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.

read more see 1 comments

TRILL goes to WAN – the bridging craze continues

Remember how I foretold when TRILL first appeared that someone would be “brave” enough to reinvent WAN bridging and brouters that we so loved to hate in the early 90’s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.

see 4 comments

VXLAN, OTV and LISP

Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculations and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.

VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.

read more see 18 comments

Yearly webinar subscriptions for sub-Saharan users

Some of my African readers couldn’t buy the yearly webinar subscription in the past due to variety of credit card-related problems. I spent a few days with NIL Africa’s wonderful team after a fun consulting engagement that brought me to South Africa ... and managed to find a solution – get in touch with Lefkia Swart @ NIL Africa and she’ll send you an invoice.

Add comment

VXLAN: MAC-over-IP-based vCloud networking

In one of my vCloud Director Networking Infrastructure rants I wrote “if they had decided to use IP encapsulation, I would have applauded.” It’s time to applaud: Cisco has just demonstrated Nexus 1000V supporting MAC-over-IP encapsulation for vCloud Director isolated networks at VMworld, solving at least some of the scalability problems MAC-in-MAC encapsulation has.

Nexus 1000V VEM will be able to (once the new release becomes available) encapsulate MAC frames generated by virtual machines residing in isolated segments into UDP packets exchanged between VEMs.

read more see 7 comments

FCoE networking elements classification

When I (somewhat jokingly) wrote about the dense- and sparse-mode FCoE, I had no idea someone would try to extend the analogy to all possible FCoE topologies like Tony Bourke did. Anyhow, now that the cat is out of the bag, let’s state the obvious: enumerating all possible FCoE topologies is like trying to list all possible combinations of NAT, IP routing over at least two L2 technologies, and bridging; while it can be done, the best one can reasonably hope for is a list of supported topologies from various vendors.

However, it might make sense to give you a series of questions to ask the vendors offering FCoE gear to help you classify what their devices actually do.

read more see 5 comments

BGP next hop processing

Following my IBGP or EBGP in an enterprise network post a few people have asked for a more graphical explanation of IBGP/EBGP differences. Apart from the obvious ones (AS path does not change inside an AS) and more arcane ones (local preference is only propagated on IBGP sessions, MED of an EBGP route is not propagated to other EBGP neighbors), the most important difference between IBGP and EBGP is BGP next hop processing.

read more see 16 comments

Interesting links (2011-08-28)

Most persuasive argument of the week: “Are traffic charges needed to avert a coming capex catastrophe?” by Robert Kenny. This is how you rebut the claims of the greedy Service Providers (and their hired guns), not by hysterical screaming and spitting perfected by some net neutrality zealots.

Most insightful talk: An Attempt to Motivate and Clarify Software-Defined Networking by Scott Shenker. While he’s handwaving across a lot of details, the framework does make sense.

read more see 3 comments

DMVPN as a backup for MPLS/VPN

SK left a long comment to my More OSPF-over-DMVPN Questions post describing a scenario I find quite often in enterprise networks:

  • Primary connectivity is provided by an MPLS/VPN service provider;
  • Backup connectivity should use DMVPN;
  • OSPF is used as the routing protocol;
  • MPLS/VPN provider advertises inter-site routes as external OSPF routes, making it hard to properly design the backup connectivity.

If you’re familiar with the way MPLS/VPN handles OSPF-in-VRF, you’re probably already asking the question “how could the inter-site OSPF routes ever appear as E1/E2 routes?”

read more see 4 comments

IBGP or EBGP in an enterprise network?

I got the following question from one of my readers:

I recently started working at a very large enterprise and learnt that the network uses BGP internally. Running IBGP internally is not that unexpected, but after some further inquiry it seems that we are running EBGP internally. I must admit I'm a little surprised about the use of EBGP internally and I wanted to know your thoughts on it.

Although they are part of the same protocol, IBGP and EBGP solve two completely different problems; both of them can be used very successfully in a large enterprise network.

read more see 22 comments

BGP/IGP Network Design Principles

In the next few days, I'll write about some of the interesting topics we’ve been discussing during the last week’s fantastic on-site workshop with Ian Castleman and his team. To get us started, here’s a short video describing BGP/IGP network design principles. It’s taken straight from my Building IPv6 Service Provider Core webinar (recording), but the principles apply equally well to large enterprise networks.

read more Add comment

Soft switching might not scale, but we need it

Following a series of soft switching articles written by Nicira engineers (hint: they are using a similar approach as Juniper’s QFabric marketing team), Greg Ferro wrote a scathing Soft Switching Fails at Scale reply. While I agree with many of his arguments, the sad truth is that with the current state of server infrastructure virtualization we need soft switching regardless of the hardware vendors’ claims about the benefits of 802.1Qbg (EVB/VEPA), 802.1Qbh (port extenders) or VM-FEX.

read more see 7 comments

Quotes of the week

I’ve spent the last few days with a fantastic group of highly skilled networking engineers (can’t share the details, but you know who you are) discussing the topics I like most: BGP, MPLS, MPLS Traffic Engineering and IPv6 in Service Provider environment.

One of the problems we were trying to solve was a clean split of a POP into two sites, retaining redundancy without adding too much extra equipment. The strive for maximum redundancy nudged me to propose the unimaginable: layer-2 interconnect between four tightly controlled routers running BGP, but even that got shot down with a memorable quote from the senior network architect:

read more see 4 comments

DMVPN design success story

Warning: totally shameless plug ahead. You might want to stop reading right now.

Every now and then one of the engineers listening to my webinars shares a nice success story with me. One of them wrote:

I'm doing a DMVPN deployment and Cisco design docs just don’t cover dual ISPs for the spokes hence I thought I give your webinars/configs a try.

... and a bit later (after going through the configs that you get with the DMVPN webinar):

read more see 1 comments

VM-FEX – not as convoluted as it looks

Reading Cisco’s marketing materials, VM-FEX (the feature probably known as VN-Link before someone went on a FEX-branding spree) seems like a fantastic idea: VMs running in an ESX host are connected directly to virtual physical NICs offered by the Palo adapter and then through point-to-point virtual links to the upstream switch where you can deploy all sorts of features the virtual switch embedded in the ESX host still cannot do. As you might imagine, the reality behind the scenes is more complex.

read more see 8 comments

Source MAC address spoofing DoS attack

The flooding attacks (or mishaps) on large layer-2 networks are well known and there are ample means to protect the network against them, for example storm control available on Cisco’s switches. Now imagine you change the source MAC address of every packet sent to a perfectly valid unicast destination.

read more see 3 comments

The road to complex designs is paved with great recipes

A while ago someone asked me to help him troubleshoot his Internet connectivity. He was experiencing totally weird symptoms that turned out to be a mix of MTU problems, asymmetric routing (probably combined with RPF checks on ISP side) and non-routable PE-CE subnets. While trying to figure out what might be wrong from the router configurations, I was surprised by the amount of complexity he’d managed to introduce into his DMZ design by following recipes and best practices we all dole out in blog posts, textbooks and training materials.

read more see 14 comments

Interesting links (2011-08-14)

My Inbox is overflowing (yet again); here are some great links from last week:

Data centers and summer clouds

F5 is addressing an interesting problem with its latest software release: DNS DoS attacks. In other posts, Lori MacVittie describes cloud configuration management and troubleshooting problems.

Matthew Norwood describes an interesting product from HP – you could probably build a small data center with a single blade enclosure.

read more Add comment

More OSPF-over-DMVPN questions

After weeks of waiting, perfect summer weather finally arrived ... and it’s awfully hard to write blog posts that make marginal sense when being dead-tired from day-long mountain biking, so I’ll just recap the conversation I had with Brian a few days ago. He asked “How would I set up a (dual) hub running OSPF with phase 1 spokes and prevent all spoke routes from being seen at other spokes? Think service provider environment.

read more see 3 comments

Stop reinventing the wheel and look around

Building large-scale VLANs to support IaaS services is every data center designer’s nightmare and the low number of VLANs supported by some data center gear is not helping anyone. However, as Anonymous Coward pointed out in a comment to my Building a Greenfield Data Center post, service providers have been building very large (and somewhat stable) layer-2 transport networks for years. It does seem like someone is trying to reinvent the wheel (and/or sell us more gear).

read more see 7 comments

High availability fallacies

I’ve already written about the stupidities of risking the stability of two data centers to enable live migration of “mission critical” VMs between them. Now let’s take the discussion a step further – after hearing how critical the VM the server or application team wants to migrate is, you might be tempted to ask “and how do you ensure its high availability the rest of the time?” The response will likely be along the lines of “We’re using VMware High Availability” or even prouder “We’re using VMware Fault Tolerance to ensure even a hardware failure can’t bring it down.”

read more see 10 comments

VLANs used by Nexus 1000V

Chris sent me an interesting question:

Imagine L2 traffic between two VMs on different ESX hosts, both using Nexus 1000V. Will the physical switches see the traffic with source and destination MACs matching the VM's vNICs or traffic on NX1000V "packet" VLAN between VEMs (in this case, the packet VLAN would act as a virtual backplane)?

To decode the acronyms in the question, please read my What exactly is a Nexus 1000V post.

read more Add comment

Imagine the ruckus when the hypervisor vendors wake up

It seems that most networking vendors consider the Flat Earth architectures the new bonanza ... everyone is running to join the gold rush, from Cisco’s FabricPath and Brocade’s VCS to HP’s IRF and Juniper’s upcoming QFabric. As always, the standardization bodies are following the industry with a large buffet of standards to choose from: TRILL, 802.1ag (SPB), 802.1Qbg (EVB) and 802.1bh (Port extenders).

read more see 14 comments

Building a Greenfield Data Center

The following design challenge landed in my Inbox not too long ago:

My organization is the in the process of building a completely new data center from the ground up (new hardware, software, protocols ...). We will currently start with one site but may move to two for DR purposes. What DC technologies should we be looking at implementing to build a stable infrastructure that will scale and support technologies you feel will play a big role in the future?

In an ideal world, my answer would begin with “Start with the applications.”

read more see 12 comments

Introduction to virtual firewalls

SearchNetworking has just published my article describing the issues you’ll face when deploying virtualized firewalls (you might want to read the one describing benefits and drawbacks of virtual appliances first). The article focuses primarily on the VMsafe Network API (aka dvFilter) and VMware’s vShield; you’ll find more in-depth information on alternate solutions (including HP’s and Juniper’s products using dvFilter API and Cisco’s vPath API) in my VMware Networking Deep Dive webinar (register here or buy a recording).

see 3 comments

Penultimate Hop Popping (PHP) demystified

I got an interesting question after writing the Asymmetric MPLS MTU Problem post: “Why does PHP happen only on directly-connected interfaces but not on other non-MPLS routes?” Obviously it’s time for a deep dive into Penultimate Hop Popping (PHP) mysteries (warning label: read the MPLS books if you plan to get seriously involved with MPLS).

read more see 4 comments

vSphere 5.0 new networking features: disappointing

I was sort of upset that my vacations were making me miss the VMware vSphere 5.0 launch event (on the other hand, being limited to half hour Internet access served with early morning cappuccino is not necessarily a bad thing), but after I managed to get home, I realized I hadn’t really missed much. Let me rephrase that – VMware launched a major release of vSphere and the networking features are barely worth mentioning (or maybe they’ll launch them when the vTax brouhaha subsides).

read more see 12 comments

Disasters and recoveries ... part 2

After the bumpy start of our holidays, we thoroughly enjoyed the crystal-clear waters, hot sunny weather and the hospitality of inhabitants of Croatian island Brač ... until my daughter came to me quietly asking “hey, I don’t want to raise panic, but my friend saw a weird cloud ... would you mind checking if it’s a forest fire” A short walk to a vantage point confirmed the initial observation – we were facing what turned out to be the worst forest fire in more than a decade. Obviously I was bound to receive another hefty dose of disaster recovery lessons.

read more see 3 comments

Disasters happen ... it’s the recovery that matters

My recent vacation included a few perfect lessons in disaster recovery. Fortunately the disasters were handled by total pros that managed them perfectly. It all started when we were already packed and driving – my travel agent called me to tell me someone mixed up the dates and shifted them by two months; we were expected to arrive in late August. Not good when you have small kids all excited about going to the seaside sitting in the car.

read more see 9 comments

Interesting links (2011-07-17)

Lots of interesting articles accumulated in my Inbox while I tried to figure out what one could possibly do when being stranded in an easy chair next to the sea with no Internet access. By far the best article that I stumbled upon in my Twitter feed is a 10-year-old IS-IS versus OSPF presentation by the legendary Dave Katz (thank you @yelfathi).

read more Add comment

The MPLS MTU challenges

@MCL_Nicolas sent me the following tweet: “Finished @packetpushers Podcast show 7 with @ioshints ... I Want to learn more about Mpls+Mtu problem” You probably know I simply have to mention that a great MPLS/VPN book and a fantastic webinar describe numerous MPLS/VPN-related challenges and solutions (including MTU issues), but if MTU-related problems are the only thing standing between you and an awesome MPLS/VPN network, here are the details.

read more see 7 comments

Is Fibre Channel switching bridging or routing?

A comment left on my dense-mode FCoE post is a perfect example of the dangers of using vague, marketing-driven and ill-defined word like “switching”. The author wrote: “FC-SW is by no means routing ... Fibre Channel is switching.” As I explained in one of my previous posts, switching can mean anything, from circuit-based activities to bridging, routing and even load balancing (I am positive some vendors claim their load balancers ... oops, application delivery controllers ... are L4-L7 switches), so let’s see whether Fibre Channel “switching” is closer to bridging or routing.

read more see 3 comments

Do we need distributed switching on Nexus 2000?

Yandy sent me an interesting question:

Is it just me or do you also see the Nexus 2000 series not having any type of distributed forwarding as a major design flaw? Cisco keeps throwing in the “it's a line-card” line, but any dumb modular switch nowadays has distributed forwarding in all its line cards.

I’m at least as annoyed as Yandy is by the lack of distributed switching in the Nexus port (oops, fabric) extender product range, but let’s focus on a different question: does it matter?

read more see 6 comments

Monitor multiple interfaces with a single EEM applet

Michael modified one of my EEM applets to monitor CRC errors on WAN interfaces and notify the operator (via e-mail) when an interface has more than two errors per minute. He wanted to monitor multiple interfaces and asked me whether it’s possible to modify the SNMP event detector somehow. I only had to point him to the event correlation feature of EEM version 2.4 and he sent me the following (tested) applet a few days later.

read more see 8 comments

Hypervisors use promiscuous NIC mode – does it matter?

Chris Marget sent me the following interesting observation:

One of the things we learned back at the beginning of Ethernet is no longer true: hardware filtering of incoming Ethernet frames by the NICs in Ethernet hosts is gone. VMware runs its NICs in promiscuous mode. The fact that this Networking 101 level detail is no longer true kind of blows my mind.

So what exactly is going on and does it matter?

read more see 10 comments

All MTUs are not the same

Matthew sent me the following remarkable fact (and he just might have saved some of you a few interesting troubleshooting moments):

I was bringing up an OSPF adjacency between a Catalyst 6500 and an ASR 9006 and kept getting an MTU mismatch error. The MTU was set exactly the same on both sides. So I reset them both back to default (1500 on the 6500 and 1514 on the ASR 9006) and the adjacency came back up, even though now the MTU is off by 14 bytes. So I attempted to bump the MTU up again, this time setting the MTU on 6500 to 1540 and the MTU on the ASR 9006 to 1554. Adjacency came right up. Is there something I am missing?

The 14 byte difference is the crucial point – that’s exactly the L2 header size (12 bytes for two 6-byte MAC addresses and 2 bytes for ethertype). When you specify MTU size on the IOS classic (either with the ip mtu command or with the mtu command), you specify the maximum size of the layer-3 payload without the layer-2 header. Obviously IOS XR works differently – there you have to specify the maximum size of a layer-2 frame, not of its layer-3 payload (comments describing how other platforms behave are most welcome!).

see 8 comments

Moving to summer schedule

My web site statistics are (yet again) confirming the inevitable truth: the holiday season has started in the northern hemisphere. I hope you’ll be busy doing things that are more fun than reading my blog, so I’ll publish only two or three articles per week to prevent information overload, returning to the regular daily schedule in late August.

see 5 comments

Multisite clusters done right ... by none other than Microsoft

I had to check the Microsoft clustering terminology a few days ago, so I used Google to find the most relevant pages for “Windows cluster” and landed on the Failover clustering home page where the Multisite Clustering link immediately caught my attention. Dreading the humongous amount of layer-2 DCI stupidities that could lurk hidden behind such a concept, I barely dared to click on the link ... which unveiled one of the most pleasant surprises I’ve got from an IT vendor in a very long time. Microsoft actually understands that some people prefer to keep their IT infrastructure stable and supported multi-subnet clusters for quite some time. What a revolutionary concept for the L2-crazed flat-earth world some other vendors are busy promoting.

read more see 1 comments

Brocade ServerIron ADX – NAT64 done right

With the latest software release (12.3.01) the ServerIron ADX, Brocade’s load balancer product, supports the real NAT64 (not 6-to-4 load balancing). Even more, it supports all of the features I would like to see in a NAT64 box plus a few more:

True NAT64 support, mapping the whole IPv4 address space into an IPv6 prefix that can be reached by IPv6 clients. One would truly hope the implementation is conformant with RFC 6146, but the RFC is not mentioned in the documentation and I had no means of checking the actual behavior. DNS64 is not included, but that’s not a major omission as BIND 9.8.0 supports it.

read more Add comment

The beauties of dense-mode FCoE

J Michel Metz brought out an interesting aspect of the dense/sparse mode FCoE design dilemma in a comment to my FCoE over Trill ... this time from Juniper post: FC-focused troubleshooting. I have to mention that he happens to be working for a company that has the only dense-mode FCoE solution, but the comment does stand on its own.

Before reading this post you might want to read the definition of dense- and sparse-mode FCoE and a few more technical details.

read more see 15 comments

Soft (hypervisor) switching links

Martin Casado and his team have published a great series of blog articles describing hypervisor switching (for the VMware-focused details, check out my VMware Networking Deep Dive). It starts with an overview of Open vSwitch (the open source alternative for VMware’s vSwitch, commonly used in Xen/KVM environments), describes the basics of hypervisor-based switching and addresses some of the performance myths. There’s also an interesting response from Intel setting straight the SR-IOV facts.

read more see 4 comments

Inter-DC IP-based vMotion with LISP

In early autumn of 2010, a “DRAFT on Cisco Nexus 1000V LISP Configuration Guide” appeared on CCO. It’s gone now (and unfortunately I haven’t saved a copy), but the possibilities made me really excited – with LISP in Nexus 1000V, we could do close-to-perfect vMotion over any IP infrastructure (including inter-DC vMotion that requires stretched VLANs and L2 DCI today). Here’s what I had to say on this topic during my Data Center Interconnect webinar (buy a recording).

see 7 comments

vCider: climbing the virtual networking mountain

You probably know the old saying – if the mountain doesn’t want to come to you, you have to go out there and climb it. vCider, a brand-new startup launching their product at Gigaom Structure Launchpad, decided to do something similar in the server virtualization (Infrastructure-as-a-Service; IaaS) space – its software allows IaaS customers to build their own virtual layer-2 networks (let’s call then vSubnets) on top of IaaS provider’s IP infrastructure; you can even build a vSubnets between VMs running within your enterprise network (private cloud in the cloudy lingo) and those running within Amazon EC2 or Rackspace.

Full disclosure: Chris Marino from vCider got in touch with me in early June. I found the idea interesting, he helped me understand their product (even offered a test run, but I chose to trust the technical information available on their web site and passed to me in e-mails and phone calls), and I decided to write about it. That’s it.

read more see 3 comments

Some more QoS basics

I got a really interesting question from one of my readers (slightly paraphrased):

Is this a correct statement: QoS on a WAN router will always be on if there are packets on the wire as the line is either 100% utilized or otherwise nothing is being transmitted. Comments like “QoS will kick in when there is congestion, but there is always congestion if the link is 100% utilized on a per moment basis” are confusing.

Well, QoS is more than just queuing. First you have to classify the packets; then you can perform any combination of marking, policing, shaping, queuing and dropping.

2011-06-23: Added description of various link efficiency mechanisms.

read more see 14 comments

Automatic edge VLAN provisioning with VM Tracer from Arista

One of the implications of Virtual Machine (VM) mobility (as implemented by VMware’s vMotion or Microsoft’s Live Migration) is the need to have the same VLAN configured on the access ports connected to the source and the target hypervisor hosts. EVB (802.1Qbg) provides a perfect solution, but it’s questionable when it will leave the dreamland domain. In the meantime, most environments have to deploy stretched VLANs ... or you might be able to use hypervisor-aware features of your edge switches, for example VM Tracer implemented in Arista EOS.

read more see 2 comments

Test your VMware networking skills

Two vSwitch portgroup-related questions:

  • Can you configure the same VLAN on two portgroups in the same vSwitch? How about vDS?
  • Can VMs attached to two different portgroups in the same ESX host talk to each other directly or do they have to go communicate through an external switch (or L3 device)?

Got your answers? Now click the Read more ... link.

read more see 6 comments

Blast from the past: ATM and POS interfaces

I got a question along these lines from a friend working in SP environment:

Customer wants to upgrade a 7200 with PA-A3-OC3SMI to ASR1001. Can they use ASR1001-2XOC3POS interfaces or are those different from “normal ATM interfaces”?

Both interfaces (PA-A3-OC3SMI for the 7200 and 2XOC3POS for the ASR1001) use SONET framing on layer 1, so you can connect them to the same SONET (layer-1) gear.

read more see 3 comments

DMVPN New Features webinar is cancelled

New DMVPN features in IOS release 15.x is obviously a topic without a broad audience ... although Cisco did introduce some nifty new things that can help you scale a large DMVPN network or make a DMVPN network more manageable.

Anyhow, I had to cancel next week’s webinar. I don’t plan to have another live session in a near future, but you can still buy a recording (or watch it as part of the yearly subscription).

Add comment

FCoE over TRILL ... this time from Juniper

A tweet from J Michel Metz has alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed in two short sentences my readers are very familiar about: Bridging does not scale and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems)... but the author couldn’t resist throwing “FCoE over TRILL” bone into the mix.

read more see 19 comments

QoS in large-scale DMVPN networks

Got this question a few days ago:

I have a large DMVPN network (~ 1000 sites) using variety of DSL, cable modem, and wireless connections. In all of these cases the bandwidth is extremely dissimilar and even varies with time. How can I handle this in a scalable way? Also, do you know of any product or facility that I can use to better measure the bandwidth from hub to spoke and better set the QOS values?

The last question is the easy part: one of the products that does that is NIL Monitor service where the remote probes can measure the actual end-to-end bandwidth. NIL Monitor software can also log into routers and change configurations if needed ... but what should you change?

read more see 14 comments
Sidebar