Webinars in 2012

When I ask yearly subscribers whether they’d like to renew their subscription, I promise them new content every 2-3 months (4-6 new sessions per year). 2012 was definitely a good year in that respect.

It started with the access network part of the large-scale IPv6 design and deployment webinar, followed by two Data Center Fabrics update sessions (in May and November), the scalability part of the cloud computing networking webinar, and a DMVPN design session.

That’s it for 2012

12 months and ~210 blog posts later, it’s time for yet another “That’s It” blog post. Another exciting year has swooshed by, and I’d like to thank you all for the insightful comments you made, the great questions you asked, and the wonderful challenges you keep sending me.

If at all possible, now’s the time to start shutting down the pagers and smartphones, and enjoy the simpler (and less stressful) life with the loved ones. Have a great holiday season and all the best in the coming year! I’m going offline ... right now ;)

Hyper-V Network Virtualization (HNV/NVGRE): Simply Amazing

In August 2011, when the NVGRE draft appeared mere days after VXLAN was launched, I dismissed it as “more of the same, different encapsulation, vague control plane”. Boy was I wrong … and pleasantly surprised when I figured out one of the major virtualization vendors actually did the right thing.

TL;DR Summary: Hyper-V Network Virtualization is a layer-3 virtual networking solution with a centralized (orchestration-system-based) control plane. Its scaling properties are thus way better than VXLAN’s (or Nicira’s … unless they implemented L3 forwarding since the last time we spoke).

Do We Need FHRP (HSRP or VRRP) For IPv6?

Justin asked an interesting question in a comment to my IPv6 On-Link Determination post: do we need HSRP for IPv6 as the routers already send out RA messages? Pavel quickly pointed out that my friend @packetlife already wrote about it, concluding that you could use RAs unless you need deterministic sub-second failover.

However, there are (as always) a few more gotchas:

Change in OSPF Designated Router creates extra network LSAs

When testing the OSPF graceful shutdown feature, I encountered an interesting OSPF feature: if you force a change of the LAN DR (other than by rebooting the current DR), you end up with two network LSAs describing the same LAN.

This blog post has been sitting in my Draft folder for years, so Cisco IOS behavior might have changed in the meantime, or it might have been a transient and/or race condition. Nonetheless, I still find it interesting.

Large Leaf-and-Spine Fabrics with Dell Force10 Switches Using 10GE Uplinks

The second scenario Brad Hedlund described in the Clos Fabrics Explained webinar is a large leaf-and-spine fabric using 10GE uplinks and QSFP+ breakout cables between leaf and spine switches (thus increasing the number of spine switches to 16).
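The arithmetic behind the spine count can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each leaf is assumed to have 48 10GE server-facing ports and four 40GE uplink ports (the teaser does not give port counts), every uplink goes to a different spine switch, and `leaf_spine_design` is a hypothetical helper name.

```python
def leaf_spine_design(server_ports: int, uplink_ports_40ge: int, breakout: bool) -> dict:
    """Spine count and leaf oversubscription for a simple leaf-and-spine fabric.

    Assumes 10GE server-facing ports and one uplink from each leaf to every spine.
    """
    if breakout:
        # A QSFP+ breakout cable splits one 40GE port into four 10GE links.
        uplinks, uplink_gbps = uplink_ports_40ge * 4, 10
    else:
        uplinks, uplink_gbps = uplink_ports_40ge, 40
    return {
        "spines": uplinks,
        "oversubscription": (server_ports * 10) / (uplinks * uplink_gbps),
    }


# Assumed leaf switch: 48 x 10GE server ports, 4 x 40GE uplink ports.
with_breakout = leaf_spine_design(48, 4, breakout=True)
native_40ge = leaf_spine_design(48, 4, breakout=False)
print(with_breakout["spines"], native_40ge["spines"])
```

Under these assumptions the breakout cables quadruple the number of spine switches while the leaf oversubscription ratio stays the same, because the total uplink bandwidth per leaf does not change.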

Secondary MPLS-TE Tunnels and Fast Reroute

Ronald sent me an interesting question: What's the point of having a secondary path set up for a certain LSP, when this LSP also has fast-reroute enabled (for example, with the Junos fast-reroute command)?

The idea of having a pre-established secondary LSP backing up a traffic engineering tunnel was commonly discussed before FRR was widely adopted, but should have quietly faded away by now.

IPv6 Prefixes Longer Than /64 Might Be Harmful

A while ago I wrote a blog post about remote ND attacks, which included the idea of having /120 prefixes on server LANs. As it turns out, it was a bad idea, and as nosx pointed out in his comment: “there is quite a long list of caveats in all vendor camps regarding hardware in the last 6-8 years that has some potentially painful hardware issues regarding prefix length. Classic issues include ACL construction and TCAM specificity.”

One would hope that the newly released data center switches fare better. Fat chance!

Stackable Data Center Switches? Do the Math!

Imagine you have a typical 2-tier data center network (because 3-tier is so last millennium): layer-2 top-of-rack switches redundantly connected to a pair of core switches running MLAG (to get around spanning tree limitations) and IP forwarding between VLANs.

Next thing you know, a rep from your favorite vendor comes along and says: “did you know you could connect all ToR switches into a virtual fabric and manage them as a single entity?” Is that a good idea?

IPv6 On-Link Determination – What Is It And Why Do We Need It?

When an IPv4/IPv6 host wants to send a packet to another host, it has to answer the following simple questions:

  • Can I reach the destination IP address directly (is the destination on the same LAN/subnet)?
  • If not, who will help me forward the packet (who is the first-hop router)?

In IPv4 world, the host can get all the information it needs through DHCP. In IPv6 world, things are way more complex (but also way more correct if you’re a theoretician).
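For IPv4, the first question boils down to a prefix comparison. The following Python sketch illustrates the decision, assuming the host knows its own address, its prefix length and a default router; the addresses are documentation examples, and `is_on_link` and `next_hop` are hypothetical helper names. An IPv6 host cannot rely on its own address alone and would instead consult the list of on-link prefixes learned from router advertisements.

```python
import ipaddress


def is_on_link(local_interface: str, destination: str) -> bool:
    """True if the destination falls within the subnet configured on the interface."""
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(destination) in iface.network


def next_hop(local_interface: str, destination: str, default_router: str) -> str:
    """Send directly to on-link destinations, otherwise hand the packet to the router."""
    if is_on_link(local_interface, destination):
        return destination
    return default_router


# Host 192.0.2.10/24 with first-hop router 192.0.2.1 (example values).
print(next_hop("192.0.2.10/24", "192.0.2.200", "192.0.2.1"))   # same subnet
print(next_hop("192.0.2.10/24", "198.51.100.7", "192.0.2.1"))  # remote subnet
```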

EIGRP Loop Prevention Logic

Hamid sent me the following question:

I have already memorized (bad idea, BTW) that a loop can occur if FD < RD. Could you please tell me how a loop could occur assuming FD < RD and we ignore the feasibility condition.

I’ll use a simple three-router network (see the following diagram) to illustrate why EIGRP cannot figure out whether an alternate more expensive path could lead to a loop or not.
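As a reminder, the feasibility condition itself is a single comparison: a neighbor is guaranteed to be loop-free only if the distance it reports is strictly lower than the local feasible distance. A minimal Python sketch, using made-up metric values and hypothetical names (`is_feasible_successor`, `loop_free_neighbors`), and ignoring how the EIGRP composite metric is actually computed:

```python
def is_feasible_successor(reported_distance: int, feasible_distance: int) -> bool:
    """Feasibility condition: the neighbor's reported distance must be strictly
    lower than our feasible distance to rule out a path that loops back through us."""
    return reported_distance < feasible_distance


def loop_free_neighbors(feasible_distance: int, reported: dict) -> list:
    """Names of neighbors that pass the feasibility condition, in sorted order."""
    return sorted(
        name for name, rd in reported.items()
        if is_feasible_successor(rd, feasible_distance)
    )


# Made-up example: our feasible distance is 20; neighbors report 10, 20 and 25.
print(loop_free_neighbors(20, {"A": 10, "B": 20, "C": 25}))
```

Neighbors B and C might still offer perfectly valid alternate paths; the router simply cannot prove it from the distances alone, so it does not use them.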

VXLAN is not a Data Center Interconnect technology

In a comment to the Firewalls in a Small Private Cloud blog post I wrote “VXLAN is _NOT_ a viable inter-DC solution” and Jason wasn’t exactly happy with my blanket response. I hope Jason got a detailed answer in the VXLAN Technical Deep Dive webinar, here’s a somewhat shorter explanation.

Building Leaf-and-Spine Fabrics with Dell Force10 Switches

In the Clos Fabrics Explained webinar I focused on the Clos fabrics principles of operation and design options, and Brad Hedlund who graciously agreed to be my guest explained how you can use Dell Force10 switches to build them. In this video he’s describing a simple leaf-and-spine topology with 40GE uplinks.

IPv6 deployment IETF drafts

An incredible number of IPv6 deployment documents have been published as IETF drafts recently, amongst them:

Enjoy ... and don’t forget to join the v6ops mailing list ;)

What Exactly Are Virtual Firewalls?

Kaage added a great comment to my Virtual Firewall Taxonomy post:

And many of physical firewalls can be virtualized. One physical firewall can have multiple virtual firewalls inside. They all have their own routing table, rule base and management interface.

He’s absolutely right, but there’s a huge difference between security contexts (to use the ASA terminology) and firewalls running in VMs.

BGP Convergence Optimization

I’m exposed to an incredible variety of topics in my ExpertExpress engagements, but there are always a few recurring themes, one of them being “we’re experiencing long convergence times and high packet loss after our primary Internet link fails.” Almost always the root cause turns out to be the full Internet routing table being received on inadequate hardware.

More real-life DHCPv6 Prefix Delegation gotchas

The murky details of IPv6 implementations never crop up till you start deploying it (or, as Randy Bush recently wrote: “it is cheering to see that the ipv6 ivory tower still stands despite years of attack by reality”).

Here’s another one: in theory the prefixes delegated through DHCPv6 should be static and permanently assigned to the customers for long periods of time.

DHCPv6 Prefix Delegation, RADIUS and Shared Usernames

Jernej Horvat sent me the following question:

I know DHCPv6-based prefix delegation should be as stable as possible, so I plan to include the delegated prefix in my RADIUS database. However, for legacy reasons each username can have up to four concurrent PPPoE sessions. How will that work with DHCPv6 IA_PD?

Short answer: worst case, DHCPv6 prefix delegation will be royally broken.

Firewalls in a Small Private Cloud

Mrs. Y, the network security princess, sent me an interesting design challenge:

We’re building a private cloud and I'm pushing for keeping east/west traffic inside the cloud. What are your opinions on the pros/cons of keeping east/west traffic in the cloud vs. letting it exit for security/routing?

Short answer: it depends.

IP packet delivery confirmation

Thomas wanted to check whether the IP traffic is actually delivered to a remote site and sent me the following question:

I would like to know whether the packets I sent from site A to site B have been received. I don't want to create test traffic using ip sla, I would like to know that the production traffic has been delivered. I could use ACL counters but I'm running a full mesh of tens of sites. Ipanema does this very well, but I'm surprised that this doesn’t exist on Cisco IOS.

Short answer: that’s not how the Internet works.

Coping with Holiday Traffic – Secondary DHCP Subnets

Years ago our IT assigned a /28 to my home office. It seemed enough; after all, who would ever have more than ~10 IP hosts at home (or more than four computers at a site).

When the number of Linux hosts and iGadgets started to grow, I occasionally ran out of IPv4 addresses, but managed to kludge my way around the problem by reducing DHCP lease time. However, when the start of school holidays coincided with the first snow storm of the season (so all the kids used their gadgets simultaneously) it was time to act.
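The address budget of a /28 is easy to check. A quick Python sketch, using a documentation prefix rather than the real subnet, with `usable_hosts` as a hypothetical helper name:

```python
import ipaddress


def usable_hosts(prefix: str) -> int:
    """Usable host addresses in an IPv4 subnet (network and broadcast excluded).

    Assumes a prefix shorter than /31.
    """
    network = ipaddress.ip_network(prefix)
    return network.num_addresses - 2


total = usable_hosts("192.0.2.0/28")  # example prefix, not the real home-office subnet
after_router = total - 1              # one address goes to the first-hop router
print(total, after_router)
```

Take away a few statically addressed devices and the DHCP pool ends up at roughly ten addresses, which is where the “~10 IP hosts” estimate comes from.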

VM-level IP Multicast over VXLAN

Dumlu Timuralp (@dumlutimuralp) sent me an excellent question:

I always get confused when thinking about IP multicast traffic over VXLAN tunnels. Since VXLAN already uses a Multicast Group for layer-2 flooding, I guess all VTEPs would have to receive the multicast traffic from a VM, as it appears as L2 multicast. Am I missing something?

Short answer: no, you’re absolutely right. IP multicast over VXLAN is clearly suboptimal.

Beware of the pre-bestpath cost extended BGP community

One of my readers sent me an interesting problem a few days ago: the BGP process running on a PE-router in his MPLS/VPN network preferred an iBGP route received from another PE-router to a locally sourced (but otherwise identical) route. When I looked at the detailed printout, I spotted something “interesting” – the pre-bestpath cost extended BGP community.

The Best of Last Week’s IPv6 Summit

Last week’s IPv6 summit organized by Jan Žorž was probably one of the best events to attend for engineers interested in real-life IPv6 deployment experience. Some of the highlights included:

Enjoy! ... and thank you, Jan, for an excellent event.

Skip the Transitions, Build IPv6-Only Data Centers

During last week’s IPv6 Summit I presented an interesting idea first proposed by Tore Anderson: let’s skip all the transition steps and implement IPv6-only data centers.

You can view the presentation or watch the video; for more details (including the description of routing tricks to get this idea working with vanilla NAT64), watch Tore’s RIPE64 presentation.

What was vCider all about?

After Cisco acquired vCider in early October, the content quickly disappeared from the vCider web site. If the bombastic industry press claims that the vCider acquisition represents Cisco’s response to VMware’s Nicira acquisition, and that it will “boost Cisco’s distributed cloud vision”, left you confused, read my blog post from June 2011 to review what vCider was all about. For a more balanced view, read Omar Sultan’s blog post as well.

Finally, you might want to watch the description of vCider from my Cloud Computing Networking webinar.

VXLAN Technical Deep Dive Webinar

VXLAN is quickly becoming a technology worth considering in production deployments – there are two independent hypervisor implementations (Nexus 1000V and vSphere 5.1 vDS), and other vendors announced support for VXLAN hardware termination (Arista on 7150 ToR switches, Brocade on ADX load balancers).

The upcoming VXLAN Technical Deep Dive webinar (register here) will give you a technology overview, some implementation and configuration hints, and (most importantly) design and deployment guidelines.

IP multicast in the core network is a major component of the design puzzle and I’m delighted Mark Berly from Arista Networks agreed to present Arista’s hardware termination solution and IP multicast design guidelines.

Is Layer-3 DCI Safe?

One of my readers sent me a great question:

I agree with you that L2 DCI is like driving without a seat belt. But is L3 DCI safer in case of DCI link failure? Let's say you have your own AS and PI addresses in use. Your AS spans multiple sites and there are external BGP peers on each site. What happens if the L3 DCI breaks? How will that impact your services?

Simple answer: while L3 DCI is orders of magnitude safer than L2 DCI, it will eventually fail, and you have to plan for that.

IPv6 First-Hop Security: Ideal OpenFlow Use Case

Supposedly it’s a good idea to be able to identify which one of your users had a particular IP address at the time when that source IP address created significant havoc. We have a definitive solution for the IPv4 world: DHCP server logs combined with DHCP snooping, IP source guard and dynamic ARP inspection. IPv6 world is a mess: read this e-mail message from v6ops mailing list and watch Eric Vyncke’s RIPE65 presentation for excruciating details.

Dear $Vendor, NETCONF != SDN

Some vendors feeling the urge to SDN-wash their products claim that the ability to “program” them through NETCONF (or XMPP or whatever other similar mechanism) makes them SDN-blessed.

There might be a yet-to-be-discovered vendor out there that creatively uses NETCONF to change the device behavior in ways that cannot be achieved by CLI or GUI configuration, but most of them use NETCONF as a reliable Expect script.

Disabling IP unreachables breaks pMTUd

A while ago someone sent me an interesting problem: the moment he enabled simple MPLS in his enterprise network with ip mpls interface configuration commands, numerous web applications stopped working. My first thought was “MTU problems” (the usual culprit), but path MTU discovery should have taken care of that.
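A rough model of the interaction named in the title, written as a Python sketch. It assumes the links can still carry at most 1500 bytes once labels are added, and it simplifies router behavior to four outcomes; `forwarding_outcome` and its parameters are hypothetical names. Each MPLS label adds 4 bytes to the packet.

```python
MPLS_LABEL_BYTES = 4  # size of one MPLS label stack entry


def forwarding_outcome(ip_packet_bytes: int, labels: int, link_mtu: int,
                       df_set: bool, unreachables_enabled: bool) -> str:
    """Simplified view of what happens to an IP packet entering a labeled path."""
    labeled_size = ip_packet_bytes + labels * MPLS_LABEL_BYTES
    if labeled_size <= link_mtu:
        return "forwarded"
    if not df_set:
        return "fragmented"
    if unreachables_enabled:
        # The sender learns the smaller path MTU and retries with smaller packets.
        return "dropped, ICMP fragmentation-needed sent"
    # Path MTU discovery never gets its signal, so the session stalls.
    return "dropped silently"


# Full-sized packet with the DF bit set, one label, ICMP unreachables disabled.
print(forwarding_outcome(1500, 1, 1500, df_set=True, unreachables_enabled=False))
```

Small packets (TCP handshakes, short requests) still fit, which is why such failures tend to show up only when an application starts sending full-sized segments.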

You MUST take control of IPv6 in your network

I’m positive most of you are way too busy dealing with operational issues to start thinking about IPv6 deployment (particularly if you’re working in the enterprise world; European service providers using the same “strategy” just got a rude wake-up call). Bad idea – if you ignore IPv6, it will eventually blow up in your face. Here’s how:

The best of RIPE65

Last week I had the privilege of attending RIPE65, meeting a bunch of extremely bright SP engineers, and listening to a few fantastic presentations (full meeting report @ RIPE65 web site).

I knew Geoff Huston would have a great presentation, but his QoS presentation was even better than I expected. I don’t necessarily agree with everything he said, but every vendor peddling QoS should be forced to listen to his explanation of the underlying problems and kludgy solutions first.

IPv6 security webinar

Not surprisingly, IPv6 has almost the same set of security problems as IPv4. Even worse, some of the things we’ve already solved in IPv4 (fragmented TCP/UDP headers) haven’t been ported to IPv6, and implementations of IPv6 security features lag far behind their IPv4 counterparts.

The upcoming IPv6 security webinar (register here) describes these problems, and I managed to get the best possible guest speaker: Eric Vyncke (the author of the IPv6 Security Cisco Press book) will tell you all about the IPv6 security features available in Cisco IOS.

SDN Controller northbound API is the crucial missing piece

Imagine you’d like to write a simple Perl (or Python, Ruby, JavaScript – you get the idea) script to automate a burdensome function on your server (or router/switch from any vendor running Linux/BSD behind the scenes) that the vendor never bothered to implement. The script interpreter relies on numerous APIs being available from the operating system – from process API (to load and start the interpreter) to file system API, console I/O API, memory management API, and probably a few others.

Now imagine none of those APIs were standardized (various mutually incompatible dialects of Tcl used by Cisco IOS come to mind) – that’s the situation we’re facing in the SDN land today.

SDN, Career Choices and Magic Graphs

The current explosion of SDN hype (further fueled by recent VMworld announcement of Software-Defined Data Centers) made some networking engineers understandably nervous. This is the question I got from one of them:

I have 8 plus years in Cisco, have recently passed my CCIE RS theory, and was looking forward to complete the lab test when this SDN thing hit me hard. Do you suggest completing the CCIE lab looking at this new future of Networking?

Short answer: the sky is not falling, CCIE still makes sense, and IT will still need networking people.

Cisco Nexus 3548: A Victory for Custom ASICs?

Autumn must be a perfect time for data center product launches: last week Brocade launched its core VDX switch and yesterday Arista and Cisco launched their new low-latency switches (yeah, the simultaneous launch must have been pure coincidence).

I had the opportunity to listen to Cisco’s and Arista’s product briefings, continuously experiencing a weird feeling of déjà vu. The two switches look like twin brothers … but there are some significant differences between the two:

Arista launches the first hardware VXLAN termination device

Arista is launching a new product line today shrouded in mists of SDN and cloud buzzwords: the 7150 series top-of-rack switches. As expected, the switches offer up to 64 10GE ports with wire speed L2 and L3 forwarding and 400 nanosecond(!) latency.

Also expected from Arista: unexpected creativity. Instead of providing a 40GE port on the switch that can be split into four 10GE ports with a breakout cable (like everyone else is doing), these switches group four physical 10GE SFP+ ports into a native 40GE (not 4x10GE LAG) interface.

But wait, there’s more...

State of IPv6 in the Data Center Gear

Just in case you haven’t noticed: the RIPE region ran out of unallocated IPv4 addresses last Friday. RIPE members (local Internet registries) can get a single /22 each; enterprises that want to be IPv4-multihomed cannot get provider-independent addresses any more. It just might be time to start considering IPv6 in your data center. Let’s see whether the vendors agree with me.

Data Center Fabric presentations available online

The Data Center Fabric presentations from EuroNOG 2011 and RIPE 64 meetings are available in my webinar demo web site. Video of the EuroNOG presentation is on YouTube and RIPE has its own video archives.

You’ll find more information in the Data Center Fabric Architectures webinar, including an update on individual vendors’ solutions (MP4 videos from the update session were uploaded in August).

Building Large L3 Fabrics with Brocade VDX Switches

A few days ago the title of this post would be one of those “find the odd word out” puzzles. How can you build large L3 fabrics when you have to work with ToR switches with no L3 support, and you can’t connect more than 24 of them in a fabric? All that has changed with the announcement of VDX 8770 – a monster chassis switch – and new version of Brocade’s Network OS with layer-3 (IP) forwarding.

Why is OpenFlow focused on L2-4?

Another great question I got from David Le Goff:

So far, SDN is relying or stressing mainly the L2-L3 network programmability (switches and routers). Why are most of the people not mentioning L4-L7 network services such as firewalls or ADCs. Why would those elements not have to be SDNed with an OpenFlow support for instance?

To understand the focus on L2/L3 switching, let’s go back a year and a half to the laws-of-physics-changing big bang event.

QFabric Behind the Curtain: I was spot-on

A few days ago Kurt Bales and Cooper Lees gave me access to a test QFabric environment. I always wanted to know what was really going on behind the QFabric curtain and the moment Kurt mentioned he was able to see some of those details, I was totally hooked.

Short summary: QFabric works exactly as I’d predicted three months before the user-facing documentation became publicly available (the behind-the-scenes view described in this blog post is probably still hard to find).

Dear VMware, BPDU Filter != BPDU Guard

A while ago I described the need for BPDU guard in hypervisor switches, and not surprisingly got a number of “it’s there” tweets seconds after vSphere 5.1 (which includes BPDU filter) was launched. Rickard Nobel also did a magnificent job of replicating the problem my blog post is describing and verifying vSphere 5.1 stops a BPDU denial-of-service attack.

Unfortunately, BPDU filter is not the same feature as BPDU guard. Here’s why.

3 & 5 Years Ago (August 2012)

Most popular posts of August 2007 address everlasting issues: Tcl scripts with command-line parameters and DHCP conflict logging. There was also OSPF graceful shutdown and a continuation of the conditional OSPF default route story.

The focus of August 2009 was the “what went wrong” stories covering the lack of a session layer in TCP/IP, layering violations in the socket API, and SCTP.

Midokura’s MidoNet: a Layer 2-4 virtual network solution

Almost everyone agrees the current way of implementing virtual networks with dumb hypervisor switches and top-of-rack kludges (including Edge Virtual Bridging – EVB or 802.1Qbg – and 802.1BR) doesn’t scale. Most people working in the field (with the notable exception of some hardware vendors busy protecting their turfs in the NVO3 IETF working group) also agree virtual networks running as applications on top of IP fabric are the only reasonable way to go ... but that’s all they currently agree upon.

Is Layer-3 Switch More than a Router?

Very short answer: no.

You might think that layer-3 switches perform bridging and routing, while routers do only routing. That hasn’t been the case at least since Cisco introduced Integrated Routing and Bridging in IOS release 11.2 more than 15 years ago. However, Simon Gordon raised an interesting point in a tweet: “I thought IP L3 switching includes switching within subnet based on IP address, routing is between subnets only.”

Layer-3 switches and routers definitely have to perform some intra-subnet layer-3 functions, but they’re usually not performing any intra-subnet L3 forwarding.

VXLAN and OTV: I’ve been suckered

When VXLAN came out a year ago, a lot of us looked at the packet format and wondered why Cisco and VMware decided to use UDP instead of the more commonly used GRE. One explanation was evident: UDP port numbers give you more entropy that you can use in 5-tuple-based load balancing. The other explanation looked even more promising: VXLAN and OTV use a very similar packet format, so the hardware already doing OTV encapsulation (Nexus 7000) could be used to do VXLAN termination. Boy have we been suckered.
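The entropy argument can be illustrated with a short Python sketch: the encapsulating device hashes the inner frame’s headers into the outer UDP source port, so routers doing 5-tuple load balancing spread different inner flows across equal-cost paths. Everything here is an assumption made for illustration: `zlib.crc32` stands in for whatever hash an implementation really uses, the dynamic port range 49152-65535 is just one reasonable choice, and `vxlan_source_port` is a hypothetical name.

```python
import zlib


def vxlan_source_port(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
                      src_port: int, dst_port: int) -> int:
    """Derive an outer UDP source port from the inner flow's headers.

    Packets of the same inner flow always map to the same port (no reordering),
    while different flows are likely to land on different ports.
    """
    key = "|".join(
        [src_mac, dst_mac, src_ip, dst_ip, str(src_port), str(dst_port)]
    ).encode()
    return 49152 + zlib.crc32(key) % 16384  # stay within 49152-65535


flow = ("00:00:5e:00:53:01", "00:00:5e:00:53:02", "192.0.2.10", "192.0.2.20", 40000, 443)
print(vxlan_source_port(*flow))
```

With GRE there is no port field to vary, so all traffic between a pair of tunnel endpoints looks like a single flow to routers that hash only on addresses and ports.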

Update 2015-07-12: NX-OS 7.2.0 supports OTV encapsulation with VXLAN-like headers on F3 linecards. See OTV UDP Encapsulation for more details (HT: Nik Geyer).

Layer-2 DCI and the infinite wisdom of acmqueue

Yesterday I got pulled into a layer-2 DCI tweetfest. Not surprisingly, there were profound opinions all over the place, including “We've been doing it (OTV) for almost a year now. No problems.”

OTV is in fact the least horrible option – it does quite a few things right, including tight control of unicast flooding and reduction of STP scope.

Today I stumbled across this gem in the acmqueue blogs:

You might as well ask why people insist on not wearing seatbelts after all of the years that particular technology has been proven to save lives.

People will, it seems, persist in the optimistic belief that everything will be OK so long as they are otherwise careful. They think that bad things happen only to other people’s protocols, or packets, but not to theirs. Hope springs eternal and dies in the cold, cold winter of experience.

Finding this one a day after discussing layer-2 DCI? There really are no coincidences.

802.1BR – same old, same old

A while ago, a tweet praising the wonders of 802.1BR piqued my curiosity. I couldn’t resist downloading the latest draft and spending a few hours trying to decipher IEEE language (as far as the IEEE drafts go, 802.1BR is highly readable) ... and it was déjà vu all over again.

Short summary: 802.1BR is repackaged and enhanced 802.1Qbh (or the standardized version of VM-FEX). There’s nothing fundamentally new that would have excited me.

PVLAN, VXLAN and cloud application architectures

Aldrin Isaac made a great comment to my Could MPLS-over-IP replace VXLAN? article:

As far as I understand, VXLAN, NVGRE and any tunneling protocol that use global ID in the data plane cannot support PVLAN functionality.

He’s absolutely right, but you shouldn’t try to shoehorn VXLAN into existing deployment models. To understand why that doesn’t make sense, we have to focus on the typical cloud application architectures.

Why is RESTful API better than SNMP?

Brian Christopher Raaen asked a great question in a comment to my OpenStack/Quantum SDN-Based Virtual Networks post:

Other than some syntax difference what do these new HTTP-based APIs add that SNMP couldn't already do?

Short answer: nothing, apart from familiarity and convenient programming libraries.

OpenFlow and Ipsilon: Nothing New Under the Sun

I’d promised to record another MPLS-related podcast and wanted to refresh my failing memory and revisit the beginnings of Tag Switching (Cisco’s proprietary technology that was used as the basis for MPLS). Several companies were trying to solve the IP+ATM integration problem in the mid-nineties, most of them using IP-based architectures (Cisco, IBM, 3Com), while Ipsilon tried its luck with a flow-based solution.

IRS – just what the SDN Goldilocks is looking for?

Most current SDNish tools are too cumbersome for everyday use: OpenFlow is too granular (the controller interacts directly with the FIB or TCAM), and NETCONF is too coarse (it works on the device configuration level and thus cannot be used to implement anything the networking device can’t already do). In many cases, we’d like an external application to interact with the device’s routing table or routing protocols (similar to tracked static routes available in Cisco IOS, but without the configuration hassle).

Do you need IPsec to run IPv6?

The usual claim that “IPv6 has better security because it includes mandatory IPsec support” is evidently creating some confusion, at least based on a set of questions I received from one of my readers.

Can IPv6 work without IPsec?

Absolutely. Most IPv6 deployments don’t use IPsec (unless you’re building IPsec-based VPNs over IPv6 transport infrastructure).

DMCA Works

A week ago a friend sent me a disturbing email: another creative individual has decided to use my content to attract traffic, this time making sure to remove all the links and even the webinar introductions before republishing it. As expected, the well-populated web site had no about or contact me links, and the domain name registrant was an obvious fake.

EIGRP: an MBA-like perspective

Ahmed was reading my EIGRP book (I know it’s hard to get, but fortunately he found a well-marked copy) and wanted to check his understanding of how EIGRP works. The first question was as good a summary as I’ve ever seen:

Does it just simply boil down to the fact that a router will choose not to have anything to do with a reported distance higher than its own cost to that route (feasible distance) for the (paranoid) fear that it could be a loop?

Next, he started wondering why a router would behave that way:

OpenStack/Quantum SDN-based virtual networks with Floodlight

A few years before MPLS/VPN was invented, I’d worked with a service provider who wanted to offer L3-based (peer-to-peer) VPN service to their clients. Having a single forwarding table in the PE-routers, they had to be very creative and used ACLs to provide customer isolation (you’ll find more details in the Shared-router Approach to Peer-to-peer VPN Model section of my MPLS/VPN Architectures book).

Now, what does that have to do with OpenFlow, SDN, Floodlight and Quantum?

Is it safe to run Internet in a VRF?

During the February Packet Party someone asked the evergreen question: “Is it safe to run Internet services in a VRF?” and my off-the-cuff answer was (as always) “Doing that will definitely consume more memory than having the Internet routes in the global routing table.” After a few moments Derick Winkworth looked into one of his routers and confirmed the difference is huge ... but then he has a very special setup, so I decided to do a somewhat controlled test.

Virtualized Squashed Complexity Sausage

Straight from RFC 6670 (section 3.4):

[...] as is usually the case with communications technologies, simplification in one element of the system introduces an increase (possibly a non-linear one) in complexity elsewhere. This creates the "squashed sausage" effect, where reduction in complexity at one place leads to significant increase in complexity at a remote location.

This is probably the most concise description of the great idea of using long-distance vMotion for “mission-critical” craplications, and applies equally well to the kludges used to compensate for the simplicity of virtual switches.

3 & 5 Years Ago (June 2012)

June 2007 was the month of defaults, from default interface configurations to OSPF default routes and DHCP-generated default routes.

One of the posts I wrote in June 2009 described a concept that matters only if you’re studying for the CCIE exam: EIGRP load and reliability metrics. Other more earthly topics included ADSL reference diagram and ADSL QoS basics.

VMware buys Nicira: a Hypervisor Vendor Woke Up

Almost a year ago, I predicted that the hypervisor vendors would eventually wake up and realize it’s time to get rid of VLANs and decouple virtual networks from the physical world. We got the first glimpse of the brave new world a few weeks after that post was published with the VXLAN launch, but that was still Cisco’s solution running on top of VMware’s (and now everyone else’s) hypervisor. VMware’s recent acquisition of Nicira proves that VMware finally woke up big time.

read more see 10 comments

The Difference between Metro Ethernet and Stretched Data Center Subnets

Every time I rant about large-scale bridging and stretched L2 subnets, someone inevitably points out that Carrier (or Metro) Ethernet works perfectly fine using the same technologies and principles.

I won’t spend any time on the “perfectly fine” part (Greg Ferro had a lot to say about that in the early Packet Pushers podcasts), but focus on the fundamental difference between the two: the use case.

read more see 6 comments

BGP route replication in MPLS/VPN PE-routers

Whenever I’m explaining MPLS/VPN technology, I recommend using the same route targets (RT) and route distinguishers (RD) in all VRFs belonging to the same simple VPN. The Single RD per VPN recommendation doesn’t work well for multi-homed sites, so one might wonder whether it would be better to use a different RD in every VRF. The RD-per-VRF design also works, but results in significantly increased memory usage on PE-routers.

2012-07-24: Updated the conclusions based on feedback from nosx

read more see 6 comments

Long-Distance Workload Mobility in Perspective

In a recent blog post, Chuck Hollis described how some EMC customers use long-distance workload mobility. Not surprisingly, he focused on the VPLEX Metro part of the solution and didn’t even mention the earth-flattening requirements this idea imposes on the network. I guess you already know my views on that topic, but regardless of my personal opinions, he got me curious.

read more see 4 comments

Analyst-driven IPv6 deployment

Straight from the rumor mill (source, translated):

One of German ISPs is actually quite busy rolling out IPv6 after their CFO got a call from a stock analyst right during the RIPE meeting, asking questions “so what are your IPv6 plans?” – “none, what is IPv6?” – “oh, this is not so good”… full panic down the management chain…

Proves the everlasting wisdom from Martin Levy (source, the rest of the article is not worth reading):

You can either do a planned, careful migration, or you can do it in a panic. And you should know full well that panicking is more expensive.

Just in case you’ll be pushed into the panic mode: my webinars include intro for enterprises, intro for service providers and in-depth design/deployment webinar.

see 3 comments

Can I download the webinar recordings?

I get this question every second week or so – someone would like to buy the yearly subscription and wonders whether she’ll be able to watch the recordings on her iPad.

Short answer: Yes for most webinars.

Update 2012-07-13: All webinars recorded prior to July 1st 2012 are available in ARF format. Many of them are also available in edited MP4 format.

read more see 2 comments

Why Do Internet Exchanges Need Layer-2?

My tweet about the latest proof of my layer-2 = single failure domain claim has raised numerous questions about the use of bridging (aka switching) within Internet Exchange Points (IXP). Let’s see why most IXPs use L2 switching and why L2 switching is the simplest solution to the problem they’re solving.

read more see 26 comments

Does CPU-based forwarding performance matter for SDN?

David Le Goff sent me several great SDN-related questions. Here’s the first one:

What is your take on the performance issue with software-based equipment when dealing with general purpose CPU only? Do you see this challenge as a hard stop to SDN business?

Short answer (as always) is it depends. However, I think most people approach this issue the wrong way.

read more see 4 comments

Legacy Protocols in OpenFlow-Based Networks

This post is probably a bit premature, but I’m positive your CIO will get a visit from a vendor offering clean-slate OpenFlow/SDN-based data center fabrics in the not-so-distant future. At that moment, one of the first questions you should ask is “how well does your new wonderland integrate with my existing network?” or more specifically “which L2 and L3 protocols do you support?”

read more see 15 comments

Could MPLS-over-IP replace VXLAN or NVGRE?

A lot of engineers are concerned with what seems to be frivolous creation of new encapsulation formats supporting virtual networks. While STT makes technical sense (it allows soft switches to use existing NIC TCP offload functionality), it’s harder to figure out the benefits of VXLAN and NVGRE. Scott Lowe wrote a great blog post recently where he asked a very valid question: “Couldn’t we use MPLS over GRE or IP?” We could, but we wouldn’t gain anything by doing that.

read more see 18 comments

We need both OpenFlow and NETCONF

Every time I write about a simple use case that could benefit from OpenFlow, I invariably get a comment along the lines of “you can do that with NETCONF”. Repeated often enough, such comments might make an outside observer believe you don’t need OpenFlow for Software Defined Networking (SDN), which is simply not true. Here are at least three fundamental reasons why that’s the case.

read more see 3 comments

NETCONF = Expect on steroids

After the initial explosion of OpenFlow/SDN hype, a number of people made claims that OpenFlow is not the tool one can use to make SDN work, and NETCONF is commonly mentioned as an alternative (not surprisingly, considering that both Cisco IOS and Junos support it). Unfortunately, considering today’s state of NETCONF, nothing can be further from the truth.

read more see 8 comments

Does TRILL make sense at all?

It’s clear that major hypervisor vendors consider MAC-over-IP to be the endgame for virtual networking; they’re still squabbling about the best technology and proper positioning of bits in various headers, but the big picture is crystal-clear. Once they get there (solving “a few” not-so-trivial problems on the way), and persuade everyone to use virtual appliances, the network will have to provide seamless IP transport, nothing more.

At that moment, large-scale bridging will finally become history (until the big layer pendulum swings again) and one has to wonder whether there’s any data center future for TRILL, SPB, FabricPath and other vendor-specific derivatives.

read more see 17 comments

BGP operations and security, second draft

Jerome has just published the second version of our BGP operations and security Internet draft. Most of the typos and obvious blunders have been fixed (or so we hope) and we’ve incorporated numerous comments received online or during the Paris IETF meeting. Feedback is (as always) highly welcome.

The latest draft is available here.

Add comment

Hybrid OpenFlow, the Brocade Way

A few days after Brocade unveiled its SDN/OpenFlow strategy, Katie Bromley organized a phone call with Keith Stewart who kindly explained to me some of the background behind their current OpenFlow support. Apart from the fact that it runs on the 100GE adapters, the most interesting part is their twist on the hybrid OpenFlow deployment.

read more see 5 comments

Cisco ONE: More than just OpenFlow/SDN

As expected, Cisco launched its programmable networks strategy (Cisco Open Networking Environment – ONE) at Cisco Live US ... and as we all hoped, it was more than just OpenFlow support on Nexus 3000. It was also totally different from the usual “we support OpenFlow on our gear” me-too announcements we’ve seen in the last few months.

read more see 14 comments

Big Switch and Overlay Networks

A few days ago Big Switch announced they’ll support overlay networks in their upcoming software release. After a brief “told you so” moment (because virtual networks in physical devices don’t scale all that well) I started wondering whether they simply gave up and decided to become a Nicira copycat, so I was more than keen to have a brief chat with Kyle Forster (graciously offered by Isabelle Guis).

read more see 1 comments

QFabric Lite

QFabric from Juniper is probably the best data center fabric architecture (not implementation) I’ve seen so far – single management plane, implemented in redundant controllers, and distributed control plane. The “only” problem it had was that it was way too big for data centers that most of us are building (how many times do you need 6000 10GE ports?). Juniper just solved that problem with a scaled-down version of QFabric, officially named QFX3000-M.

read more see 10 comments

OpenFlow/SDN is not a silver bullet

Last autumn Todd Hoff (the author of the fantastic High Scalability blog) asked me to write a short article explaining the scalability challenges SDN and OpenFlow in particular might be facing. It took me “a while”, but I finally got it done – the OpenFlow/SDN Is Not a Silver Bullet for Network Scalability article was published last Monday.

see 4 comments

Choose your networking equipment with RIPE-554

In case the industry press hasn’t told you yet, tomorrow is the World IPv6 Launch day. While the obstinate naysayers will still claim IPv6 doesn’t matter (but then there are people believing in flat Earth being ~6000 years old and riding on a stack of turtles), the rest of us should be prepared to enable IPv6 when needed … and it all starts with the networking equipment that supports IPv6 and has IPv6 performance that has at least the same order of magnitude as the IPv4 performance.

read more see 7 comments

Equal-Cost Multipath in Brocade’s VCS Fabric

Understanding equal-cost multipathing in Brocade’s VCS Fabric is a bit tricky, not because it would be a complex topic, but because it’s a bit counter-intuitive (while still being perfectly logical once you understand it). Michael Schipp tried to explain how it works, Joel Knight went even deeper, and I’ll try to draw a parallel with the routed networks because most of us understand them better than the brave new fabric worlds.

read more see 15 comments

ARP reply with multicast sender MAC address is indeed illegal

A while ago I was writing about the behavior of Microsoft’s Network Load Balancing, the problems it’s causing and how Microsoft tried to hack around them using multicast MAC addresses as the hardware address of sender in ARP replies (which is illegal). A few days ago one of my readers asked me whether I know which RFC prohibits the use of multicast MAC address in ARP replies.

A quick consultation with friendly Google search engine returned this web page, which contained the answer: section 3.3.2 of RFC 1812 (Requirements for IP Version 4 Routers):

A router MUST not believe any ARP reply that claims that the Link Layer address of another host or router is a broadcast or multicast address.

Problem solved – now I know the real reason we have to configure static ARP entries on Cisco routers and switches.
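The rule quoted from RFC 1812 boils down to a one-bit test: an Ethernet address is a group (multicast or broadcast) address when the least-significant bit of its first octet is set. Here is a minimal sketch of that check, not how any router implements it. The function names are hypothetical, the sample addresses are purely illustrative, and only colon- or hyphen-separated notation is handled.

```python
def is_group_mac(mac: str) -> bool:
    """Return True for multicast or broadcast MAC addresses.

    The I/G bit is the least-significant bit of the first octet;
    when it is set, the address is a group address.
    Accepts 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' notation only.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def accept_arp_reply(sender_mac: str) -> bool:
    """Hypothetical helper: apply the RFC 1812 section 3.3.2 rule
    by refusing ARP replies whose hardware address is a group address."""
    return not is_group_mac(sender_mac)


if __name__ == "__main__":
    # Illustrative addresses only
    for mac in ("00:1b:54:aa:bb:cc", "03:bf:0a:00:00:01", "ff:ff:ff:ff:ff:ff"):
        verdict = "accept" if accept_arp_reply(mac) else "ignore"
        print(f"{mac}: {verdict}")
```

A device that applies this rule never learns the multicast address dynamically, which is consistent with the need for static ARP entries mentioned above.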

see 12 comments

Layer-2 Network Is a Single Failure Domain

This topic has been on my to-write list for over a year and its working title was phrased as a question, but all the horror stories you’ve shared with me over the last year or so (some of them published in my blog) have persuaded me that there’s no question – it’s a fact.

If you think I’m rephrasing the same topic ad nauseam, you’re right, but every month or so I get an external trigger that pushes me back to the same discussion, this time an interesting comment thread on Massimo Re Ferre’s blog.

read more see 27 comments

IPv6-only Data Center (built by Tore Anderson)

A while ago I wrote about the uselessness of stateless NAT64 and got into a nice discussion with Tore Anderson, who wanted to use stateless NAT64 in the reverse direction (stateless NAT46) to build an IPv6-only data center. Some background information first (to define the context of his thinking before we jump into the technical details):

read more see 5 comments

Goodbye Echo, I’ll miss you!

Some of you have noticed that I’d changed the commenting system on my blog recently. Here’s the full story (with a question for you at the very end).

I was totally fed up with Blogger comments years ago and decided to look for an alternative. JS-Kit was a perfect solution and it even allowed me to import Blogger comments and synchronize new entries with Blogger (so I could turn it off at any time and retain my comments).

read more see 10 comments

HTTP-over-IPv6 on Cisco IOS

Stumbled across this marvel while updating my IPv6 presentations for a 2-day seminar in Milano and Rome (straight from 15.2M&T command reference):

With IPv6 support added in Cisco IOS Release 12.2(2)T, the ip http server command simultaneously enables and disables both IP and IPv6 access to the HTTP server. However, an access list configured with the ip http access-class command will only be applied to IPv4 traffic. IPv6 traffic filtering is not supported.

Wait ... WHAT? I cannot control who can access the HTTP(S) server running in Cisco IOS over IPv6 (apart from kludges like ingress ACLs on all interfaces or CoPP), and this stupidity has been left unfixed for nine (9) years? Are we really in 2012, less than a month away from World IPv6 Launch, or have I been transported to the 1990s?

see 13 comments

OpenFlow @ Google: Brilliant, but not revolutionary

Google unveiled some details of its new internal network at Open Networking Summit in April and predictably the industry press and OpenFlow pundits exploded with the “this is the end of the networking as we know it” glee. Unfortunately I haven’t seen a single serious technical analysis of what it is they’re actually doing and how different their new network is from what we have today.

read more see 21 comments

Are Fixed Switches More Efficient Than Chassis Ones?

Brad Hedlund did an excellent analysis of fixed versus chassis-based switches in his Interop presentation and concluded that fixed switches offer higher port density and lower per-port power consumption than chassis-based ones. That’s true when comparing individual products, but let’s ask a different question: how much does it take to implement a 384-port non-blocking fabric (equivalent to Arista’s 7508 switch) with fixed switches?
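To get a feel for the numbers, here is a back-of-the-envelope sketch of the port arithmetic for a two-tier leaf-and-spine fabric built from identical fixed switches. It is not the calculation from the full post. The 64-port switch size is an assumption chosen for illustration, the function name is hypothetical, and the result is a lower bound: it ignores whether the uplinks can be spread evenly across the spine switches.

```python
import math


def leaf_spine_switch_count(edge_ports: int, ports_per_switch: int) -> dict:
    """Lower bound on switches needed for a non-blocking two-tier fabric.

    Assumption: every leaf dedicates half of its ports to servers and
    half to uplinks, so uplink capacity equals edge capacity.
    """
    down = ports_per_switch // 2              # edge-facing ports per leaf
    max_edge = ports_per_switch * down        # largest two-tier fabric
    if edge_ports > max_edge:
        raise ValueError("too many edge ports for a two-tier fabric")
    leaves = math.ceil(edge_ports / down)
    uplinks = leaves * down                   # one uplink per edge port
    spines = math.ceil(uplinks / ports_per_switch)
    return {"leaves": leaves, "spines": spines, "total": leaves + spines}


if __name__ == "__main__":
    # 384 edge ports, hypothetical 64-port fixed switches
    print(leaf_spine_switch_count(384, 64))
```

With these assumptions the fabric needs 12 leaf and 6 spine switches, 18 boxes in total, before counting the cables and optics between them.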

read more see 8 comments

Virtual Networks: the Skype Analogy

I usually use the “Nicira is Skype of virtual networking” analogy when describing the differences between Nicira’s NVP and traditional VLAN-based implementations. Cade Metz liked it so much he used it in his What Is a Virtual Network? It’s Not What You Think It Is article, so I guess a blog post is long overdue.

Before going into more details, you might want to browse through my Cloud Networking Scalability presentation (or watch its recording) – the crucial slide is this one:

read more Add comment

Transparent Bridging (aka L2 Switching) Scalability Issues

Stephen Hauser sent me an interesting question after the Data Center fabric webinar I did with Abner Germanow from Juniper:

A common theme in your talks is that L2 does not scale. Do you mean that Transparent (Learning) Bridging does not scale due to its flooding? Or is there something else that does not scale?

As is oft the case, I’m not precise enough in my statements, so let’s fix that first:

read more see 2 comments

Brocade VCS Fabric

Just prior to Networking Field Day, the merry band of geeks sat down with Chip Copper, Brocade’s Solutioneer (a job title almost as good as Packet Herder) to discuss the intricate details of VCS Fabric. The videos are well worth watching – the technical details are interesting, but above all, Chip is a fantastic storyteller.

read more see 4 comments

NHRP Rate Limiting can hurt your DMVPN network

NHRP-based interface state control is a fantastic feature that you can use for faster convergence of very large DMVPN networks (as explained in the DMVPN Designs webinar, you can also use it to solve some interesting backup scenarios). We tested it in a network with over 1000 spokes (using ASR1K as the hub router) using very short registration timeouts, and the CPU utilization of the NHRP process rarely exceeded a few percent.

read more Add comment

Does Optimal L3 Forwarding Matter in Data Centers?

Every data center network has a mixture of bridging (layer-2 or MAC-based forwarding, aka switching) and routing (layer-3 or IP-based forwarding); the exact mix, the size of L2 domains, and the position of L2/L3 boundary depend heavily on the workload ... and I would really like to understand what works for you in your data center, so please leave as much feedback as you can in the comments.

read more see 18 comments

Best of March 2012

The most popular post in March was the one describing my BGP security Internet draft. That’s good news – let’s hope you’ll all implement the recommended security measures. And here’s the top-10 list as reported by Google Analytics.

Add comment

Interesting OpenFlow links (2012-04-21)

The blogosphere has been full of OpenFlow-related articles recently (no wonder - there was Open Networking Summit in Santa Clara), so here's a special OpenFlow edition of interesting links.

Let's start with my good friend Greg Ferro. I'm so glad to see him returning from a sabbatical at OpenFlow Kool-Aid lake. His latest articles are a must-read: OpenFlow might lower CapEx while SDN will increase OpEx and OpenFlow doesn’t undermine Vendors even though it changes everything. We're perfectly aligned, which will make our discussions way less interesting, but I'm glad I'm not the only conservative in town.

read more Add comment

Cloud Computing Networking Presentations Available Online

I’ve published two cloud networking-related presentations to my webinar demo web site: the Cloud Computing Networking Under the Hood presentation from EuroNOG 2011 briefly describes various technologies you can use to implement virtual networks in IaaS clouds, while the Cloud Networking Scalability one from RIPE64 addresses the scalability aspects of these technologies.

You can find a broader in-depth description of these topics in the Cloud Computing Networking webinar (register for the next week’s live session).

see 5 comments

Virtual Networking is more than VMs and VLAN duct tape

VMware has a fantastic-looking cloud provisioning tool – vCloud Director. It allows cloud tenants to deploy their VMs and create new virtual networks with a click of a mouse (the underlying network has to provide a range of VLANs, or you could use VXLAN or vCDNI to implement the virtual segments).

Needless to say, when engineers not familiar with the networking intricacies create point-and-click application stacks without firewalls and load balancers, you get some interesting designs.

read more see 8 comments

Best of February 2012

Google Analytics claims blog posts describing Nicira were among the most popular content written in February 2012. No surprise there. Here’s the whole top-10 list:

see 2 comments

LineRate Proxy: Software L4-7 Appliance With a Twist

Buying a new networking appliance (be it VPN concentrator, firewall or load balancer … aka Application Delivery Controller) is a royal pain. You never know how much performance you’ll need in two or three years (and your favorite bean counter will not allow you to scrap it in less than 4-5 years). You do know you’ll never get the performance promised in vendor’s data sheets … but you don’t always know which combination of features will kill the box.

Now, imagine someone offers you a performance guarantee – you’ll always get what you paid for. That’s what LineRate Systems, a startup just exiting stealth mode, is promising.

read more see 17 comments

Full mesh is the worst possible fabric architecture

One of the answers you get from some of the vendors selling you data center fabrics is “you can use any topology you wish” and then they start to rattle off an impressive list of buzzword-bingo-winning terms like full mesh, hypercube and Clos fabric. While full mesh sounds like a great idea (after all, what could possibly go wrong if every switch can talk directly to any other switch), it’s actually the worst possible architecture (apart from the fully randomized Monkey Design).

Before reading the rest of this post, you might want to visit Derick Winkworth’s The Sad State of Data Center Networking to get in the proper mood.
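The teaser doesn't spell out the argument, but a quick count illustrates one cost of the full-mesh idea: the number of inter-switch links grows quadratically with the number of switches, and every switch gives up one port per peer. The sketch below is plain combinatorics under the assumption of a single link between each pair of switches. The function names and the 64-port switch size are hypothetical.

```python
def full_mesh_links(switches: int) -> int:
    """Number of links in a full mesh with one link per switch pair."""
    return switches * (switches - 1) // 2


def edge_ports_left(switches: int, ports_per_switch: int) -> int:
    """Ports remaining on each switch after one fabric port per peer."""
    return ports_per_switch - (switches - 1)


if __name__ == "__main__":
    # Hypothetical 64-port switches, one link per pair
    for n in (4, 8, 16, 32):
        print(n, full_mesh_links(n), edge_ports_left(n, 64))
```

Going from 8 to 16 switches more than quadruples the link count (28 to 120), while any two switches still share only the one direct link between them.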

read more see 12 comments

IPv6 presentations published on my demo site

The system that allows me to publish slide decks on my demo site is finally ready for beta testing ... you’re most welcome to try it with different browsers and tell me how many things are broken.

I’ve already published my IPv6 presentations from Slovenian IPv6 summits. Enjoy them (but do remember that some of them are more than 2 years old).

see 3 comments

vCider: A Hammer Looking For a Nail?

Last week Juergen Brendel published an interesting blog post describing how you can use vCider to implement high-availability clusters with a multi-cloud strategy, triggering the following response from one of my readers: “I hadn't heard of vCider before but seeing stuff like this always makes me doubt my sanity – is there really a situation where the only solution is multi-site L2?”

read more see 4 comments

Beware of fabric-wide Link Aggregation Groups

Fernando made a very valid comment to my Monkey Design Still Doesn’t Work Well post: if we added a few more links between edge and core (fabric) switches to that network, we might get optimal bandwidth utilization in the core. As it turns out, that’s not the case.

read more Add comment

Monkey Design Still Doesn’t Work Well

We’ve seen several interesting data center fabric solutions during the Networking Tech Field Day presentations, every time hearing how the new fabric technologies (actually, the shortest path bridging part of those technologies) allow us to shed the yoke of the Spanning Tree monster (see Understanding Switch Fabrics by Brandon Carroll for more details). Not surprisingly we wanted to know more and asked the obvious question: “and how would you connect the switches within the fabric?”

read more see 13 comments

Interesting data center links (2012-04-09)

It's been a while since I published the interesting links; there were so many of them in my Evernote notebook that I had to publish the data center ones separately.

Let's start with Network, Interrupted, which is a fantastic summary of where the network might be in a few years by Derick Winkworth. Then there's You Can’t Build A System In A Silo, a fantastic summary of what needs to be done to reorganize your IT by Ethan Banks. And here are all the other interesting links in somewhat random order:

read more Add comment

April: The Month of Fabrics and Clouds

If you’re attending the upcoming RIPE64 meeting in Ljubljana, don’t be late for my presentations (pretty early on Tuesday, April 17th) on Cloud Computing networking and Data Center fabrics. Later in the same week (April 19th), I’ll be presenting in the Server Guy’s Guide to Network Fabrics – a free webinar sponsored by Juniper. Finally, there’s the Cloud Computing Networking – Will It Scale? webinar in the last week of April.

Get your personal cloud fabric @
Add comment

Networking Tech Field Day #3: First Impressions

Last week Stephen Foskett and Greg Ferro brought back their merry crew of geeks (and a network security princess) for the third Networking Tech Field Day. We’ve met some exciting new vendors (Infineta and Spirent) and a few long-time friends (Arista, Cisco, NEC and Solarwinds).

Infineta gave us a fantastic deep-dive into deduplication math, and Spirent blew our socks off with their testing gear. As for the generic state of the networking industry, the “I’m excited” rating from last autumn changed to this (HT @reillyusa):

read more see 1 comments

IPv6 Legends and Myths: More Opinions than Data Points

Trevor Pott wrote an interesting article in The Register (linking to my IPv6 multihoming post – thank you!) explaining how, in his opinion, IPv6 sucks for small and medium businesses. I wholeheartedly agree with some of his conclusions (actually, agreed with them for the last three years), but unfortunately the article contains several factual errors that simply have to be corrected (I doubt many of Trevor’s readers will actually find their way to this article, but one can always hope).

read more see 17 comments

Cisco & VMware: Merging the Virtual and Physical NICs

Virtual (soft) switches present in almost every hypervisor significantly reduce the performance of high-bandwidth virtual machines (measurements done by Cisco a while ago indicate you could get up to 38% more throughput if you tie VMs directly to hardware NICs), but as I argued in my “Soft Switching Might Not Scale, But We Need It” post, we need hypervisor switches to isolate the virtual machines from the vagaries of the physical NICs.

Engineering gurus from Cisco and VMware have yet again proven me wrong – you can combine VMDirectPath and vMotion if you use VM-FEX.

read more see 13 comments

Cloud Services Taxonomy

One of the challenges of designing data center networks that support cloud services is agreeing on what exactly each one of those services should be doing. This video (part of the Cloud Computing Networking webinar) explains what various categories of cloud services actually do and where they could be used in a typical web application stack.

Add comment

Migrating from Phase 1 DMVPN to Phase 2/3 network

Chris sent me an interesting question that I haven’t covered in any of my DMVPN webinars: “How would you migrate a part of a Phase-1 DMVPN network to a Phase-2 or Phase-3 network if you can only migrate one spoke site at a time? Can I just upgrade the spokes that need spoke-to-spoke connectivity?”

While it might be theoretically possible to have a mixed Phase-1/Phase-2 DMVPN tunnel (and I just might be able to get it to work in a lab), such a solution definitely violates the KISS principle.

read more Add comment