Blog Posts in January 2015

Case Study: Combine Physical and Virtual Appliances in a Private Cloud

Cloud builders often use my ExpertExpress service to validate their designs. Tenant onboarding into a multi-tenant (private or public) cloud infrastructure is a common problem, and tenants frequently want to retain their existing network services appliances (firewalls and load balancers).

The Combine Physical and Virtual Appliances in a Private Cloud case study describes a typical solution that combines per-tenant virtual appliances with frontend physical appliances.


Video: IPv6 High Availability Components

Last spring I ran an IPv6 High Availability webinar which started (not surprisingly) with a simple question: “which network components affect availability in the IPv6 world, and how is a dual-stack or an IPv6-only environment different from what we had in the IPv4 world?”

This part of the webinar is now available with Free Subscription. Enjoy the video, and don't forget to explore other IPv6 resources on ipSpace.net.


IPv6 Renumbering – Mission Impossible?

In one of the discussions on the v6ops mailing list, Matthew Petach wrote:

The probability of us figuring out how to scale the routing table to handle 40 billion prefixes is orders of magnitude more likely than solving the headaches associated with dynamic host renumbering. That ship has done gone and sailed, hit the proverbial iceberg, and is gathering barnacles at the bottom of the ocean.

Is it really that bad? Is simple renumbering in the IPv6 world just another myth? It depends.


Improving ECMP Load Balancing with Flowlets

Every time I write about unequal traffic distribution across a link aggregation group (LAG, aka EtherChannel or Port Channel) or ECMP fabric, someone asks a simple question: “is there no way to reshuffle the traffic to make it more balanced?”

TL&DR summary: there are ways to do it, and some vendors already implemented them.

The Problem

The algorithm that spreads the traffic across a group of outbound links (LAG or set of ECMP next hops) has to satisfy a few requirements:

  • It has to work reasonably well in typical environments;
  • It should not reorder packets of the same flow (here’s why);
  • It has to be simple enough to be implementable in reasonably cheap ASICs;

The second and third requirements result in what the chipset manufacturers (and subsequently the hardware vendors) offer today: hash-based distribution of packets. In case you need a step-by-step overview of this process, here’s how it works (a short sketch follows the list):

  • Create an array of buckets and assign each outgoing link to one or more buckets. The bucket size is the number you see in marketing papers as “we support N-way ECMP” or “we have N-way LAG”.
  • Take N fields from the outgoing packet header. The fields could be MAC addresses (source and/or destination), IP addresses (source and/or destination), TCP/UDP port numbers, or even some other fixed-position fields in the packet header. Some vendors – for example Arista – allow you to configure which fields you want to use (assuming the platform chipset supports this functionality).
  • Hash the fields from the packet header to get an integer between 0 and bucket size – 1. Example: for bucket sizes that are a power of two, take the low-order N bits of the hash.
  • Enqueue the packet into the output queue of the interface that is associated with the bucket selected by the packet hash.
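Here’s a minimal Python sketch of those four steps. The header fields, the hash function, and the link names are made up for illustration; real ASICs use their own hardware hash, not MD5.

```python
# Minimal sketch of hash-based ECMP/LAG bucket selection (illustrative only).
import hashlib

NUM_BUCKETS = 8                                   # the "N-way ECMP" marketing number
links = ["uplink1", "uplink2", "uplink3", "uplink4"]

# Step 1: assign every outgoing link to one or more buckets
buckets = [links[i % len(links)] for i in range(NUM_BUCKETS)]

def select_link(src_ip, dst_ip, src_port, dst_port, proto):
    # Steps 2-3: hash a fixed set of header fields into a bucket index
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    bucket = int.from_bytes(hashlib.md5(key).digest()[:4], "big") % NUM_BUCKETS
    # Step 4: send the packet through the link owning that bucket; note that
    # the output queue length is never consulted
    return buckets[bucket]

print(select_link("10.1.1.1", "10.2.2.2", 49152, 80, "tcp"))
```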

Have you noticed that the algorithm never checks the size of the output queue? If the hashing algorithm decides to send the packet through Interface#1, the switch will send the packet through Interface#1 even though that interface might be dropping packets like crazy due to continuous congestion, and all the other interfaces sit idle.

The reason the load-balancing algorithm never checks the load on the outbound interface is simple: the typical environment mentioned above is usually assumed to be a healthy mix of numerous independent mice flows. Throw a few elephants in the mix and the assumptions start breaking down.

The only vendor that has always been able to cope with elephants in the mix is Brocade, because its traditional typical environment (storage networks) consists mostly of elephant flows.

Can We Solve the Problem?

Here’s an intriguingly simple question: Why can’t we change the mix of outgoing interfaces in the N-way ECMP table to reflect the actual interface load? Wouldn’t that allow us to push the mice flows away from elephants crowding some of the interfaces?

In principle, the answer is “Sure, we could do that”, but we have to solve three challenges:

  • Coarse-grained reshuffling could make matters worse. If your hardware supports 8-way ECMP and you have four uplinks, you might shift a large proportion of the traffic when you reassign the buckets to less-loaded interfaces, resulting in a nasty oscillation. Modern chipsets support at least 256-way ECMP, so that shouldn’t be a problem.
  • The hardware you use has to support per-bucket counters. All hardware supports per-interface counters, but while those help you identify the congested interfaces, they won’t help you reshuffle the traffic – if the control-plane software cannot see how much traffic goes through each bucket, it makes no sense to randomly reshuffle the buckets hoping for the best.
  • We shall not reorder the packets (at least within the data center), which means that we cannot reshuffle active buckets, but it’s relatively safe to change the outgoing interface of a currently inactive bucket. You could still reorder packets within a TCP session under an unlikely set of circumstances (figuring out what those circumstances are is left as an exercise for the reader), but we just might have to accept that slight risk of temporary performance degradation if we want to get better link utilization.

Would the reshuffle inactive buckets idea work in practice? Are there inactive buckets in a typical high-volume data center environment? Welcome to the weird world of flowlets.

What Are Flowlets?

It seems the idea of flowlets first appeared in the Harnessing TCP’s Burstiness with Flowlet Switching paper (see also corresponding PPT) – due to the bursty nature of TCP, you might be able to do pretty reliable bucket reshuffling with 256 or more buckets, as some buckets always tend to be empty.
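Here’s a toy Python sketch of how flowlet-based bucket reshuffling could work, loosely following that idea. The flowlet gap, the bucket count, and the byte-count load metric are assumptions for illustration, not any vendor’s implementation.

```python
# Toy model of flowlet-based load balancing: a bucket that has been idle longer
# than the flowlet gap can be moved to the least-loaded link without reordering
# packets of the flows hashed into it. All constants are assumptions.
import time

FLOWLET_GAP = 500e-6          # assumed 500-microsecond idle gap ends a flowlet

class FlowletBalancer:
    def __init__(self, links, num_buckets=256):
        self.bucket_link = [links[i % len(links)] for i in range(num_buckets)]
        self.bucket_last_seen = [0.0] * num_buckets
        self.link_bytes = {link: 0 for link in links}   # stand-in for interface load

    def forward(self, bucket, pkt_len):
        now = time.monotonic()
        # Reassigning an idle (inactive) bucket cannot reorder an in-flight
        # burst, so move it to the currently least-loaded link.
        if now - self.bucket_last_seen[bucket] > FLOWLET_GAP:
            self.bucket_link[bucket] = min(self.link_bytes, key=self.link_bytes.get)
        self.bucket_last_seen[bucket] = now
        link = self.bucket_link[bucket]
        self.link_bytes[link] += pkt_len
        return link
```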

Microsoft started using flowlets in Windows Server 2012 R2, and Cisco recently implemented flowlet-based dynamic load balancing in its ACI leaf-and-spine fabrics. Juniper does something similar (adaptive load balancing) on MX routers in Junos 14.1, and implemented Adaptive Flowlet Splicing within a Virtual Chassis Fabric (a nice rehash of the topic).



SDN Router @ Spotify on Software Gone Wild

Imagine you need a data center WAN edge router with multiple 10GE uplinks. You’d probably go for an ASR or an MX-series router, right? How about using a 2 Tbps ToR switch and an SDN solution to make it work with the full Internet routing table?

If you happen to have iTunes on your computer, please spend 10 seconds rating the podcast before you start listening to it. Thank you!


Pick a Topic for NSX Deep Dive Software Gone Wild Episode

Dmitri Kalintsev, one of the networking guys from the VMware NSX team, has kindly agreed to do an NSX technical deep dive Software Gone Wild episode… and you have the opportunity to tell him what you’d like to hear. It’s as easy as writing a comment, and we’ll pick one of the most popular topics.

Do keep in mind that we plan to do a technical deep dive, and it has to fit within an hour or so or nobody will ever listen to it, so please keep your suggestions focused. “Troubleshooting NSX”, “NSX Design”, or “NSX versus ACI” is not what we’re looking for ;)


Palo Alto Virtual Firewalls on Software Gone Wild

One of the interesting challenges in the Software-Defined Data Center world is the integration of network and security services with the compute infrastructure and network virtualization. Palo Alto claims to have tightly integrated their firewalls with VMware NSX and numerous cloud orchestration platforms – it was time to figure out how that’s done, so we decided to go on a field trip into the scary world of security.


Latency: the Killer of Spread-Out Application Stack Ideas

A few months ago I described how bandwidth limitations shatter the dreams of spread-out application stacks with elements residing in (or being dynamically migrated between) multiple data centers. Today let’s focus on bandwidth’s ugly cousin: latency.

TL&DR Summary: Spreading the server components of an application across multiple locations (multiple data centers or hybrid cloud deployments) can easily result in dismal performance even when there’s plenty of bandwidth available.
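A back-of-envelope Python sketch (all numbers are assumptions, not measurements) illustrates the point: with enough serialized calls between application components, the inter-DC round-trip time dominates the transaction time long before bandwidth becomes an issue.

```python
# Assumed numbers: a chatty application making 100 sequential calls per user
# transaction, ~0.5 ms RTT inside a data center, ~10 ms RTT between data centers.
calls = 100
rtt_local, rtt_remote = 0.0005, 0.010     # seconds

print(f"same DC:     {calls * rtt_local * 1000:.0f} ms of network wait per transaction")
print(f"between DCs: {calls * rtt_remote * 1000:.0f} ms of network wait per transaction")
# 50 ms versus 1000 ms -- latency, not bandwidth, kills the spread-out stack
```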


How Does MPLS-TE Interact with QoS

MPLS Traffic Engineering (MPLS-TE) is sometimes promoted as a QoS solution (it seems bandwidth calendaring is a permanent obsession of some networking engineers, and OpenFlow is no more a solution than MPLS-TE was ;), but in reality it’s pretty hard to make the two work together seamlessly (just ask anyone who had to implement auto-bandwidth MPLS-TE in a large network).

Not surprisingly, we addressed the topic during our MPLS Tech Talk.


BGP Deaggregation with Conditional Route Injection

Whenever there’s a weird request to do something totally illogical with BGP, there’s a knob in Cisco IOS to get it done (and increase the heartburn of CCIE candidates). Conditional Route Injection (the ability to insert more specific prefixes into BGP without having them in the IP routing table) is one of them.

Keep in mind: being a MacGyver is not a long-term strategy. Just because you can doesn’t mean that you should.
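For illustration only, here’s a minimal, untested IOS-style configuration sketch of conditional route injection; the AS number, prefixes, and route-map/prefix-list names are made up. The more-specific prefixes listed in SPECIFICS are injected into BGP whenever the aggregate matched by the exist-map is present in the BGP table.

```
router bgp 65000
 ! inject the more-specific prefixes whenever the aggregate is learned
 bgp inject-map INJECT exist-map EXIST copy-attributes
!
route-map EXIST permit 10
 match ip address prefix-list AGGREGATE
 match ip route-source prefix-list AGG_SOURCE
!
route-map INJECT permit 10
 set ip address prefix-list SPECIFICS
!
ip prefix-list AGGREGATE permit 192.0.2.0/24
ip prefix-list AGG_SOURCE permit 10.0.0.1/32
ip prefix-list SPECIFICS permit 192.0.2.0/25
ip prefix-list SPECIFICS permit 192.0.2.128/25
```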
