Category: load balancing

Case Study: Combine Physical and Virtual Appliances in a Private Cloud

Cloud builders often use my ExpertExpress service to validate their designs. Tenant onboarding into a multi-tenant (private or public) cloud infrastructure is a common problem, and tenants frequently want to retain their existing network services appliances (firewalls and load balancers).

The Combine Physical and Virtual Appliances in a Private Cloud case study describes a typical solution that combines per-tenant virtual appliances with frontend physical appliances.

Improving ECMP Load Balancing with Flowlets

Every time I write about unequal traffic distribution across a link aggregation group (LAG, aka Etherchannel or Port Channel) or ECMP fabric, someone asks a simple question: “Is there no way to reshuffle the traffic to make it more balanced?”

TL&DR summary: there are ways to do it, and some vendors already implemented them.

The Problem

The algorithm that spreads the traffic across a group of outbound links (LAG or set of ECMP next hops) has to satisfy a few requirements:

  • It has to work reasonably well in typical environments;
  • It should not reorder packets of the same flow (here’s why);
  • It has to be simple enough to be implementable in reasonably cheap ASICs;

The second and third requirements result in what the chipset manufacturers (and subsequently the hardware vendors) are offering today: hash-based distribution of packets. In case you need a step-by-step overview of this process, here’s how it works (there’s a short sketch in code after the list):

  • Create an array of buckets and assign each outgoing link to one or more buckets. The bucket size is the number you see in marketing papers as “we support N-way ECMP” or “we have N-way LAG”.
  • Take N fields from the outgoing packet header. The fields could be MAC addresses (source and/or destination), IP addresses (source and/or destination), IP port numbers, or even some other fixed-position fields in the packet header. Some vendors – for example Arista – allow you to configure which fields you want to use (assuming the platform chipset supports this functionality).
  • Hash the fields from the packet header to get an integer between 0 and bucket size – 1. Example: for bucket sizes that are a power of two, take the low-order bits of the hash.
  • Enqueue the packet into the output queue of the interface that is associated with the bucket selected by the packet hash.
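
If it helps to see those steps in one place, here’s a minimal Python sketch of the same logic (the interface names, the 256-bucket table, and the use of MD5 as the hash function are arbitrary choices made for this illustration; real switches implement the whole process in the forwarding ASIC with vendor-specific hash functions):

```python
import hashlib

# Hypothetical example: four uplinks spread across 256 buckets
INTERFACES = ["Ethernet1", "Ethernet2", "Ethernet3", "Ethernet4"]
NUM_BUCKETS = 256

# Step 1: assign each outgoing link to one or more buckets
buckets = [INTERFACES[i % len(INTERFACES)] for i in range(NUM_BUCKETS)]

def select_interface(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Pick the outgoing interface for a packet based on a hash of header fields."""
    # Step 2: take a fixed set of fields from the packet header
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    # Step 3: hash them into an integer between 0 and NUM_BUCKETS - 1
    # (with a power-of-two bucket count this equals taking the low-order bits)
    bucket = int.from_bytes(hashlib.md5(key).digest()[:4], "big") % NUM_BUCKETS
    # Step 4: send the packet out of the interface that owns the selected bucket
    return buckets[bucket]

# Packets of the same flow always hash into the same bucket, so they are never reordered
print(select_interface("10.1.1.1", "10.2.2.2", 49152, 80))
```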

Have you noticed that the algorithm never checks the size of the output queue? If the hashing algorithm decides to send the packet through Interface#1, the switch will send the packet through Interface#1 even though that interface might be dropping packets like crazy due to continuous congestion, and all the other interfaces sit idle.

The reason the load-balancing algorithm never checks the load on the outbound interface is simple: the typical environment mentioned above is usually assumed to be a healthy mix of numerous independent mice flows. Throw a few elephants in the mix and the assumptions start breaking down.

The only vendor that was always able to cope with the elephants in the mix is Brocade, because its traditional environment (storage networks) consists mostly of elephant flows.

Can We Solve the Problem?

Here’s an intriguingly simple question: Why can’t we change the mix of outgoing interfaces in the N-way ECMP table to reflect the actual interface load? Wouldn’t that allow us to push the mice flows away from elephants crowding some of the interfaces?

In principle, the answer is “Sure, we could do that”, but we have to solve three challenges (a sketch of the resulting rebalancing loop follows the list):

  • Coarse-grained reshuffling could make matters worse. If your hardware supports 8-way ECMP and you have four uplinks, you might shift a large proportion of the traffic when you reassign the buckets to less-loaded interfaces, resulting in a nasty oscillation. Modern chipsets support at least 256-way ECMP, so that shouldn’t be a problem.
  • The hardware you use has to support per-bucket counters. All hardware supports per-interface counters, but while they help you identify the congested interfaces, they won’t help you reshuffle the traffic – if the control-plane software cannot see how much traffic goes through each bucket, it makes no sense to randomly reshuffle the buckets hoping for the best.
  • We shall not reorder the packets (at least within the data center), which means that we cannot reshuffle active buckets, but it’s relatively safe to change the outgoing interface of a currently inactive bucket. You could still reorder packets within a TCP session under an unlikely set of circumstances (figuring out what those circumstances are is left as an exercise for the reader), but we just might have to accept that slight risk of temporary performance degradation if we want to get better link utilization.
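
Here’s a rough sketch of what such a control-plane rebalancing loop could look like, assuming the hardware exposed per-bucket byte counters and a last-activity timestamp to the control plane (the data structures, the 1 ms idle timeout and the 80% congestion threshold are all invented for this illustration):

```python
import time

IDLE_TIMEOUT = 0.001   # bucket considered inactive after 1 ms of silence (made-up value)

def rebalance(buckets, interface_load):
    """Move inactive buckets away from congested uplinks.

    buckets: list of dicts with per-bucket state: 'interface', 'bytes'
             (traffic seen during the last measurement interval) and 'last_seen'
    interface_load: dict mapping interface name to utilization (0.0 to 1.0)
    """
    now = time.monotonic()
    target = min(interface_load, key=interface_load.get)   # least-loaded uplink

    # Consider the busiest buckets first; the per-bucket counters tell the
    # control plane how much traffic each reassignment would actually shift.
    for bucket in sorted(buckets, key=lambda b: b["bytes"], reverse=True):
        congested = interface_load[bucket["interface"]] > 0.8
        inactive = now - bucket["last_seen"] > IDLE_TIMEOUT
        # Only inactive buckets may be moved; an active bucket carries packets
        # of in-flight flows and moving it could reorder them.
        if congested and inactive and bucket["interface"] != target:
            bucket["interface"] = target
```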

Would the reshuffle inactive buckets idea work in practice? Are there inactive buckets in a typical high-volume data center environment? Welcome to the weird world of flowlets.

What Are Flowlets?

It seems the idea of flowlets first appeared in the Harnessing TCP’s Burstiness with Flowlet Switching paper (see also corresponding PPT) – due to the bursty nature of TCP, you might be able to do pretty reliable bucket reshuffling with 256 or more buckets, as some buckets always tend to be empty.
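
The reason it’s safe to move an idle bucket is easy to quantify: if the silence between two bursts of the same flow is longer than the worst-case latency difference between the old and the new path, the first packet of the new burst cannot overtake the last packet of the previous one. A back-of-the-envelope check (the delay figures are made up):

```python
def safe_to_move(idle_gap_us, max_path_delay_us, min_path_delay_us):
    """A flowlet can take a new path without risking reordering when the idle
    gap preceding it exceeds the worst-case delay difference between paths."""
    return idle_gap_us > (max_path_delay_us - min_path_delay_us)

# Example: 500 microseconds of silence in a TCP flow, path delays between 20 and 120 us
print(safe_to_move(500, 120, 20))   # True: the bucket can be reassigned safely
```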

Microsoft started using flowlets in Windows Server 2012 R2, and recently Cisco implemented flowlet-based dynamic load balancing in the ACI leaf-and-spine fabrics. Juniper is doing something similar (adaptive load balancing) on MX routers in Junos 14.1, and uses Adaptive Flowlet Splicing within a Virtual Chassis Fabric (a nice rehash of the topic).

So You’re an Open Source Shop? Really?

I carried out an interesting quiz during one of my Interop workshops:

  • How many use Linux-based servers? Almost everyone raised their hands.
  • How many use Apache or Tomcat web servers? Yet again, almost everyone.
  • How many run applications written in PHP, Python, Ruby…? Same crowd (probably even a bit more).
  • How many use Nginx, Squid or HAProxy for load balancing? Very few.

Is there a rational explanation for this seemingly nonsensical result?

read more

It’s OK to Let Developers Go @ Amazon Web Services, but Not at Home? You Must Be Kidding!

Recently I was discussing the benefits and drawbacks of virtual appliances, software-defined data centers, and self-service approach to application deployment with a group of extremely smart networking engineers.

After the usual set of objections, someone said “but if we don’t become more flexible, the developers will simply go to Amazon. In fact, they already use Amazon Web Services.”

read more

Load Balancing Across IP Subnets

One of my readers sent me this question:

I have a data center with huge L2 domains. I would like to move routing down to the top of the rack; however, I’m stuck with a load-balancing question: how do load balancers work if you have a routed network and pool members that are multiple hops away? How is that possible to use with Direct Server Return?

There are multiple ways to make load balancers work across multiple subnets:

read more

Scale-Out Load Balancing with OpenFlow

When OpenFlow was still fresh and exciting, someone made quite a name for himself by proposing a global load-balancing solution that would install per-session OpenFlow entries in every core switch around the world. Clearly a great idea, mimicking the best experiences we had with ATM SVCs.

Meanwhile some people started using OpenFlow in real-life networks for coarse-grained load balancing that improves the scalability of stateful network services. For more details, watch the video recorded during the Real Life OpenFlow-based SDN Use Cases webinar.
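
To illustrate what “coarse-grained” means in this context, here’s a hypothetical sketch that splits the client address space into a few wide source-prefix rules, each one pointing at a different stateful service instance, instead of installing per-session flow entries (the prefixes, instance names and rule format are invented for this example and don’t correspond to any particular controller API):

```python
import ipaddress

# Hypothetical stateful service instances (firewalls or load balancers)
INSTANCES = ["svc-1", "svc-2", "svc-3", "svc-4"]

def coarse_grained_rules(client_space="0.0.0.0/0"):
    """Split the client address space into at least as many chunks as there are
    service instances and generate one wide match rule per chunk: a handful of
    flow entries instead of one entry per session."""
    net = ipaddress.ip_network(client_space)
    # Number of extra prefix bits needed to get at least len(INSTANCES) subnets
    extra_bits = (len(INSTANCES) - 1).bit_length()
    subnets = net.subnets(prefixlen_diff=extra_bits)
    return [
        {"match": {"ipv4_src": str(subnet)},
         "action": {"forward_to": INSTANCES[i % len(INSTANCES)]}}
        for i, subnet in enumerate(subnets)
    ]

for rule in coarse_grained_rules():
    print(rule)
```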

iOS uses Multipath TCP – Does It Matter?

When Apple launched the new release of iOS last autumn, networking gurus realized the new iOS uses MP-TCP, a recent development that allows a single TCP socket (as presented to the higher layers of the application stack) to use multiple parallel TCP sessions. Does that mean we’re getting closer to fixing the TCP/IP stack?

TL&DR summary: Unfortunately not.

read more

OMG, Who Will Manage All Those Virtual Firewalls?

Every time I talk about small (per-application) virtual appliances, someone inevitably cries “And who will manage thousands of appliances?” Guess what – I’ve heard similar cries from the mainframe engineers when we started introducing Windows and Unix servers. In the meantime, some sysadmins manage more than 10,000 servers, and we’re still discussing the “benefits” of humongous monolithic firewalls.

read more

Are Your Applications Cloud-Friendly?

A while ago I had a discussion with someone who wanted to be able to move whole application stacks between different private cloud solutions (VMware, Hyper-V, OpenStack, CloudStack) and a variety of public clouds.

Not surprisingly, there are plenty of startups working on the problem – if you’re interested in what they’re doing, I’d strongly recommend you add CloudCast.net to your list of favorite podcasts – but the only correct way to solve the problem is to design the applications in a cloud-friendly way.

read more