Can We Just Throw More Bandwidth at a Problem?

Tuesday, June 10, 2014 07:43 +0200

One of my readers sent me an interesting question:

I have been reading in many places about "throwing more bandwidth at the problem." How valid is this statement? Should the applications (servers) work on the assumption that infinite bandwidth is available at the fabric level?

Moore’s law works in our favor. It’s already cheaper (in some environments) to add bandwidth than to deploy QoS.

Data center networks are a prime example. With 10GE ports priced in the $200 to $300 range, it makes no sense to deploy QoS and incur perpetual technical debt. A single data center fabric should provide equidistant endpoints – you should be able to get the same amount of bandwidth between any two endpoints (whether you get line-rate bandwidth or a fraction of it due to leaf-to-spine oversubscription is a different discussion that depends on the needs of your workload).
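To put a number on the leaf-to-spine oversubscription mentioned above, here is a minimal sketch. The port counts and speeds are hypothetical and not taken from any particular switch; the function names are made up for this illustration.

```python
def oversubscription_ratio(server_ports, server_gbps, uplinks, uplink_gbps):
    """Server-facing capacity divided by spine-facing capacity on one leaf."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)


def worst_case_server_gbps(server_ports, server_gbps, uplinks, uplink_gbps):
    """Per-server bandwidth toward the spine when every server sends at once."""
    ratio = oversubscription_ratio(server_ports, server_gbps, uplinks, uplink_gbps)
    # A ratio below 1 means the uplinks are not the bottleneck.
    return server_gbps / max(ratio, 1.0)


# Hypothetical leaf switch: 48 x 10GE server ports, 4 x 40GE uplinks.
leaf = dict(server_ports=48, server_gbps=10, uplinks=4, uplink_gbps=40)
print(f"oversubscription {oversubscription_ratio(**leaf):.1f}:1")
print(f"worst case per server: {worst_case_server_gbps(**leaf):.2f} Gbps")
```

If every leaf is built the same way, endpoints on different leaves all see the same worst-case figure, which is what makes them equidistant. Whether that figure is good enough depends on the workload.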

If you want to have some redundancy in your data center, you might want to build more than one availability zone (just make sure you don’t bridge across multiple zones, turning them into a single failure domain). Should you provide the same bandwidth between endpoints in the same availability zone and across availability zones? It depends on the workload needs and your budget, but it’s still a feasible idea.

WAN bandwidth is orders of magnitude more expensive than LAN bandwidth, so it’s impossible to just throw more bandwidth at a problem. QoS might temporarily alleviate some bottlenecks, but it’s a zero-sum game, and large-scale QoS never worked well.
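The zero-sum nature of QoS can be shown with a small sketch. It assumes a congested link where every traffic class always has packets waiting and the scheduler splits capacity in proportion to configured weights; the class names, weights, and link speed are hypothetical.

```python
def weighted_shares(capacity_mbps, weights):
    """Split a congested link among traffic classes in proportion to weight."""
    total = sum(weights.values())
    return {name: capacity_mbps * w / total for name, w in weights.items()}


# Hypothetical 100 Mbps WAN link with three traffic classes.
before = weighted_shares(100, {"voice": 1, "business": 1, "bulk": 2})
# Doubling the weight of one class...
after = weighted_shares(100, {"voice": 2, "business": 1, "bulk": 2})

for name in before:
    print(f"{name}: {before[name]:.0f} -> {after[name]:.0f} Mbps")
# ...does not change the total that is available to share.
print("total:", sum(before.values()), "->", sum(after.values()))
```

Whatever one class gains, the other classes lose. The scheduler only decides who waits; it never creates capacity.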

Flow-based forwarding and other ideas promoted by OpenFlow evangelists are just QoS in disguise.

If you want to solve the WAN bandwidth challenges, you have to start with the application architecture – once the application architects figure out the laws of physics (including the upper bound on information propagation speed) and fallacies of distributed computing, and start architecting the applications with real-life bandwidth and latency in mind, everyone’s life becomes much simpler.
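A back-of-the-envelope sketch shows why propagation delay matters more than raw bandwidth for many applications. It assumes light travels through optical fiber at roughly 200 km per millisecond (about two thirds of its speed in vacuum); the path length, request count, and window size are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def chatty_exchange_ms(round_trips, rtt_ms):
    """Waiting time when requests are issued strictly one after another."""
    return round_trips * rtt_ms


def window_limited_mbps(window_bytes, rtt_ms):
    """Throughput ceiling of a connection sending one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


# Hypothetical: 6,000 km fiber path, 100 sequential requests, 64 KiB window.
rtt = min_rtt_ms(6000)
print(f"minimum RTT: {rtt:.0f} ms")
print(f"100 sequential requests: {chatty_exchange_ms(100, rtt) / 1000:.1f} s")
print(f"single-connection ceiling: {window_limited_mbps(64 * 1024, rtt):.1f} Mbps")
```

Real paths are longer than the straight-line distance and add queuing and processing delay, so actual numbers are worse. No amount of extra bandwidth changes the first two results; only fewer round trips or shorter distances do.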

Related webinars

Building a data center fabric? Start with the Clos Fabrics Explained webinar.
Selecting the data center switches? Check out the Data Center Fabrics webinar and its update sessions.
Building a cloud infrastructure? You’ll find numerous guidelines in the Designing Private Cloud Infrastructure webinar.
Get all three webinars as part of the Cloud Computing and Networking track or with the webinar subscription.

10 comments:

JC 10 June 2014 15:07

Speaking of throwing bandwidth at a problem, I've just found out (although this is old news) that Cisco's Nexus FEX can locally do only CoS-based QoS, so servers that are connected to an access port (i.e. no 802.1p/q) on a FEX cannot have their packets classified and queued appropriately (they default to best effort) on the FEX uplink to the parent Nexus.

To which Cisco's TAC recommendation was basically "make sure you throw in enough uplinks to avoid congestion"...

What's your take on that? Is that a reasonable case of throwing bandwidth at the problem, or is this just one more hidden crippling caveat for FEXs?

Ivan Pepelnjak 11 June 2014 20:27

How about this: http://bit.ly/1pkFf0n

Will write a blog post, just give me a few days.

Michael 12 June 2014 11:02

I don't know why anyone would want a FEX when they can get a full-fledged layer 3 switch at a similar price point.
Considering the TCO of a FEX versus a real switch, it should be even more of a no-brainer.

jsicuran 12 June 2014 23:08

The FEX can use the untagged-cos feature for basic WRR CoS.

Calin C. 01 July 2014 14:46

@Michael - 1 parent + 3 FEX = 1 mgmt point, 1 parent + 3 switches = 4 mgmt points. That's just one thing that comes to mind reading your comment.
It's just simpler.

Anonymous 10 June 2014 15:51

CoS-based QoS on a primarily layer 2 switch is perfectly fine, and putting enough uplinks to handle congestion is just common sense.

Orhan Ergun 10 June 2014 19:00

I also get this question frequently for high-speed environments such as LANs or data centers. IMHO we may still need QoS despite its drawbacks (operational cost, configuration complexity, and increased MTTR): if you want to manage microbursts, you may need to implement QoS at some level. It's a long topic, but if you have a load that can create microbursts, the design might need to be rethought.

JC 10 June 2014 19:34

And if you've got VoIP traversing your infrastructure, microbursts are your silent killer...

Anonymous 24 June 2014 15:27

Having enough uplinks is key for DC leaf switches, but ECN can work miracles with a fine-tuned TCP stack.

Calin C. 01 July 2014 14:53

If only there were something in the world like _ask_, _involve_, _discuss_ before you start putting effort into application development, instead of finding out later that the application does not perform the way it did in "lab testing" once it runs across distributed locations connected with 100 Mbps and 200 msec RTT :)

Another problem I see here is the initial requirement. How many times do you hear "I want some app just to do this simple stuff", and then the client takes that "simple app" and wants to scale it to perform worldwide?
