Can We Just Throw More Bandwidth at a Problem?
One of my readers sent me an interesting question:
I have been reading in many places about "throwing more bandwidth at the problem." How far is this statement valid? Should the applications (servers) work with the assumption that there is infinite bandwidth provided at the fabric level?
Moore’s law works in our favor. It’s already cheaper (in some environments) to add bandwidth than to deploy QoS.
Data center networks are a prime example. With prices of 10GE ports in the $200–$300 range it makes no sense to deploy QoS and incur perpetual technical debt. A single data center fabric should provide equidistant endpoints – you should be able to get the same amount of bandwidth between any two endpoints (whether you get line-rate bandwidth or a fraction of it due to leaf-to-spine oversubscription is a different discussion that depends on the needs of your workload).
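As a back-of-the-envelope illustration of the leaf-to-spine oversubscription mentioned above (the port counts are hypothetical, not taken from any particular switch), the ratio is simply server-facing capacity divided by uplink capacity:

```python
# Hypothetical leaf switch: 48 x 10GE server-facing ports, 4 x 40GE uplinks
downlink_gbps = 48 * 10   # 480 Gbps toward the servers
uplink_gbps = 4 * 40      # 160 Gbps toward the spine

oversubscription = downlink_gbps / uplink_gbps
print(f"Oversubscription ratio: {oversubscription}:1")  # 3.0:1
```

Whether a 3:1 ratio is acceptable depends entirely on how much east-west traffic your workload actually generates.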
If you want to have some redundancy in your data center, you might want to build more than one availability zone (just make sure you don’t bridge across multiple zones, turning them into a single failure domain). Should you provide the same bandwidth between endpoints in the same availability zone and across availability zones? It depends on the workload needs and your budget, but it’s still a feasible idea.
WAN bandwidth is orders of magnitude more expensive than LAN bandwidth, so it’s impossible to just throw more bandwidth at a problem. QoS might temporarily alleviate some bottlenecks, but it’s a zero-sum game, and large-scale QoS never worked well.
Flow-based forwarding and other ideas promoted by OpenFlow evangelists are just QoS in disguise.
If you want to solve the WAN bandwidth challenges, you have to start with the application architecture – once the application architects figure out the laws of physics (including the upper bound on information propagation speed) and fallacies of distributed computing, and start architecting the applications with real-life bandwidth and latency in mind, everyone’s life becomes much simpler.
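The upper bound on information propagation speed mentioned above is easy to quantify: light in fiber travels at roughly two-thirds of c, giving a hard lower bound on round-trip time that no amount of bandwidth can fix. A quick sketch (the distances are rough great-circle approximations; real fiber paths are longer):

```python
# Light in fiber travels at roughly 2/3 of c, i.e. ~200,000 km/s,
# which works out to about 200 km per millisecond one way.
SPEED_IN_FIBER_KM_PER_MS = 200

def min_rtt_ms(distance_km: float) -> float:
    """Minimum round-trip time over a fiber path of the given one-way length."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

# Approximate great-circle distances
print(f"New York - London:  {min_rtt_ms(5600):.0f} ms RTT minimum")  # 56 ms
print(f"US coast-to-coast:  {min_rtt_ms(4000):.0f} ms RTT minimum")  # 40 ms
```

An application that needs a dozen sequential round trips per transaction across the Atlantic is thus slow by design, regardless of how much WAN bandwidth you buy.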
Related webinars
- Building a data center fabric? Start with Clos Fabrics Explained webinar.
- Selecting the data center switches? Check out the Data Center Fabrics webinar and its update sessions.
- Building a cloud infrastructure? You’ll find numerous guidelines in the Designing Private Cloud Infrastructure webinar.
- Get all three webinars as part of the Cloud Computing and Networking track or with the webinar subscription.
To which Cisco's TAC recommendation was basically "make sure you throw in enough uplinks to avoid congestion"...
What's your take on that? Is that a reasonable case of throwing bandwidth at the problem, or is this just one more hidden crippling caveat of FEXs?
Will write a blog post, just give me a few days.
Considering the TCO of a FEX versus a real switch, it should be even more of a no-brainer.
It's just simpler.
Another problem I see here is the initial requirement. How many times do you hear "I want an app that just does this simple stuff," and then the client takes that "simple app" and wants to scale it to perform worldwide?