How Much Data Center Bandwidth Do You Really Need?
Networking vendors are quick to point out how the opaqueness of overlay networks (read: we don’t have the hardware to look into them) presents visibility problems, and how their favorite shiny gizmo (whatever it is) gives you better results (they usually forget to mention the lock-in it creates).
Now let’s step back and ask a fundamental question: how much bandwidth do we need?
Disclaimer: If you’re running a large public cloud or anything similarly sized, this is not the post you’re looking for.
Let’s start with a few assumptions:
- We have a mid-sized workload of 10,000 VMs (that’s probably more than most private clouds see, but let’s err on the high side);
- The average long-term sustained network traffic generated by each VM is around 100 Mbps (I would love to see a single VM that’s not doing video streaming or network services generating that much, but that’s another story).
The average bandwidth you need in your data center is thus 1 Tbps. Every pizza box ToR switch you can buy today has at least 1.28 Tbps of non-blocking bandwidth. Even discounting for marketing math, you don’t need more than two ToR switches to satisfy your bandwidth needs (remember: if you have only two ToR switches you have 1.28 Tbps of full-duplex non-blocking bandwidth).
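The arithmetic above can be sketched in a few lines (all input values are the post’s assumptions, not measurements):

```python
# Back-of-the-napkin check: aggregate VM bandwidth vs. one ToR switch.
vms = 10_000                 # mid-sized workload (assumed)
mbps_per_vm = 100            # sustained average per VM (assumed)

total_tbps = vms * mbps_per_vm / 1_000_000
print(total_tbps)            # 1.0 Tbps of aggregate traffic

tor_tbps = 1.28              # non-blocking bandwidth of one pizza-box ToR
print(total_tbps / tor_tbps) # < 1: a single ToR already covers the average load
```

Even before discounting for marketing math, the aggregate sustained traffic fits within a single switch’s non-blocking capacity.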
If that’s not enough (or you think you should take into account traffic peaks), take a pair of Nexus 6000s or build a leaf-and-spine fabric.
In many cases VMs have to touch storage to deliver data to their clients, and that’s where the real bottleneck is. Even assuming only 10% of the VM-generated data comes from spinning rust (or SSDs), I’d love to see a storage system delivering a sustained average throughput of 100 Gbps.
How about another back-of-the-napkin calculation:
- A data center has two 10Gbps WAN links;
- 90% of the traffic stays within the data center (yet again on the high side – supposedly 70-80% is a more realistic number).
Based on these figures, the total bandwidth needed in the data center is 200 Gbps. Adjust the calculation for your specific case, but I don’t think many of you will get above 1-2 Tbps.
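The same calculation, working backward from the WAN links (again using the assumed figures above):

```python
# If the WAN links carry the 10% of traffic that leaves the data center,
# the total internal traffic is the WAN bandwidth divided by that fraction.
wan_gbps = 2 * 10            # two 10 Gbps WAN links
internal_fraction = 0.9      # 90% of traffic stays within the DC (assumed)

total_gbps = wan_gbps / (1 - internal_fraction)
print(round(total_gbps))     # 200 Gbps
```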
Obviously you might have bandwidth/QoS problems if:
- You use legacy equipment full of oversubscribed GE linecards;
- You still run a three-tier DC architecture with heavy oversubscription between tiers;
- You built a leaf-and-spine fabric with 10:1 oversubscription (yeah, I’ve seen that);
- You have no idea how much traffic your VMs generate and thus totally miscalculate the oversubscription factor;
... but that has nothing to do with overlay virtual networks – if any of the above is true, you have a problem regardless of what you run in your data center.
Just in case you need more information
Check out these webinars:
- Data Center 3.0 if you’re new to data center networking;
- Clos Fabrics Explained if you’re building a new data center networking fabric;
- Data Center Fabrics if you can’t decide which vendor to choose.
All webinars are available as part of the yearly subscription and you can always ask me for a second opinion or a design review.
That said, I'm not sure the analysis really accounts for microbursts. Consider two users accessing a 10 Gbps file share at 1 Gbps client speeds. While each user may quickly transfer their file and the average utilization of the file server is 100 Mbps over a one-minute interval, the users' experience would be wildly different if the server were connected at only 100 Mbps (or even 1 Gbps) rather than 10 Gbps, as the users would contend for bandwidth and each transfer would take longer.
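The commenter's point can be illustrated with a rough sketch (the 750 MB file size is a hypothetical figure chosen so that one transfer per minute averages out to 100 Mbps):

```python
# User-visible transfer time for a 750 MB file (~6 Gbit) at different
# server link speeds, with two clients downloading concurrently.
file_bits = 750 * 8 * 10**6          # 750 MB in bits
client_gbps = 1.0                    # each client connects at 1 Gbps

for server_gbps in (0.1, 1.0, 10.0):
    # two concurrent clients share the server link fairly
    per_user_gbps = min(client_gbps, server_gbps / 2)
    seconds = file_bits / (per_user_gbps * 10**9)
    print(f"server at {server_gbps:>4} Gbps -> {seconds:.0f} s per transfer")
```

At 10 Gbps the clients are the bottleneck (6 s each); at 1 Gbps they contend for the server link (12 s); at 100 Mbps the same "100 Mbps average" workload takes two minutes, even though the long-term utilization numbers are identical.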
Come On Man! Write a real article and do some research before you jump on a number game bandwagon to "try" and sound smart. Those of us who do this for a living can see right through it!
I have implemented both IB and 10GbE. Both have plenty of bandwidth to move massive amounts of data, but the IB solution moves those continual little bits much more quickly. Latency is now king.
It is similar to death by a thousand cuts. Numerous verbose protocols and applications are leeching the processing power out of the infrastructure. The network must be extremely low latency to support these momentary communications.
I agree, the majority of use cases are more than covered by 10GbE and 4Gb FC when properly engineered. And as prices drop, the standards will continue to rise. Remember the transition to 1GbE? Many systems have yet to surpass 100 Mb Ethernet today, but transport and transactional times demand 1 Gbps+.
So, how much do I need? As much as I can afford...
I believe the greatest benefit from the higher bandwidth technologies is the lower latency.
Can I build a private 5G network and use it to connect multiple data centers?
Given enough money, of course. Does it make sense? Maybe not.