
How Much Data Center Bandwidth Do You Really Need?

Networking vendors are quick to point out how the opaqueness of overlay networks (read: we don’t have the hardware to look into them) creates visibility problems, and how their favorite shiny gizmo (whatever it is) gives you better results. They usually forget to mention the lock-in it creates.

Now let’s step back and ask a fundamental question: how much bandwidth do we need?

Disclaimer: If you’re running a large public cloud or anything similarly sized, this is not the post you’re looking for.

Let’s assume:

  • We have a mid-sized workload of 10,000 VMs (that’s probably more than most private clouds see, but let’s err on the high side);
  • The average long-term sustained network traffic generated by every VM is around 100 Mbps (I would love to see a single VM that’s not doing video streaming or network services generating that much, but that’s another story).

The average bandwidth you need in your data center is thus 1 Tbps. Every pizza-box ToR switch you can buy today has at least 1.28 Tbps of non-blocking bandwidth. Even discounting for marketing math, you don’t need more than two ToR switches to satisfy your bandwidth needs (remember: even with just two ToR switches you have 1.28 Tbps of full-duplex non-blocking bandwidth).
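The arithmetic is easy to sanity-check with a few lines of Python. The figures below are the post’s assumptions, not measurements:

```python
# Back-of-the-napkin aggregate bandwidth check (assumed figures from the text)
vms = 10_000                # mid-sized workload
avg_vm_mbps = 100           # sustained traffic per VM (a generous assumption)
tor_tbps = 1.28             # non-blocking capacity of a typical pizza-box ToR

total_tbps = vms * avg_vm_mbps / 1_000_000   # Mbps -> Tbps
print(f"Aggregate VM traffic: {total_tbps:.2f} Tbps")
print(f"Fraction of one ToR switch: {total_tbps / tor_tbps:.0%}")
```

The calculation scales linearly – even at 1 Gbps sustained per VM you’d need fewer than ten such switches.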

If that’s not enough (or you think you should take traffic peaks into account), get a pair of Nexus 6000s or build a leaf-and-spine fabric.

In many cases VMs have to touch storage to deliver data to their clients, and that’s where the real bottleneck is: even assuming only 10% of the VM-generated data comes from the spinning rust (or SSDs), I’d love to see a storage system delivering a sustained average throughput of 100 Gbps.

How about another back-of-the-napkin calculation:

  • A data center has two 10Gbps WAN links;
  • 90% of the traffic stays within the data center (yet again erring on the high side – supposedly 70-80% is a more realistic number).

Based on these figures, the total bandwidth needed in the data center is 200 Gbps. Adjust the calculation for your specific case, but I don’t think many of you will get above 1-2 Tbps.
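The same estimate can be written as a two-line calculation (the WAN capacity and traffic split are the assumptions listed above):

```python
# WAN-based estimate: if only 10% of all traffic ever leaves the data center,
# total traffic = north-south bandwidth / (1 - east-west share)
wan_gbps = 2 * 10           # two 10Gbps WAN links
east_west_share = 0.90      # fraction of traffic staying inside the DC

total_gbps = wan_gbps / (1 - east_west_share)
print(f"Estimated total DC bandwidth: {total_gbps:.0f} Gbps")
```

Drop the east-west share to a more realistic 80% and the estimate falls to 100 Gbps.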

Obviously you might have bandwidth/QoS problems if:

  • You use legacy equipment full of oversubscribed GE linecards;
  • You still run a three-tier DC architecture with heavy oversubscription between tiers;
  • You built a leaf-and-spine fabric with 10:1 oversubscription (yeah, I’ve seen that);
  • You have no idea how much traffic your VMs generate and thus totally miscalculate the oversubscription factor;

... but that has nothing to do with overlay virtual networks – if any of the above is true, you have a problem regardless of what you run in your data center.
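To make the oversubscription point concrete, here’s the usual arithmetic on a hypothetical leaf switch (port counts are illustrative, not a specific product):

```python
# Oversubscription factor = server-facing capacity / fabric-facing capacity
downlink_gbps = 48 * 10     # 48 x 10GE server-facing ports
uplink_gbps = 4 * 40        # 4 x 40GE fabric uplinks

ratio = downlink_gbps / uplink_gbps
print(f"Oversubscription: {ratio:.0f}:1")
```

A 3:1 ratio like this one is fine as long as servers average well below line rate; at 10:1 the uplinks become the bottleneck as soon as a few VMs get busy.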

Just in case you need more information

All webinars are available as part of the yearly subscription and you can always ask me for a second opinion or a design review.

10 comments:

  1. All the internets! A bit silly, I'm sorry.
    http://memegenerator.net/instance/41271256

    not really

  2. Sounds like a really good reason to take a serious look at FCoE, NAS, or iSCSI. The moral of the story is that there is a significant amount of unused bandwidth in a modern data center that could be used instead of buying a separate storage network.

    That said, I'm not sure the analysis really accounts for microbursts. Consider two users accessing a 10 Gbps file share at 1 Gbps client speeds. While each user may quickly transfer their file and the average utilization of the file server is 100 Mbps over a 1-minute interval, the users' experience would be wildly different if the server were connected at 100 Mbps (or even 1 Gbps) instead of 10 Gbps, as the users would contend for bandwidth and each transfer would take longer.
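    To put rough numbers on this point (the file size and link speeds below are illustrative, not from the thread):

```python
def transfer_seconds(server_gbps, clients=2, client_gbps=1, file_gbit=8):
    """Time for each of `clients` concurrent clients to pull a file
    when the server uplink is the shared bottleneck."""
    per_client_gbps = min(client_gbps, server_gbps / clients)
    return file_gbit / per_client_gbps

for gbps in (0.1, 1, 10):   # server connected at 100M, 1G, 10G
    print(f"server at {gbps:>4} Gbps: {transfer_seconds(gbps):.0f} s per client")
```

    Average utilization over a minute stays tiny in all three cases, but the per-transfer experience differs by 20x.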

    Replies
    1. "I'm not sure the analysis really accounts for microbursts" - it doesn't. I just wanted to demonstrate that bandwidth is not the problem some people would like it to be.

  3. I find this article irritating! This is a numbers game that disconnected CEO/CFO/CIO dummies use to try and sound smart. The real fact is that no formula can ever tell you how much bandwidth you really need! It's entirely based on every individual company's situation, and you need to factor in end-user access speeds, replication, DR connections, VPN connections, Application bandwidth uses, Etc... !!!

    Come On Man! Write a real article and do some research before you jump on a number game bandwagon to "try" and sound smart. Those of us who do this for a living can see right through it!

    -Ciscoman

    Replies
    1. See my previous comment ;) I totally agree with your sentiment, but sometimes rule-of-thumb (aka Fermi Problem) helps.

  4. The answer is always more. Not so much from a fat pipe scenario, but rather a latency and parallel processing perspective.

    I have implemented both IB and 10GbE. Both have plenty of bandwidth to move massive amounts of data. But, the IB solution moves those continual little bits much more quickly. Latency is now king.

    It is similar to death by a thousand cuts. Numerous verbose protocols and applications are leeching the processing power out of the infrastructure. The network must be extremely low-latency to support these momentary communications.

    I agree, the majority of use cases are more than covered by 10GbE and 4Gb FC when properly engineered. And as prices drop, the standards will continue to rise. Remember the transition to 1GbE? Many systems don't surpass 100 Mbps today, yet transport and transactional times demand 1 Gbps+.

    So, how much do I need? As much as I can afford...

    Replies
    1. Now, I don't think I ever said "use 1GE". On the contrary, I'm a big believer in 10GE ... I see another blog post coming :D

    2. Sorry to be unclear. I was not trying to suggest you have recommended 1GE, rather the bandwidth numbers rarely rise above the bandwidth of 1GE.

      I believe the greatest benefit from the higher bandwidth technologies is the lower latency.

  5. Great post as always Ivan. I agree, and think much of the enterprise focus needs to be on increasing efficient density across the stack. The fact that FEXes can and do work in many enterprises is a bit troubling – it seems a clear indication there is much room for improvement in density, and in ensuring enterprises can use the full capacity of the investments they are making. If the big web 2.0 shops can actually use high-performance ToR capacity while enterprises can use far less, that is a problem; hopefully virtualization will eventually have the same impact here that it had on servers. And per the other comments, it is getting much more critical, IMHO, for enterprises to build monitoring networks; as OpenFlow makes them more economically feasible, hopefully we will see many more deployed so we can accurately tailor our designs.

