Whenever I get asked about QoS in the data center, my stock reply is “bandwidth is cheaper than QoS-induced complexity.” This is definitely true in most cases, and ideally the elephant problems should be solved higher up in the application stack, not with network-layer kludges, but are there situations where you actually need data center QoS?
A week or so ago I described why a properly implemented hypervisor-based overlay virtual networking data plane is not a scalability challenge; even though the performance might decrease slightly as the total number of forwarding entries grow, modern implementations easily saturate 10GE server uplinks.
Scalability of the central controller or orchestration system is a totally different can of worms. As I explained in the Scaling Overlay Networks, the only approach that avoids single failure domain and guarantees scalability is scale-out control plane architecture.
John Herbert wrote a wonderful post explaining why he (and a lot of other people including myself) hates seeing Gartner quotes in vendor presentations. Let me elaborate a bit on this apparent anti-Gartner sentiment.
Whenever I write about the crazy things vendors are trying to sell us, and the kludges we have to live with, I keep wondering, “Is it just me, or is the whole industry really as ridiculous as it seems?” It’s so nice to see someone else coming to the same conclusions, like Mark Burgess (the author of CFEngine and the Promise Theory) did in a lengthy essay on whether SDN makes sense.
BGP is without doubt the most scalable routing protocol, which made it a popular choice for large-scale deployments from service provider networks to enterprise WAN/VPN networks and even data centers. Its only significant drawback is the tedious configuration process (which almost reminds me of writing COBOL programs decades ago).
After almost exactly three years of struggles our BGP Operations and Security draft became RFC 7454 – a cluebat (as Gert Doering put it) you can use on your customers and peers to help them fix their BGP setup.
Without Jerome Durand this document would probably remain forever stuck in the draft phase. It’s amazing how many hurdles one has to jump over to get something published within IETF. Thanks a million Jerome, you did a fantastic job!
Years ago I managed to saturate a 10GE uplink on a vSphere server I tested with a single Linux VM using less than one vCPU. On the other hand, squeezing 1 Gbps out of Open vSwitch using GRE encapsulation was called ludicrous speed not so long ago. Implementing overlay virtual networking in the hypervisor obviously carries a huge performance penalty, right? Not so fast…
In the Myths That Refuse to Die: Scalability of Overlay Virtual Networking blog post I wrote “number of MAC addresses has absolutely no impact on the forwarding performance until the MAC hash table overflows”, which happens to be almost true.
Want to know even more about Tail-F NCS after listening to Episode 22 of Software Gone Wild? Boštjan Šuštar and Marko Tišler from NIL Data Communications continue their deep dive into the secrets of NCS in Software Gone Wild Episode 23.
If you watched the Network Field Day videos, you might have noticed an interesting (somewhat one-sided) argument I had with Sunay Tripathi, CTO and co-founder of Pluribus Networks (start watching at around 32:00 to get the context). Let’s try to get the record straight.
Brad Hedlund sent me a link to a fantastic list of NSX resources, from design and troubleshooting guides to videos and blog posts. A must-bookmark page if you're even remotely interested in VMware NSX.
I’m sitting in the San Francisco airport with nothing better to do than writing blog posts, so let’s see what we’ve seen and learned during the Networking Field Day 9.
Most videos recorded during the week are already online. You’ll find links to them in the Presentation Calendar section.
Tail-F NCS implements one of the most realistic approaches to service abstraction (the cornerstone of SDN – at least in my humble opinion) – an orchestration system that automates service provisioning on existing infrastructure.
Is the product really as good as everyone claims? How hard is it to use? How steep is the learning curve? Boštjan Šuštar and Marko Tišler from NIL Data Communications have months of hands-on experience and were willing to share it in Episode 22 of Software Gone Wild.
One of my readers got an interesting idea: he’s trying to make the most of his WAN links by doing per-packet load balancing between a 30 Mbps and a 50 Mbps link. Not exactly surprisingly, the results are not what he expected.
Sometimes the stars do align: Open Networking Summit organized their Service Provider Accelerate Workshop just a day prior to Network Field Day, so I had the fantastic opportunity to attend both.
I didn’t know what to expect from an event full of SDN/NFV thought leaders, and was extremely pleasantly surprised by the amount of realistic down-to-earth information I got.
Industry press, networking blogs, vendor marketing whitepapers and analyst reports are full of grandiose claims of benefits of whitebox switching and hardware disaggregation. Do you ever wonder whether these people actually practice their theories?
“Thou Shalt Have No Chokepoints” is one of those simple scalability rules that are pretty hard to implement in real-life products. In the Distributed Data Plane part of Scaling Overlay Networks webinar I listed data plane components that can be easily distributed (layer-2 and layer-3 switching), some that are harder to implement but still doable (firewalling) and a few that are close to mission-impossible (NAT and load balancing).
I’ll be speaking at two conferences in March: SDN event in Zurich organized by fantastic Gabi Gerber, and the best boutique security conference – Troopers 15 in Heidelberg. If you’ll be attending one of these events, just grab me, drag me to the nearest coffee table, and throw some interesting questions my way ;) … and if you happen to be near one of these locations, let me know and we might figure out how to meet somewhere.
Late last year David Gee and I wanted to test another interesting gizmo: an online virtual whiteboard. David was pondering some interesting aspect of Cisco ACI and they seemed like a perfect topic for an impromptu discussion.
During a great conversation I had with Terry Slattery during Interop New York, he said “well, I don’t think anyone should be configuring VLANs and asking ÔÇśHow to configure a VLAN on a switch’ – we should be focused on providing end-to-end connectivity”, and there’s absolutely nothing in that statement that one could disagree with.
Short answer: don’t even think about doing that. The added complexity is not worth whatever extra money you’ll be charging the customer (or not).
I expect to hear a lot about the “wonderful” idea of moving running VMs 100 msec away (across the continent) in the upcoming weeks. I would recommend you read a few of my older blog posts before considering it… and don’t waste time trying to persuade the true believers with technical arguments – talk with whoever will foot the bill or walk away.
I’m still convinced that architectures with centralized control planes (and that includes solutions relying on OpenFlow controllers) cannot scale. On the other hand, Big Switch Networks is shipping Big Cloud Fabric, and they claim they solved the problem. Obviously I wanted to figure out what’s going on and Andy Shaw and Rob Sherwood were kind enough to explain the interesting details of their solution.
Long story short: Big Switch Networks significantly extended OpenFlow.
A few days ago I completed the last chapter in the Data Center Design Case Studies book: building disaster recovery and active-active data centers. It focuses on application behavior and business needs, not on the underlying technologies; the networking technology part tends to be way easier to solve than the oft-ignored application-level challenges.