When someone tells you that “TCP is a lossy protocol” during a job interview, don’t throw him out immediately – he was just trusting the Internet a bit too much (click to enlarge).
Everyone has a bad hair day, and it really doesn’t matter who published that text… but if you’re publishing technical information, at least try to do no harm.
A. Friend sent me a long list of questions after listening to excellent Future of Networking podcast with Martin Casado because (as he said) he prefers “having a technical discussion with arguments and not just throwing statements out there.”
He started with “Martin's view seems to be that network is all plumbing and all the intelligence should be in the applications.”
When I asked “Are there any truly QoS-aware routing protocols out there?” in one of my SD-WAN posts, Marcelo Spohn from ADARA Networks quickly pointed out that they have one – Dynamic Link-State Routing Protocol.
He also claimed that DLSP has no scalability concerns – more than enough reasons to schedule an online chat, resulting in Episode 40 of Software Gone Wild. We didn’t go too deep this time, but you should get a nice overview of what DLSP is and how it works.
Ethan Banks recently wrote a nice blog post detailing the benefits and drawbacks of traditional routing protocols and comparing them with their SD-WAN counterparts.
While I agree with everything he wrote, the comparison between the two isn’t exactly fair – it’s a bit like trying to cut the cheese with a chainsaw and complaining about the resulting waste.
Whenever I get asked about QoS in the data center, my stock reply is “bandwidth is cheaper than QoS-induced complexity.” This is definitely true in most cases, and ideally the elephant problems should be solved higher up in the application stack, not with network-layer kludges, but are there situations where you actually need data center QoS?
Short answer: don’t even think about doing that. The added complexity is not worth whatever extra money you’ll be charging the customer (or not).
If you listen to the marketing departments of overlay virtual networking vendors, it looks like the world is a simple place: you deploy their solution on top of any IP fabric, and it all works.
You’ll hear a totally different story from the physical hardware vendors: they’ll happily serve you a healthy portion of FUD, hoping you swallow it whole, and describe in gory details all the mishaps you might encounter on your virtualization quest.
The funny thing is they’re all right (not to mention the really fun part when FUDders change sides ;).
My “Was it bufferbloat?” blog post generated an unexpected amount of responses, most of them focusing on a side note saying “it looks like there really are service providers out there that are clueless enough to reorder packets within a TCP session”. Let’s walk through them.
One of my readers opened another can of VMware vSwitch worms. He sent me this question:
If a VM were to set a COS value, would the vSwitch reset it to 0 as part of its process of building the dot1q header?
The nasty detail (as you probably know) is that 802.1p CoS value resides in the 802.1q (VLAN) tag.
The Mice and Elephants is a traditional QoS fable – latency-sensitive real time traffic (or request-response protocol like HTTP) stuck in the same queue behind megabytes of file transfer (or backup or iSCSI) traffic.
The solution is also well known – color the elephants pink (aka DSCP marking) and sort them into a different queue – until the reality intervenes.
One of my readers sent me an interesting question:
I have been reading at many places about "throwing more bandwidth at the problem." How far is this statement valid? Should the applications(servers) work with the assumption that there is infinite bandwidth provided at the fabric level?
Moore’s law works in our favor. It’s already cheaper (in some environments) to add bandwidth than to deploy QoS.
One of my readers wanted to deploy FCoE on UCS in combination with Nexus 1000v and wondered how the FCoE traffic impacts QoS on Nexus 1000v. He wrote:
Let's say I want 4Gb for FCoE. Should I add bandwidth shares up to 60% in the nexus 1000v CBWFQ config so that 40% are in the default-class as 1kv is not aware of FCoE traffic? Or add up to 100% with the assumption that the 1kv knows there is only 6Gb left for network? Also, will the Nexus 1000v be able to detect contention on the uplink even if it doesn't see the FCoE traffic?
As always, things aren’t as simple as they look.
A long while ago there was an interesting discussion started by Brad Hedlund (then at Dell Force10) comparing leaf-and-spine (Clos) fabrics built from fixed-configuration pizza box switches with high-end chassis switches. The comments made by other readers were all over the place (addressing pricing, wiring, power consumption) but surprisingly nobody addressed the queuing issues.
This blog post focuses on queuing mechanisms available within a switch; the next one will address end-to-end queuing issues in leaf-and-spine fabrics.