A friend sent me a long list of questions after listening to the excellent Future of Networking podcast with Martin Casado because (as he said) he prefers “having a technical discussion with arguments and not just throwing statements out there.”
Which I totally agree with. Complexity belongs to the network edge (ideally in the end-hosts), and every time we managed to move it from the network to the end-hosts, we got cheaper and faster networks (examples: moving from X.28 to Telnet, and from FC to NFS/iSCSI).
They state that multicast and QoS are going away.
I also somewhat agree with that. Not unconditionally, but yes (and their discussion was more nuanced than that anyway, so go listen to it). Let’s go into the details.
While I can agree with some statements, multicast is there for a reason.
There are a few well-known use cases for multicast (live 1-to-N video streaming, stock exchange feeds), and a few others where the developers wanted to push the problem of service discovery to someone else (yet again proving RFC 1925 section 2.6). Apart from that, multicast has been a zombie for decades.
They mention the application keeping track of who requested the stream instead. OK, so we moved the state to the application. That’s fine and dandy when the app is hosted on one device. Now our service is popular and needs to be hosted on multiple devices: who keeps the state? How do we sync the state? Hasn’t the amount of state increased compared to having it in the network? And that’s not even considering the bandwidth we’re wasting.
First, state is cheap when it’s implemented in low-speed software. Low-speed RAM is cheap; high-speed packet forwarding engines using TCAM (because walking a data structure would be too slow) are expensive. Moving state to low-speed components makes perfect sense.
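To make the “state moved to the application” argument concrete, here’s a minimal sketch of what that state looks like: a subscriber set kept in cheap RAM, with the application replicating the stream to each subscriber over unicast. All names (`StreamFanout`, `send`) are hypothetical, invented for the illustration.

```python
# Sketch: application-level "multicast" -- the app tracks who asked for
# the stream (state in cheap low-speed RAM) and replicates data itself,
# instead of the network keeping per-group replication state.

class StreamFanout:
    """Tracks subscribers and fans a payload out to each of them."""

    def __init__(self):
        self.subscribers = set()    # the entire "multicast" state

    def subscribe(self, endpoint):
        self.subscribers.add(endpoint)

    def unsubscribe(self, endpoint):
        self.subscribers.discard(endpoint)

    def publish(self, payload, send):
        # One unicast copy per subscriber: trades extra bandwidth for
        # removing replication state from the network.
        for endpoint in self.subscribers:
            send(endpoint, payload)
```

Of course, this is exactly the state that becomes awkward once the service is scaled across multiple instances, which is the friend’s point; the counterpoint is that the state itself stays trivially cheap.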
Also, how many times have you seen endpoints requiring the exact same content, which could then be effectively replicated on-the-fly within the network (apart from the two use cases I mentioned above)?
On a somewhat tangential topic, creating unnecessary state and/or syncing state across multiple devices is best avoided if you want to scale your solution. You can start with something as simple as not keeping session state in local files on your web server (so you don’t have to use session stickiness on your load balancer), or go as far as Facebook did.
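The “don’t keep session state in local files” advice boils down to this pattern: any server instance can serve any request because session state lives in a shared store keyed by session ID. The sketch below uses a plain dict as a stand-in for an external store (Redis, a database, or a signed cookie); all names are made up for the illustration.

```python
# Sketch: sessions keyed by ID in a shared store, so the load balancer
# needs no session stickiness -- every web server instance can handle
# every request.

import uuid

shared_store = {}   # stand-in for an external session store


def create_session(user):
    """Create a session and return its ID (sent to the client)."""
    sid = str(uuid.uuid4())
    shared_store[sid] = {"user": user}
    return sid


def handle_request(sid):
    """Look the session up in the shared store -- no local files."""
    session = shared_store.get(sid)
    return session["user"] if session else None
```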
Regarding QoS, it’s just a tool to differentiate traffic based on business needs. I can agree that throwing more bandwidth at the problem is often the best solution. However, that is not always an option, and as long as we consider one type of traffic more important than another, we must have some form of QoS or queuing.
Generic widespread end-to-end QoS was dead even before it was born ;) One would hope we learned that lesson from ATM. Read what Geoff Huston wrote on the topic of Internet-wide QoS, and listen to the Packet Pushers podcast with Douglas Comer.
In data centers, it’s easier to throw more bandwidth at the problem and solve the potential remaining 5% in the application stack. MP-TCP is one of the solutions addressing the hashing problems of elephant flows, as is MPIO. Other tricks like FlowBender modify hashing fields (IPv6 flow label or TTL) until the TCP session hits a non-congested link (they already have a patch for the Linux kernel).
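The FlowBender idea can be illustrated with a toy model: ECMP selects an uplink by hashing header fields, so bumping the IPv6 flow label changes the hash and (with luck) moves the flow to a different, uncongested link. The hash function and link count below are invented for the sketch and have nothing to do with any real switch ASIC.

```python
# Toy model of ECMP re-hashing a la FlowBender: change the IPv6 flow
# label until the hash picks a different uplink.

import hashlib


def ecmp_link(src, dst, flow_label, num_links=4):
    """Deterministically map a flow to one of num_links uplinks."""
    key = f"{src}|{dst}|{flow_label}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


def reroute(src, dst, flow_label, num_links=4):
    """On congestion, bump the flow label until the flow moves."""
    current = ecmp_link(src, dst, flow_label, num_links)
    label = flow_label
    while ecmp_link(src, dst, label, num_links) == current:
        label += 1
    return label
```

The real mechanism lives in the TCP stack and reacts to congestion signals (ECN marks, retransmissions); the sketch only shows why changing a hashed field moves the flow.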
We’ll talk about these issues in the next Leaf-and-Spine Fabric Designs webinar sometime in autumn.
As for QoS on the WAN, the real question is “can you control the congestion point?”, and in most cases the answer is NO (because the congestion happens in the DSLAM or within the SP access/core network), which leaves the end-to-end congestion tracking offered by some SD-WAN vendors as the only alternative. Yes, it’s QoS (shaping plus subsequent queuing), but it’s done at the very edge of the congestion domain, not everywhere in the network.