VXLAN runs over UDP – does it matter?
Scott Lowe asked a very good question in his Technology Short Take #20:
VXLAN uses UDP for its encapsulation. What about dropped packets, lack of sequencing, etc., that is possible with UDP? What impact is that going to have on the “inner protocol” that’s wrapped inside the VXLAN UDP packets? Or is this not an issue in modern networks any longer?
Short answer: No problem.
Somewhat longer one: VXLAN emulates an Ethernet broadcast domain, which is not reliable anyway. Any layer-2 device (usually called a switch, although bridge would be more correct) can drop frames due to buffer overflows or other forwarding problems, and frames can get corrupted in transit (although drops in switches are far more common than corruption in modern networks).
UDP packet reordering is usually not a problem – packet/frame reordering is a well-known challenge and all forwarding devices take care not to reorder packets within a layer-4 (TCP or UDP) session. The only way to introduce packet reordering is to configure per-packet load balancing somewhere in the path (hint: don’t do that).
Brocade uses very clever tricks to retain proper order of packets while doing per-packet load balancing across intra-fabric links.
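To make the difference between per-flow and per-packet load balancing concrete, here is a minimal sketch. It assumes a device that picks an outgoing link by hashing the outer 5-tuple; the CRC32 hash, the number of links, the addresses and the port numbers are illustrative assumptions, not any vendor's actual algorithm.

```python
import zlib
from itertools import cycle

N_LINKS = 4  # hypothetical number of equal-cost links


def per_flow_link(five_tuple, n_links=N_LINKS):
    """Pick a link from a hash of the 5-tuple: same flow, same link."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_links


# Outer header of one VXLAN-encapsulated flow (illustrative values):
# (source IP, destination IP, IP protocol 17 = UDP, source port, destination port)
flow = ("192.0.2.1", "192.0.2.2", 17, 49152, 4789)

# Per-flow load balancing: eight packets of the flow all use one link,
# so they cannot overtake each other.
links_per_flow = [per_flow_link(flow) for _ in range(8)]

# Per-packet load balancing: round-robin sprays the same packets over all
# links, so differing queue depths on those links can reorder them.
round_robin = cycle(range(N_LINKS))
links_per_packet = [next(round_robin) for _ in range(8)]
```

In this sketch `links_per_flow` contains a single link number repeated eight times, while `links_per_packet` cycles through all four links.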
Using UDP to transport Ethernet frames thus doesn’t break the expected behavior. Things might get hairy if you extended VXLAN across truly unreliable links with high error rates, but even then VXLAN-over-UDP wouldn’t perform any worse than other L2 extensions (for example, VPLS or OTV) or any other tunneling technique. None of them uses a reliable transport mechanism.
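A quick way to see that VXLAN adds no reliability of its own is to build its header. The sketch below assumes the 8-byte layout from RFC 7348 (a flags byte, 24 reserved bits, a 24-bit VNI and 8 reserved bits); the function names are made up for illustration. The header has no sequence number and no acknowledgment field, so a lost or reordered packet is simply a lost or reordered Ethernet frame.

```python
import struct

VNI_VALID_FLAG = 0x08  # the "I" flag: the VNI field carries a valid value


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prepend the VXLAN header; the result becomes the UDP payload."""
    return vxlan_header(vni) + inner_frame


payload = encapsulate(b"\x00" * 64, vni=5000)  # dummy 64-byte inner frame
```

The outer UDP and IP headers would be added by the sending host's IP stack; they are omitted here.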
Getting academic: Running TCP over TCP (which is what you'd end up with if you ran VXLAN over TCP) is a really bad idea. This paper describes some of the nitty-gritty details, or you could just google for TCP-over-TCP.
Some history: The last protocol stacks that had reliable layer-2 transport were SNA and X.25. SDLC or LAPB (for WAN links) and LLC2 (for LAN connections) were truly reliable – LLC2 peers acknowledged every L2 packet ... but even LLC2 was running across Token Ring or Ethernet bridges that were never truly reliable. We used reliable SNA-over-TCP/IP WAN transport (RSRB and later DLSw+) simply because the higher error rates experienced on WAN links (transmission errors and packet drops) caused LLC2 performance problems when we used plain source-route bridging.
And finally, a storage digression: Some people think Fibre Channel (FC) offers reliable transport. It doesn’t ... it just tries to minimize packet loss by over-provisioning every device in the path because its primary application (SCSI) lacks fast retransmission/recovery mechanisms. We use FCIP (FC-over-TCP) on WAN links to reduce the packet drop rate, not to retain the end-to-end reliable transport.
Does it all matter?
Still not sure whether you should care about VXLAN? These blog posts might help you:
- Decouple virtual networking from the physical world
- Which virtual networking technology should I use?
- VXLAN, IP Multicast, OpenFlow and control planes
- Imagine the ruckus when the hypervisor vendors wake up
- Complexity belongs to the network edge
You’ll find more details in my webinars: Introduction to Virtual Networking and Cloud Computing Networking Under the Hood. You can buy their recordings individually or get them as part of the yearly subscription.
Also, FC does not minimize packet loss by overprovisioning the network. It just uses flow control; it's that simple.
As for "over provisioning", you do need large buffers if you want to have reasonable performance with high-speed flow-controlled links, don't you?
Especially if you say "overprovisioning" with the same meaning as Greg Ferro seems to like so much: FC switch vendors (and the whole storage industry) are greedy and have been stealing your money over the past 15 years just because they like to, to the point of calling users "idiots" for buying into such "bullshit". And since you linked to his article where he expresses just this position, I thought you might endorse that position.
FC switches have the memory resources they need to provide the reliability they need to provide for the criticality of the applications that run on them. They are not overprovisioned so that FC switch vendors can be rich.
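The buffering argument above can be put into numbers with a back-of-the-envelope sketch. It assumes roughly 5 µs of propagation delay per kilometer of fibre, a maximum-size FC frame of 2148 bytes, and one buffer credit per frame in flight. The function name and the 800 MB/s throughput figure are illustrative assumptions, and real sizing rules also account for smaller frames and processing delays.

```python
import math


def credits_to_fill_link(distance_km, throughput_bytes_per_s,
                         frame_bytes=2148, us_per_km=5.0):
    """Estimate how many frame buffers keep a flow-controlled link busy."""
    # A credit is tied up until the frame has arrived and the credit has
    # travelled back, so the round-trip time is what matters.
    round_trip_s = 2 * distance_km * us_per_km * 1e-6
    bytes_in_flight = round_trip_s * throughput_bytes_per_s
    return math.ceil(bytes_in_flight / frame_bytes)


# Assumed example: a 10 km link carrying about 800 MB/s
credits_10km = credits_to_fill_link(10, 800e6)    # 38
credits_100km = credits_to_fill_link(100, 800e6)
```

With fewer buffers than this estimate, the sender runs out of credits and idles while waiting for them to return, which is the reason fast or long flow-controlled links need plenty of buffer memory.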
Class 2 is a connectionless class of service but provides reliability by acknowledging frame delivery. It also supports end-to-end flow control by use of end-to-end credits (EE_credits) on top of the buffer-to-buffer flow control (BB_credits) on every hop. Although most (probably all) FC switch vendors have always implemented this class of service, very few end devices have ever used it. Some HBAs supported it but there were basically no storage devices that did. So if the two devices didn't both support Class 2, they fell back to Class 3, which ended up being the only class of service ever actually used in practice.
Some more background: http://intranet.tataelxsi.co.in/Training_Web/Articles/SSG_Articles/Fibre_Channe_Services.PDF
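The buffer-to-buffer flow control mentioned in the comments above can be illustrated with a toy model. This is a simplified sketch, not the actual FC state machine: the class and method names are made up, and returning a credit stands in for the receiver's R_RDY signal. A sender with no credits left has to wait, so the receive buffer never overflows, but nothing in the mechanism acknowledges delivery to the final destination.

```python
from collections import deque


class CreditedLink:
    """Toy model of one hop using buffer-to-buffer credits."""

    def __init__(self, credits):
        self.credits = credits      # buffers the receiver has advertised
        self.rx_buffer = deque()    # frames waiting in the receiver

    def try_send(self, frame):
        """Send only if a credit is available; otherwise the sender waits."""
        if self.credits == 0:
            return False            # not dropped, just held back
        self.credits -= 1
        self.rx_buffer.append(frame)
        return True

    def receiver_drain(self):
        """Receiver forwards one frame and hands its buffer back."""
        frame = self.rx_buffer.popleft()
        self.credits += 1           # stands in for an R_RDY
        return frame


link = CreditedLink(credits=2)
sent = [link.try_send(f) for f in ("a", "b", "c")]   # third frame must wait
forwarded = link.receiver_drain()                     # frees one buffer
retry = link.try_send("c")                            # now it goes through
```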