How Did We End Up with a 1500-byte MTU?
A subscriber sent me this intriguing question:
Is it not theoretically possible for Ethernet frames to be 64k long if ASIC vendors simply bothered or decided to design/make chipsets that supported it? How did we end up in the 1.5k neighborhood? In whose best interest did this happen?
Remember that Ethernet started as a shared-cable 10 Mbps technology. Transmitting a 64K frame on that technology would take approximately 50 msec (about as long as it takes a packet to get from the East Coast to the West Coast). Also, unlike Token Ring, Ethernet had no tight media access control, so a single host could transmit multiple frames back-to-back without anyone else getting airtime, resulting in unacceptable delays.
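If you want to check the serialization-delay math yourself, here’s a trivial Python sketch (my own back-of-the-envelope illustration, not part of the original argument):

```python
# Serialization delay: how long it takes just to put a frame on the wire.
def serialization_delay_ms(frame_bytes: int, link_bps: float) -> float:
    return frame_bytes * 8 / link_bps * 1000

print(serialization_delay_ms(65536, 10e6))  # 64K frame on 10 Mbps Ethernet: ~52.4 ms
print(serialization_delay_ms(1518, 10e6))   # maximum-size classic frame: ~1.2 ms
```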
Next, sooner or later you have to send the traffic across WAN links. Either you do fragmentation on the WAN edge routers (bad), or you limit the LAN frame size to something reasonable.
There’s also the bit error rate. With a given bit error rate, a 64K frame is roughly 40 times (64/1.5 to be precise) more likely to arrive corrupted than a 1.5K frame. I’m oversimplifying; one should really compute the probability of error-free reception as (1 - BER) raised to the number of bits in the frame (watch the awesome Reliability Theory webinar for more details), but with a low enough error rate I’m not far off. For even more details, see the excellent comment by Innokentiy below.
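If you want to play with the numbers, here’s a small Python sketch (my own illustration; the BER value is an assumption picked purely for the example) showing how the corruption probability scales with frame size:

```python
# Probability of a frame arriving corrupted, assuming every bit is flipped
# independently with probability BER (a simplistic model, good enough here).
def corruption_probability(frame_bytes: int, ber: float) -> float:
    bits = frame_bytes * 8
    return 1 - (1 - ber) ** bits

ber = 1e-10                      # assumed bit error rate, not a measured value
p_1500 = corruption_probability(1500, ber)
p_64k = corruption_probability(65536, ber)
print(p_1500, p_64k, p_64k / p_1500)  # the ratio is close to 65536/1500 ≈ 43.7
```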
Finally, there’s the buffering problem. Without the hardware capability to split incoming packets into smaller chunks (call them cells, fragments, segments, or whatever else you want), every buffer has to be as large as the largest frame. Not a good idea when you have 1 MB of memory in a mini-mainframe supporting 30 interactive users, like the VAX 780 I used in the early 1980s.
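To put that in perspective, here’s a quick Python sketch (my own illustration; real 1980s buffer management was obviously more involved) showing how many full-size receive buffers fit into 1 MB of memory:

```python
# Number of fixed-size receive buffers that fit into 1 MB when every buffer
# must be able to hold the largest possible frame.
memory = 1 * 1024 * 1024  # 1 MB, as on the VAX mentioned above

for mtu in (1518, 9216, 65536):
    print(f"MTU {mtu:>6}: {memory // mtu} buffers")
# MTU 1518 -> 690 buffers, MTU 9216 -> 113 buffers, MTU 65536 -> just 16 buffers
```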
After the original thick-cable Ethernet became a great hit, someone within IEEE got the wonderful idea that whatever they did afterwards had to be compatible with that initial implementation. That’s why we still have the 1500-byte MTU and half-duplex Gigabit Ethernet messing up P2P links ;))... and Innokentiy explained in his comment why we never got past the 9K MTU.
Interested in more details like these? Check out the How Networks Really Work webinar! Several videos are already available with a free subscription; you need at least a Standard ipSpace.net Subscription to watch the whole webinar.
Here’s the comment Innokentiy added to this blog post:

As per IEEE 802, clause 6.2, for wired physical media the probability that a MAC service data unit (MSDU) delivered at an MSAP contains an undetected error, due to operation of the MAC service provider, shall be less than 5×10^-14 per octet of MSDU length. The CRC32 function used in Ethernet to calculate the FCS field provides reliable error detection for frames up to 91,639 bits: the minimum Hamming distance between any two valid frames is 4, so all errors of 3 or fewer bits are detected. Given the bit error rate, the desired probability of detecting a damaged frame, and the possible frame lengths, the calculated optimum frame length comes out around that famous 1500 bytes.
For example, the worst-case probability of losing a 65536-octet frame must be less than 1.21×10^-4, or approximately 1 in 8,250. The worst-case probability that a similar frame carrying a 1500-octet MSDU is delivered with an undetected error must be less than 7.5×10^-11, or approximately 1 in 13,300,000,000.
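To sanity-check the second figure, here’s a one-line Python calculation (my own arithmetic, using only the per-octet bound quoted above):

```python
# Undetected-error bound: 5e-14 per octet of MSDU length, times a 1500-octet MSDU.
per_octet_bound = 5e-14
p_undetected = per_octet_bound * 1500
print(p_undetected, 1 / p_undetected)  # 7.5e-11, i.e. roughly 1 in 13.3 billion
```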
Interestingly, there is a huge loss of CRC32 error-detection efficiency with frames longer than 91,639 bits. This is why most vendors allowed jumbo frames only up to 9K, even before that size was standardized.
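A quick way to see why 9K is the practical ceiling (again my own sketch, reusing the 91,639-bit figure quoted above):

```python
# Which frame sizes stay within the range where CRC32 still guarantees
# detection of all errors of 3 or fewer bits (Hamming distance 4)?
CRC32_HD4_LIMIT_BITS = 91639

for frame_bytes in (1518, 9216, 65536):
    bits = frame_bytes * 8
    verdict = "within" if bits <= CRC32_HD4_LIMIT_BITS else "beyond"
    print(f"{frame_bytes:>6}-byte frame = {bits:>6} bits -> {verdict} the limit")
# 1518- and 9216-byte frames fit comfortably; a 64K frame is far beyond the limit
```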
Larger frames also rarely improve performance (assuming a decent networking stack and NIC), even in storage deployments.
So we ended up in "just good enough" territory and the industry moved on to chase the next unicorn.