Path MTU Discovery Doesn’t Work with IP Multicast

A friend of mine sent me an interesting problem:

I noticed recently that my IOS routers aren't sending ICMP (unreachable; frag needed) messages in response to too-big IPv4 multicast packets with DF-bit set. They're just dropping these packets silently, breaking PMTUD.

Unfortunately, that’s not a bug but a FAD (Functions-as-Designed).

A quick Google search found this document, which pointed me to section 7.2 of RFC 1112 (yeah, multicast is really THAT old):

An ICMP error message (Destination Unreachable, Time Exceeded, Parameter Problem, Source Quench, or Redirect) is never generated in response to a datagram destined to an IP host group.

The same document also describes why RFC 1112 prohibits sending ICMP error messages in response to multicast datagrams. The processing done on ICMP error replies by the *nix socket API might block the sender's socket if an error comes back from a single receiver or if TTL expires when traversing a particularly long branch of the multicast tree, which is not exactly a good idea in a multicast environment.
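To make the rule concrete, here is a minimal sketch of the decision a forwarding device faces when a packet doesn't fit the outgoing MTU. It is a simplified model, not actual IOS behavior; the function names `may_send_icmp_error` and `handle_oversized` and the returned strings are made up for illustration.

```python
import ipaddress


def may_send_icmp_error(dst: str) -> bool:
    """RFC 1112 section 7.2: no ICMP errors for datagrams sent to a host group."""
    return not ipaddress.IPv4Address(dst).is_multicast


def handle_oversized(dst: str, length: int, mtu: int, df: bool) -> str:
    """Hypothetical forwarding decision for one IPv4 packet (simplified)."""
    if length <= mtu:
        return "forward"
    if not df:
        return "fragment"
    # DF is set and the packet is too big: it has to be dropped.
    if may_send_icmp_error(dst):
        return "drop + ICMP frag needed"
    return "drop silently"
```

With these assumptions, an oversized DF packet to a unicast address such as 192.0.2.1 triggers the ICMP message, while the same packet to 239.1.1.1 just disappears, which is exactly what my friend observed.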

Lessons learned:

  • You should never get ICMP error messages in response to IP multicast packets;
  • Path MTU discovery doesn’t work with IP multicast;
  • Sending multicast packets with DF bit set is a bad idea unless you’re OK with some receivers never getting them;
  • ICMP echo reply to a multicast echo request is perfectly legal (because it’s not an ICMP error message).
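If you control the sending application and it runs on Linux, one way to avoid the DF problem is to turn off path MTU discovery on the socket, which makes the kernel send packets with the DF bit cleared. The sketch below assumes Linux; `open_multicast_sender` is a hypothetical helper name, and the numeric fallbacks are the values from `<linux/in.h>`, used only if your Python build doesn't export the constants.

```python
import socket
import sys

# Linux values from <linux/in.h>, used as fallbacks (assumption: Linux host).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)


def open_multicast_sender(ttl: int = 16) -> socket.socket:
    """Create a UDP socket for multicast sending with DF cleared on Linux."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    if sys.platform.startswith("linux"):
        # Never set DF: oversized packets may be fragmented along the path
        # instead of being dropped silently.
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT)
    return sock
```

You would then send as usual, for example `sock.sendto(payload, ("239.1.1.1", 5000))` (group and port are placeholders). The trade-off is that fragmentation may now happen somewhere along the path; keeping the payload below the smallest MTU you expect is still the safer choice.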

5 comments:

  1. Saw this the other day. Interesting topic.

    My multicast applications are mostly low rate and small packet size. Do you have any idea of average packet size for multicast applications such as IPTV? Or what kind of applications would this be a problem in?
  2. In our environment we prefer having DF set on IPTV multicast stream mainly for two reasons:

    - a badly configured intermediate device might fall back to process switching because of fragmentation and reassembly, and we'd like to avoid high CPU usage or extra latency on real-time flows;

    - any MTU configuration mistake is easy to spot (the TV flow is not received).
  3. A number of stock exchanges use a protocol called FAST (a FIX protocol extension) to deliver market data via multicast. It is used to support high-throughput, low-latency data communications between financial institutions. It's worth reading this short document: http://www.fixtradingcommunity.org/pg/file/fplpo/read/30528/multicast-recommended-practices

    The protocol implements simple mechanisms to handle duplicate data, out-of-order and lost segments.

  4. Linux sets DF on all outbound traffic (including multicast) by default unless the application specifies otherwise. I think this is a flaw, given that PMTUD is impossible for IPv4 multicast.

    Personally, I think that v4 got it wrong, and v6 got it right. PMTUD for multicast should be possible.

    The concerns about too many unreachables (including when elicited by packets from spoofed sources) don't really resonate with me compared to the ugliness of needlessly fragmenting traffic. At the very least, this behavior should be configurable so that it can be used where appropriate.
  5. I usually hear it referred to as "WAD", working as designed.