VM-level IP Multicast over VXLAN

Dumlu Timuralp (@dumlutimuralp) sent me an excellent question:

I always get confused when thinking about IP multicast traffic over VXLAN tunnels. Since VXLAN already uses a Multicast Group for layer-2 flooding, I guess all VTEPs would have to receive the multicast traffic from a VM, as it appears as L2 multicast. Am I missing something?

Short answer: no, you’re absolutely right. IP multicast over VXLAN is clearly suboptimal.

In the good old days when the hypervisor switches were truly dumb and used simple VLAN-based layer-2 switching, you could control the propagation of IP multicast traffic by deploying IGMP snooping on layer-2 switches (or, if you had Nexus 1000V, you could configure IGMP snooping directly on the hypervisor switch).
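To make the IGMP snooping behavior concrete, here is a minimal sketch of a toy layer-2 switch that learns group membership from IGMP joins and limits multicast forwarding to ports with receivers. The class name, port numbers, and group address are all made up for illustration, and the model ignores details a real switch has to handle (multicast router ports, query timers, membership aging).

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy layer-2 switch that tracks IGMP membership per port (hypothetical model)."""

    def __init__(self, ports):
        self.ports = set(ports)
        # group address -> set of ports with at least one interested receiver
        self.members = defaultdict(set)

    def igmp_join(self, port, group):
        # A snooped IGMP membership report adds the port to the group.
        self.members[group].add(port)

    def igmp_leave(self, port, group):
        # A snooped IGMP leave removes the port from the group.
        self.members[group].discard(port)

    def forward_multicast(self, ingress_port, group, snooping=True):
        """Return the sorted list of egress ports for a multicast frame."""
        if snooping:
            egress = self.members[group]   # only ports that joined the group
        else:
            egress = self.ports            # plain flooding within the VLAN
        return sorted(egress - {ingress_port})


switch = SnoopingSwitch(ports=[1, 2, 3, 4])
switch.igmp_join(port=3, group="239.1.1.1")

with_snooping = switch.forward_multicast(1, "239.1.1.1")
without_snooping = switch.forward_multicast(1, "239.1.1.1", snooping=False)
print(with_snooping)      # ports that receive the frame with snooping enabled
print(without_snooping)   # ports that receive the frame when it is flooded
```

With snooping enabled, only the port that joined the group receives the frame; without it, the frame is flooded to every other port in the VLAN.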

Those days are gone (finally), but the brave new world still lacks a few features. No ToR switches are currently capable of digging into the VXLAN payload to find IGMP queries and joins, and it’s questionable whether Nexus 1000V can do IGMP snooping over VXLAN (IGMP snooping on Nexus 1000V is configured on VLANs).

End result: IP multicast running across a VXLAN segment will be delivered to all VMs in the same segment. Both hypervisor switches and VMs will have to spend CPU cycles to process unwanted multicast packets.
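The over-delivery can be illustrated with a small model. The sketch below assumes what the question describes: every VTEP with a VM in the segment has joined the single underlay multicast group used for flooding, and nothing between the sender and the receiving VMs looks at IGMP. The host and VM names, the group address, and the function itself are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    joined_groups: set = field(default_factory=set)  # groups the VM asked for via IGMP


@dataclass
class Vtep:
    name: str
    vms: list  # VMs attached to the VXLAN segment on this hypervisor


def deliver_over_vxlan(vteps, sender, group):
    """Return (wanted, unwanted) VM names that receive a frame sent to `group`.

    Assumption: the frame is flooded to every VTEP in the segment and then to
    every local VM in that segment, regardless of IGMP membership.
    """
    wanted, unwanted = [], []
    for vtep in vteps:
        for vm in vtep.vms:
            if vm.name == sender:
                continue  # the sender does not receive its own frame
            if group in vm.joined_groups:
                wanted.append(vm.name)
            else:
                unwanted.append(vm.name)
    return wanted, unwanted


vteps = [
    Vtep("esx1", [VM("web1"), VM("app1", {"239.1.1.1"})]),
    Vtep("esx2", [VM("web2"), VM("db1")]),
    Vtep("esx3", [VM("app2", {"239.1.1.1"})]),
]

wanted, unwanted = deliver_over_vxlan(vteps, sender="web1", group="239.1.1.1")
print("wanted:", wanted)
print("unwanted:", unwanted)
```

In this made-up topology the second hypervisor has no interested receivers at all, yet it still has to decapsulate the packet and hand it to both of its VMs.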

3 comments:

  1. Hi Ivan,

    I don't know much about multicast, but it appears that defeats the purpose of multicast when used with VXLAN - that only certain endpoints would receive multicast traffic to which they subscribe in the first place.

    Cheers,

    Mike
    Replies
    1. Mike,
      The problem is that, as far as I understand it, the physical switches have no visibility into the underlying VXLAN encapsulation. It's a MAC-in-IP scheme. The "certain endpoints" are in their own VXLAN-encap'd VLAN, which is then encapsulated in plain old IP. The physical switches only ever see IP traffic and just forward/route it to whichever VM host the traffic is going to. Thus IGMP snooping cannot happen in the physical network; it needs to happen in the virtual switches, which have visibility into the VMs' virtualized ports. It's essentially MAC-in-IP-over-Ethernet/L2.

      This is just my basic understanding, from Ivan's blog and Packet Pushers (Show 66). I'm sure Ivan (and others) can provide some additional details on this.
  2. If VTEPs can scale, VXLAN can always borrow the MDT concept from MBGP, so that only the relevant VTEPs join the multicast tree, avoiding the over-delivery issue.