Our network topology will have two switches and five hosts, some connected to a single switch. That’s not a good idea in an MLAG environment, but even if you have a picture-perfect design with everything redundantly connected, you will have to deal with single-attached hosts after a single link failure.
Imagine host A sending a broadcast (or any other frame that needs to be flooded). It could use the A-S1 or the A-S2 link to do so. Let’s assume it uses the A-S1 link. The broadcast has to be received by every other host connected to the same broadcast domain. Getting it to X and B is simple – S1 sends it over a directly connected link.
What about C and Y? Only S2 can deliver the broadcast to them – S1 has to send all flooded frames to S2 over the peer link. So far, so good, but now we’re hitting a snag: a packet S2 receives over the peer link should not be flooded to hosts with active connections to both switches[2]. S2 should forward the broadcast to C and Y, but not to A or B. In other words: S2 must implement a split horizon between single-attached and dual-attached hosts.
Let’s assume our switches use a MAC forwarding table with an extra default entry that contains the list of all ports to which they should flood a BUM frame.[3] That works great for standalone switches, but we need two default entries for switches operating in an MLAG cluster:
- A default entry for packets received from connected network devices. S1 would forward such packets to X, A, B, and S2[4]; S2 would deliver them to A, B, C, Y, and S1.
- Another default entry for packets received over the peer link. S1 would forward them to X but not to A or B. Likewise, S2 would send them to C and Y, but not A or B.
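The two default entries are easy to model in a few lines of Python. The following is a minimal sketch, not a description of any real switch software: the `ATTACHMENTS` table, the `flood_entries`, `flood`, and `deliver` names, and the idea of naming a port after the device behind it are all assumptions made for illustration.

```python
# Hypothetical model of the two-switch, five-host topology from the text.
# Each host maps to the set of MLAG members it is attached to.
ATTACHMENTS = {
    "A": {"S1", "S2"},
    "B": {"S1", "S2"},
    "X": {"S1"},
    "C": {"S2"},
    "Y": {"S2"},
}
PEER = {"S1": "S2", "S2": "S1"}  # peer-link neighbor of each switch


def flood_entries(switch):
    """Build the two default (flood) entries for one MLAG member."""
    local_hosts = sorted(h for h, sw in ATTACHMENTS.items() if switch in sw)
    single_attached = sorted(h for h, sw in ATTACHMENTS.items() if sw == {switch})
    return {
        # Frames from directly connected devices: all local ports plus the peer link.
        "from_host": local_hosts + [PEER[switch]],
        # Frames from the peer link: only hosts the peer cannot reach itself.
        "from_peer": single_attached,
    }


def flood(switch, in_port):
    """Select the default entry by input port, then never send a frame back out of it."""
    entries = flood_entries(switch)
    key = "from_peer" if in_port == PEER[switch] else "from_host"
    return [port for port in entries[key] if port != in_port]


def deliver(src_host, first_switch):
    """Follow a flooded frame through the cluster and list the hosts receiving it."""
    copies = []
    for port in flood(first_switch, src_host):
        if port in PEER:  # the port is the peer link, let the peer flood as well
            copies += flood(port, first_switch)
        else:
            copies.append(port)
    return sorted(copies)
```

With this model, `deliver("A", "S1")` returns B, C, X, and Y exactly once each, which is the whole point of the second default entry: without it, A would get its own broadcast back from S2 and B would receive a duplicate.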
These requirements are not too different from what we need to implement stackable switches, so we can expect some forwarding ASICs to contain mechanisms to deal with them. The ASIC could select one or the other default entry based on the input port, or use metadata attached to the incoming frame[5] if the MLAG implementation is using additional (proprietary) encapsulation on the peer link[6].
The real fun starts when we try to replace the peer link with a fabric connection using MPLS or VXLAN encapsulation. The MPLS label stack is an obvious solution[7]: RFC 7432 contains a lengthy, convoluted section explaining how to advertise and use an additional MPLS label to implement split-horizon switching.
There’s no extra label in VXLAN, and the only solution (detailed in RFC 8365) is to use the source IP address of the VXLAN packets to figure out whether the packet is coming from the virtual peer link. That must be tremendous fun to program in forwarding hardware, judging from the catastrophic bugs that riddled early EVPN-only MLAG implementations.
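Reduced to its bare bones, the source-IP check looks something like the sketch below, written from the perspective of S2. Everything specific in it is an assumption: the VTEP addresses come from the documentation range, the function and constant names are made up, and the sketch deliberately ignores everything else an EVPN implementation has to do (designated forwarder election, for example).

```python
import ipaddress

# Hypothetical VTEP address of S1, the other member of the MLAG pair.
PEER_VTEP = ipaddress.ip_address("192.0.2.1")

# Flood lists on S2, using the host names from the example topology.
FLOOD_ALL_LOCAL = ["A", "B", "C", "Y"]
FLOOD_SINGLE_ATTACHED = ["C", "Y"]


def flood_list_for_vxlan(outer_src_ip):
    """Pick the S2 flood list based on the outer source IP of a VXLAN packet."""
    src = ipaddress.ip_address(outer_src_ip)
    if src == PEER_VTEP:
        # The packet arrived over the "virtual peer link": S1 has already
        # delivered it to the dual-attached hosts A and B.
        return FLOOD_SINGLE_ATTACHED
    # Assumption: packets from any other VTEP are flooded to all local ports.
    return FLOOD_ALL_LOCAL
```

The comparison itself is trivial in software. The hard part is that the forwarding hardware has to carry the outer source IP address (or something derived from it) all the way to the point where the flood list is selected.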
Back to those two default entries. The switch ASIC could support them, or you could use two MAC tables (one for regular ports, one for the peer link) with a lot of VLAN magic, but there’s a much more creative solution: use an ACL on the outgoing port. All you need is a high-priority entry on all LAG ports saying “if the packet came from the peer link, drop it.”
In our example, the switch software would configure an entry saying “drop a packet if it came over the S1-S2 link” on links to A and B, but not on links to X, Y, and C – all those hosts are connected to a single MLAG member and have to receive flooded packets arriving through the peer link.
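Here is a minimal sketch of the egress ACL idea, assuming a single default flood list per switch and a per-port set of "drop if it came from here" input ports. The `egress_acl` and `flood` functions, the `peer-link` port name, and the data structures are hypothetical; a real implementation would program the equivalent entries into the ASIC.

```python
# Hypothetical attachment table for the example topology.
ATTACHMENTS = {
    "A": {"S1", "S2"},
    "B": {"S1", "S2"},
    "X": {"S1"},
    "C": {"S2"},
    "Y": {"S2"},
}
PEER_LINK = "peer-link"


def egress_acl(switch):
    """Map each local host port to the input ports whose frames it must drop."""
    acl = {}
    for host, switches in ATTACHMENTS.items():
        if switch not in switches:
            continue
        # Ports toward dual-attached hosts (LAG ports) drop peer-link traffic;
        # ports toward single-attached hosts drop nothing.
        acl[host] = {PEER_LINK} if len(switches) > 1 else set()
    return acl


def flood(switch, in_port):
    """Flood using ONE default entry, then let the egress ACL filter the copies."""
    acl = egress_acl(switch)
    candidates = list(acl) + [PEER_LINK]  # the single default flood list
    return sorted(
        port
        for port in candidates
        if port != in_port                       # never send back to the input port
        and in_port not in acl.get(port, set())  # egress ACL drop rule
    )
```

In this model, `flood("S2", "peer-link")` returns only C and Y, while `flood("S2", "C")` still reaches A, B, Y, and the peer link. The forwarding table needs just one default entry, and the split horizon lives entirely in the ACL.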
- Added the outgoing ACL solution based on a lengthy chat with Dinesh Dutt. Dinesh, thanks a million!
[1] BUM: Broadcasts, Unknown Unicast, Multicasts.
[2] Things get interestingly complex if we have more than two switches in an MLAG cluster. I’ll leave the gory details as an exercise for the reader.
[3] In theory, a single default entry is good enough to implement BUM flooding unless you want to implement IGMP snooping.
[4] Following the usual IEEE 802.1 split-horizon rule: never forward a frame to the port from which it has been received.
[5] A flag saying, “this frame is coming from a peer link, please flood it to everyone that’s not connected to me.”
[6] According to Dinesh Dutt, most modern MLAG implementations use standard Ethernet encapsulation on the peer link.
[7] All problems in networking that are worth solving can be solved with one or more additional MPLS labels 😜