AMS-IX Outage: Layer-2 Strikes Again

Thursday, December 7, 2023 08:18 UTC

On November 22nd, 2023, AMS-IX, one of the largest Internet exchanges in Europe, experienced a significant traffic drop lasting more than four hours. While its peak traffic is around 10 Tbps, it dropped to about 2.1 Tbps during the outage.

AMS-IX published a very sanitized and diplomatic post-mortem incident summary in which they explained the outage was caused by LACP leakage. That phrase should be a red flag, but let’s dig deeper into the details.

Based on the incident report, it seems to me (and I would love to be corrected) that the Juniper switches used by AMS-IX forward LACP packets received on a non-LAG port to other bridged¹ ports on the same switch. Try as I might, I can’t figure out in which universe that would be anywhere close to a sane choice.

LACP (Link Aggregation Control Protocol) was designed to be used between adjacent devices. While I know we have to live with abominations like two devices pretending they’re a single system, I lack polite words to describe the idea of forwarding layer-2 control packets that are supposed to be used between adjacent devices onto other links. Unfortunately, I’m also aware of potential MacGyver-type use cases for that monstrosity: let’s buy two Carrier Ethernet links and pretend we can bundle them into an end-to-end Link Aggregation Group.
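LACP packets are easy to recognize, which is why a bridge has no excuse for treating them as regular traffic: they are sent to the Slow Protocols multicast address 01-80-C2-00-00-02 with EtherType 0x8809 and subtype 1. The following minimal Python sketch shows that check; `is_lacpdu` is a hypothetical helper, and it assumes an untagged frame that starts at the destination MAC address (no preamble, no FCS).

```python
# Slow Protocols destination address and EtherType (IEEE 802.3 / 802.1AX).
SLOW_PROTOCOLS_DMAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01  # subtype 0x02 is the Marker protocol, 0x03 is Ethernet OAM


def is_lacpdu(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame carries an LACPDU (sketch)."""
    if len(frame) < 15:  # need DMAC (6) + SMAC (6) + EtherType (2) + subtype (1)
        return False
    dmac = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    subtype = frame[14]
    return (dmac == SLOW_PROTOCOLS_DMAC
            and ethertype == SLOW_PROTOCOLS_ETHERTYPE
            and subtype == LACP_SUBTYPE)


# Example: DMAC, a made-up SMAC, EtherType 0x8809, subtype 1, version 1, padding.
frame = bytes.fromhex("0180c2000002" "001122334455" "8809" "01" "01") + bytes(50)
print(is_lacpdu(frame))
```

The destination address alone already marks the frame as link-local; the EtherType and subtype only distinguish LACP from the other Slow Protocols sharing that address.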

However, even if the vendor account teams, dazzled by a humongous purchase order, could be persuaded that a bridge needs such a dangerous nerd knob, one would hope that configuring it would be hard and would trigger all sorts of “if you do this, the universe might collapse into a black hole” warnings². One can only hope that flooding packets sent to well-known IEEE-defined MAC addresses is not the default behavior, but then again, boxes from the same company happily talk BGP with total strangers. Feedback from anyone familiar with the Junos layer-2 implementation would be most welcome.

¹ Let’s stop pretending: a layer-2 switch is really an IEEE bridge with too many nerd knobs that can evidently cause significant harm.
² If that’s the case on Juniper boxes, then AMS-IX got what they asked for.
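The default behavior the post argues for is spelled out in IEEE 802.1D: a bridge must not relay frames sent to the reserved group addresses 01-80-C2-00-00-00 through 01-80-C2-00-00-0F, a range that includes the LACP address 01-80-C2-00-00-02. A minimal Python sketch of that forwarding decision for a hypothetical bridge follows; `egress_ports` and the `forward_reserved` flag (standing in for the dangerous nerd knob) are illustrative names, not a Junos feature, and MAC learning is omitted, so every relayed frame is simply flooded.

```python
# IEEE 802.1D reserved group addresses 01-80-C2-00-00-00 .. 01-80-C2-00-00-0F
# (STP, Slow Protocols/LACP, 802.1X, LLDP, ...). A compliant bridge never relays them.
RESERVED_PREFIX = bytes.fromhex("0180c20000")


def is_reserved_group_address(dmac: bytes) -> bool:
    """True if dmac falls in the 802.1D reserved range."""
    return len(dmac) == 6 and dmac[:5] == RESERVED_PREFIX and dmac[5] <= 0x0F


def egress_ports(frame: bytes, ingress_port: int, ports: list[int],
                 forward_reserved: bool = False) -> list[int]:
    """Ports a hypothetical bridge would send this frame to.

    forward_reserved is a hypothetical 'dangerous nerd knob' that is off by default.
    """
    dmac = frame[0:6]
    if is_reserved_group_address(dmac) and not forward_reserved:
        # Link-local control frame: hand it to the local control plane
        # (or drop it), but never relay it to other ports.
        return []
    # No MAC learning in this sketch: flood to all ports except the ingress port.
    return [p for p in ports if p != ingress_port]


lacp = bytes.fromhex("0180c2000002001122334455880901")
print(egress_ports(lacp, 1, [1, 2, 3]))                         # []
print(egress_ports(lacp, 1, [1, 2, 3], forward_reserved=True))  # [2, 3]
```

With the knob off, an LACP frame arriving on a non-LAG port goes nowhere; with it on, the frame is flooded to every other port, which is the behavior the incident report seems to describe.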

switching

5 comments:

dreamer 07 December 2023 12:14

For anyone who's interested, there was an in-depth presentation at RIPE 87 on the topic: https://ripe87.ripe.net/archives/video/1263/

Erik Auerswald 07 December 2023 04:32

After watching the video (thanks dreamer!) and looking at the slides, I am not sure which of the three mentioned device families (Brocade¹ MLXe-16/32, Extreme SLX9850, and Juniper MX10k8) introduced LACP frames into the VPLS network.

MPLS packets should be forwarded based on the label, not on the octets after the label stack; i.e., MPLS-encapsulated LACP frames should be forwarded exactly like all other MPLS packets inside the MPLS network. I would not expect LACP frames to enter a VPLS network by default. Many vendors, e.g., Juniper, use Layer 2 Protocol Tunneling (L2PT) to transport such protocols, but only after specific configuration; Cisco called this Generic Bridge PDU Tunneling (GBPT).
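The label-only forwarding described above can be illustrated with a minimal Python sketch of a label swap on a transit LSR, using the RFC 3032 label stack entry layout (20-bit label, 3-bit TC, 1-bit bottom-of-stack, 8-bit TTL). The dictionary-based LFIB, the `lsr_swap` function, and the interface name are hypothetical; PHP, label pop at the PE, and VPLS pseudowire handling are deliberately left out.

```python
from typing import Dict, Optional, Tuple


# RFC 3032 label stack entry: Label (20 bits) | TC (3) | S (1) | TTL (8).
def parse_lse(lse: bytes) -> Tuple[int, int, int, int]:
    word = int.from_bytes(lse[:4], "big")
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF


def build_lse(label: int, tc: int, s: int, ttl: int) -> bytes:
    return ((label << 12) | (tc << 9) | (s << 8) | ttl).to_bytes(4, "big")


def lsr_swap(packet: bytes,
             lfib: Dict[int, Tuple[int, str]]) -> Optional[Tuple[str, bytes]]:
    """Label-swap forwarding on a hypothetical transit LSR.

    lfib maps an incoming label to (outgoing label, outgoing interface).
    Returns (interface, rewritten packet), or None if the packet is dropped.
    """
    label, tc, s, ttl = parse_lse(packet)
    entry = lfib.get(label)
    if entry is None or ttl <= 1:
        return None  # unknown label or TTL expiry
    out_label, out_if = entry
    # Only the top label stack entry changes; the rest of the packet
    # (inner labels, pseudowire payload, an encapsulated LACPDU or anything
    # else) is copied without being inspected.
    return out_if, build_lse(out_label, tc, s, ttl - 1) + packet[4:]


payload = bytes.fromhex("0180c2000002001122334455880901")  # an LACP-looking frame
print(lsr_swap(build_lse(100, 0, 1, 64) + payload, {100: (200, "if-b")}))
```

The forwarding decision never looks past the top four octets, so whether the payload is an LACPDU or an IP packet makes no difference inside the MPLS core.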

¹ Like the SLX routers, the Brocade MLX routers were part of the Brocade business that Broadcom did not want to keep and sold to Extreme Networks.

Replies

Erik H 07 December 2023 06:30

I'm just wondering whether it is sane to build such a vital piece of infrastructure with three completely different vendors. Why even consider it, given all the interop hassle?

Erik Auerswald 08 December 2023 12:37

Well, it's two vendors: both Brocade MLXe and Extreme SLX are from a single vendor (Extreme Networks). The SLX is a possible replacement for the older MLXe.

Ivan Pepelnjak 08 December 2023 12:04

Two words (actually three): pricing and organic growth. It's amazing how little money people (= service providers) are willing to put into mission-critical infrastructure like IXPs (or OpenSSL maintenance 🤪)

https://xkcd.com/2347/ comes to mind 🤷‍♂️

Lindsay Hill 08 December 2023 11:58

As above, it's only two vendors, and for many years it was only Brocade/Extreme. It's only this year they turned up Juniper devices. They had to do something about 100G/400G, and changing vendors was a reasonable choice. https://www.juniper.net/us/en/customers/ams-ix-case-study.html (published 2022, but it's only this year that Juniper equipment has gone into production there)

Sticking with one vendor forever because you're scared of interop gives you no negotiating leverage and may leave you stuck. For example, I don't think Extreme Networks is shipping any 400G gear, so what do you do? And even if you've decided to completely swap to Juniper, you can't do that in one step. You have to have some interoperability for a while.

I don't think the XKCD thing is relevant here. It's not a case of relying on a forgotten utility maintained by a volunteer, this is AMS-IX's core product. It is everything they do. And it's not a case of service providers being unwilling to put money into IXPs. AMS-IX is not a charity running on donations. See https://peering.exposed. The SPs and CDNs who are AMS-IX customers are paying a decent amount.

I also don't think that AMS-IX is quite so mission-critical any more. Yes, it does a fair bit of traffic, but there are multiple other IXPs in Amsterdam. If you really care about your connectivity, you are connected to at least two of those, in addition to a mesh of PNIs and transit connections.

(Background: my employer peers at many locations, including AMS-IX. I do not speak for them, etc., etc.)

Anonymous 09 December 2023 11:15

After watching the video, I came to the conclusion that they hadn't tested MAC ACLs, link-failure convergence, or interop. It was a disaster waiting to happen. Thorough testing is mandatory in such a setup (maybe they thought they could get through the migration phase without too much effort). For me, there are no excuses: this looks like poor operational practice, and they should not sugarcoat it and beg for understanding.

Erik Auerswald 10 December 2023 02:00

It seems to me that the AMS-IX incident report has been improved by adding more details about which problems occurred with which router product¹, as the Wayback Machine shows. Naming the problematic router products helps in evaluating whether other networks might be affected by these specific problems. Thanks to AMS-IX!

I think this report is a good example of a public, blame-free postmortem. It shows competence and both the ability and intention to learn from problems. Such a report allows others (including me) to learn. This is great! Again, thanks to AMS-IX!

In this case, an issue in the L2 overlay (LACP frames inadvertently transported via VPLS) affected the stability of the L3 underlay (MPLS with OSPF, LDP, and RSVP-TE), i.e., a cascading failure brought down the whole peering LAN. This provides an example where an L3 underlay did not result in better stability for an L2 domain.

¹ I remember not seeing specific router products for specific problems when I first read this report. Thus my initial comment that I was not sure which router product forwarded LACP frames.

Sebastian Schrader 19 December 2023 07:39

To this day, VMware DVS also happily forwards most control traffic (including LACP) that an 802.1D bridge should filter (all traffic to MAC 01:80:c2:00:00:0X). They decided to implement a nerd knob that blocks STP specifically (Net.BlockGuestBPDU), and it is also off by default.
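The gap between blocking only STP and honoring the whole 802.1D reserved range can be shown with a minimal Python sketch comparing two hypothetical filters. The first matches only the STP address, roughly the scope of an STP-specific knob; it is not a model of how VMware actually implements Net.BlockGuestBPDU. The second covers 01:80:c2:00:00:00 through 01:80:c2:00:00:0f.

```python
# A few well-known addresses in the IEEE 802.1D reserved range 01:80:c2:00:00:00-0f.
STP_DMAC = bytes.fromhex("0180c2000000")   # Spanning Tree BPDUs
SLOW_DMAC = bytes.fromhex("0180c2000002")  # Slow Protocols: LACP, Marker, OAM
PAE_DMAC = bytes.fromhex("0180c2000003")   # 802.1X (EAPOL)
LLDP_DMAC = bytes.fromhex("0180c200000e")  # LLDP (nearest bridge)


def blocked_by_stp_only_filter(dmac: bytes) -> bool:
    """Hypothetical filter that matches only STP BPDUs."""
    return dmac == STP_DMAC


def blocked_by_reserved_range_filter(dmac: bytes) -> bool:
    """Filter covering the whole reserved range, as an 802.1D bridge would."""
    return len(dmac) == 6 and dmac[:5] == bytes.fromhex("0180c20000") and dmac[5] <= 0x0F


# Compare what each filter stops; LACP, 802.1X and LLDP slip past the STP-only one.
for name, dmac in [("STP", STP_DMAC), ("LACP", SLOW_DMAC),
                   ("802.1X", PAE_DMAC), ("LLDP", LLDP_DMAC)]:
    print(name, blocked_by_stp_only_filter(dmac), blocked_by_reserved_range_filter(dmac))
```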

Replies

Ivan Pepelnjak 20 December 2023 09:34

Networking in VMware ESX has a long history of suboptimal design decisions (see how diplomatic I'm trying to be these days), but then, at least, they never claimed DVS is an 802.1D-compliant bridge.

I documented the STP SNAFU a long while ago in https://blog.ipspace.net/2012/09/dear-vmware-bpdu-filter-bpdu-guard.html, and I can't pretend that I'm surprised to hear they don't filter other L2 control traffic.

Thanks for the update! Ivan
