IPv6 and the Revenge of the Stupid Bridges
This blog post describes another “OMG, this cannot possibly be true” scenario discovered during the netlab VRRP integration testing.
I wanted to test whether we got the nasty nuances of VRRPv3 IPv6 configuration right on all supported platforms and created a simple lab topology in which the device-under-test and an Arista cEOS container would be connected to two IPv6 networks (Arista EOS is a lovely device to use when testing a VRRP cluster because it produces JSON-formatted show vrrp printouts).
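For example, to get machine-readable VRRP state out of the Arista box, you can append the EOS JSON output modifier to the show command (assuming an EOS release that supports JSON rendering of show vrrp; the exact structure of the output depends on the release):

r2>show vrrp | json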
Most platforms worked as expected, but Aruba CX, Cumulus Linux with NVUE, and Dell OS10 consistently failed the tests. We were stumped until Jeroen van Bemmel discovered that the Arista container forwards IPv6 router advertisements between the two LAN segments.
Kicking the Tires
Here’s the lab topology I used. You must start the lab with a netlab release older than 1.9.4 to get the described behavior (we implemented a workaround in the meantime).
module: [ gateway ]
gateway.protocol: vrrp
gateway.id: 1
addressing.lan.ipv6: 2001:db8:1::/56
nodes:
  r1: { device: arubacx }
  r2: { device: eos, provider: clab }
  h1: { device: linux, provider: clab }
  h2: { device: linux, provider: clab }
links:
- interfaces: [ r1, r2, h1 ]
  gateway: True
- interfaces: [ r1, r2, h2 ]
  gateway: True
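To reproduce the results, save the topology into topology.yml (netlab's default topology file name) and start the lab. I'm assuming the virtualization providers (libvirt and containerlab) are already set up on your machine:

$ netlab up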
Once the lab runs, log into the Linux hosts and inspect their IPv6 routing tables. This is what I got on H1:
h1:/# ip -6 route
2001:db8:1::/64 dev eth1 metric 256 expires 0sec
2001:db8:1:1::/64 dev eth1 metric 256 expires 0sec
fe80::/64 dev eth0 metric 256
fe80::/64 dev eth1 metric 256
default via fe80::800:901:41b:13ed dev eth1 metric 1024 expires 0sec
default via fe80::800:901:81b:13ed dev eth1 metric 1024 expires 0sec
H1 claims that both IPv6 prefixes used in the lab are directly connected. No wonder VRRPv3 does not work; the hosts don’t even try to use the first-hop routers.
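You don't have to trust the routing tables; you can also watch the offending router advertisements arrive in real time. Here's a quick sketch that assumes tcpdump is installed in the Linux containers (byte 40 of an IPv6 packet with no extension headers is the ICMPv6 type; 134 means router advertisement):

h1:/# tcpdump -ni eth1 'icmp6 && ip6[40] == 134'

You should see advertisements for both lab prefixes arriving on eth1 even though the second prefix belongs to the other LAN segment.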
What’s Going On?
It took us a while to figure out what was going on, but it’s pretty easy to demonstrate it with perfect hindsight:
- Start the lab with netlab up --no-config to get the initial device configurations
- Log into the Arista container and inspect its VLANs
Here’s what you’ll see on an Arista cEOS container running release 4.33.1F-39879738.4331F:
$ netlab connect r2
Connecting to clab-X-r2 using SSH port 22
r2>show vlan
VLAN  Name                             Status    Ports
----- -------------------------------- --------- -------------------------------
1     default                          active    Et1, Et2
Long story short: Containerlab deploys minimal initial configuration on Arista EOS containers, and netlab does the same for Arista EOS virtual machines. Neither of these configurations specifies the default behavior of new interfaces, so once they appear, they are enabled and in VLAN 1. Cisco IOSv L2 and Cisco IOL L2 images exhibit similar behavior, probably for similar reasons.
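One way to convince yourself that the container really bridges between those two ports is to inspect its MAC address table; the MAC addresses of devices attached to both segments show up in VLAN 1 (a sketch; the output format varies across EOS releases):

r2>show mac address-table dynamic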
Reverse-Engineering the Mishap
Now we know what’s happening, but why did it happen only with some devices? The Ansible playbook netlab uses to configure lab devices happens to execute configuration tasks sorted by the device type, and one could observe the above behavior only if:
- The tested device was configured before the EOS switch (that’s why we observed this behavior on Aruba, Cumulus Linux 5.x, and Dell OS10)
- The configuration took long enough for the EOS switch to enable the access interfaces in VLAN 1 and start forwarding data between them.
- The tested device started sending IPv6 RA messages before the Ansible playbook managed to configure L3 ports on the EOS switch (that’s probably why Cisco CSR and Cumulus Linux 4.x worked).
Workaround
We fixed this anomaly in netlab release 1.9.4 by adding an extra normalization configuration step executed before the initial interface configuration. This step shuts down data-plane interfaces on Arista EOS and Cisco IOS L2 devices.
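The gist of the Arista EOS normalization step looks roughly like the following snippet (a simplified sketch, not the exact configuration template netlab deploys):

! keep the data-plane ports shut down until they get their final configuration
interface Ethernet1-2
   shutdown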
This Makes Me Sad
Years ago, I tried to persuade security-minded networking engineers that we don’t need separate (physical) inside and outside switches in a DMZ because VLANs provide sufficient isolation.
While doing that, I was also stupid enough to say, “I don’t believe any decent switch vendor would start their devices as stupid bridges, directly linking inside, DMZ, and outside VLAN if the switch has no initial configuration.”
According to the comment by Keanu, that assumption at least holds for Arista EOS switches: when they boot without a startup configuration, they put their interfaces in routed mode and start ZTP.
Unfortunately, Erik Auerswald claims to have had a much more disappointing experience, so the ancient recommendation is as valid as ever: don’t trust the documentation; test results are what really counts.
Revision History
- 2025-03-24: The observed behavior was caused by Arista EOS containers and VMs having minimal startup configuration that did not specify the default interface state.
Comments

Most switches I have seen through the years, when they do not have a configuration, start with all ports active and in the same VLAN (the default VLAN with ID 1). Notable exceptions were Dell OS9 and the Cisco Catalyst 6500, which started with all ports disabled and in L3 mode.
Thank you! I was hoping for something better, though 🤦♂️
... this might even be the case as a transient state during boot, until the configuration has been applied, depending on the specific implementation, of course. And it's not limited to switches per se; routers, firewalls, or any other devices with "switchports" could behave like this. I remember a situation where a customer had two Cisco 881 routers from its ISP connected to its LAN, plus a direct cross-link between the two routers. Restarting both routers at the same time would cause an L2 loop (BPDU guard kicked in, fortunately...) because the routers' switchports would act as a stupid L2 switch until the config was fully applied.
Hi Ivan,
Thanks for the article, as always!
For your information, on Arista EOS, if no startup-config is found on the switch during boot, the switch enters ZTP mode. By default, this mode sets all ports as "routed" ports to prevent the disaster you mentioned in your article.
This also applies to cEOS: when you boot a container without a startup-config or with an empty startup-config, ZTP mode will be activated.
Thank you.
To add to my previous answer, you could also consider including the following configuration knob in your Arista cEOS containers' startup-config to ensure that all ports are set as routed ports by default:
switchport default mode routed
And if you need a switch port, you can configure it directly by setting the desired port as a switchport:
interface Ethernet1
   switchport
Thanks a million for the feedback. I (probably) know what went wrong, will fix the blog post accordingly.
I've never tried ZTP mode on an Arista switch. This seems to be another notable exception to the rule. :-)
With some other vendor, ZTP attempts to get an IP address on either the management port or on the SVI for VLAN 1 (with all front ports in VLAN 1 and active, as before the introduction of ZTP).
Yet other products ship in a ZTP mode in which they do not act like a simple bridge, but provide a boot flag to activate "factory defaults" with all front ports active in VLAN 1 and without ZTP.
The introduction of ZTP modes made activating a switch for the first time more interesting, since every vendor (or business unit of a vendor) seems to create a different variant. ;-)