IPv6 and the Revenge of the Stupid Bridges
This blog post describes another “OMG, this cannot possibly be true” scenario discovered during the netlab VRRP integration testing.
I wanted to test whether we got the nasty nuances of VRRPv3 IPv6 configuration right on all supported platforms and created a simple lab topology in which the device-under-test and an Arista cEOS container would be connected to two IPv6 networks (Arista EOS is a lovely device to use when testing a VRRP cluster because it produces JSON-formatted show vrrp printouts).
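For example, to get machine-readable VRRP state out of the Arista box, you can append the EOS JSON output modifier to the show command (assuming an EOS release that supports JSON rendering of show vrrp; the exact structure of the output depends on the release):

r2>show vrrp | json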
Most platforms worked as expected, but Aruba CX, Cumulus Linux with NVUE, and Dell OS10 consistently failed the tests. We were stumped until Jeroen van Bemmel discovered that the Arista container forwards IPv6 router advertisements between the two LAN segments.
Kicking the Tires
Here’s the lab topology I used. You must start the lab with a netlab release older than 1.9.4 to get the described behavior (we implemented a workaround in the meantime).
module: [ gateway ]
gateway.protocol: vrrp
gateway.id: 1
addressing.lan.ipv6: 2001:db8:1::/56
nodes:
  r1: { device: arubacx }
  r2: { device: eos, provider: clab }
  h1: { device: linux, provider: clab }
  h2: { device: linux, provider: clab }
links:
- interfaces: [ r1, r2, h1 ]
  gateway: True
- interfaces: [ r1, r2, h2 ]
  gateway: True
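To reproduce the results, save the topology into topology.yml (netlab's default topology file name) and start the lab. I'm assuming the virtualization providers (libvirt and containerlab) are already set up on your machine:

$ netlab up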
Once the lab runs, log into the Linux hosts and inspect their IPv6 routing tables. This is what I got on H1:
h1:/# ip -6 route
2001:db8:1::/64 dev eth1 metric 256 expires 0sec
2001:db8:1:1::/64 dev eth1 metric 256 expires 0sec
fe80::/64 dev eth0 metric 256
fe80::/64 dev eth1 metric 256
default via fe80::800:901:41b:13ed dev eth1 metric 1024 expires 0sec
default via fe80::800:901:81b:13ed dev eth1 metric 1024 expires 0sec
H1 claims that both IPv6 prefixes used in the lab are directly connected. No wonder VRRPv3 does not work; the hosts don’t even try to use the first-hop routers.
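You don't have to trust the routing tables; you can also watch the offending router advertisements arrive in real time. Here's a quick sketch that assumes tcpdump is installed in the Linux containers (byte 40 of an IPv6 packet with no extension headers is the ICMPv6 type; 134 means router advertisement):

h1:/# tcpdump -ni eth1 'icmp6 && ip6[40] == 134'

You should see advertisements for both lab prefixes arriving on eth1 even though the second prefix belongs to the other LAN segment.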
What’s Going On?
It took us a while to figure out what was going on, but it’s pretty easy to demonstrate it with perfect hindsight:
- Start the lab with netlab up --no-config to get the initial device configurations
- Log into the Arista container and inspect its VLANs
Here’s what you’ll see on an Arista cEOS container running release 4.33.1F-39879738.4331F:
$ netlab connect r2
Connecting to clab-X-r2 using SSH port 22
r2>show vlan
VLAN  Name                             Status    Ports
----- -------------------------------- --------- -------------------------------
1     default                          active    Et1, Et2
Long story short: Containerlab deploys minimal initial configuration on Arista EOS containers, and netlab does the same for Arista EOS virtual machines. Neither of these configurations specifies the default behavior of new interfaces, so once they appear, they are enabled and in VLAN 1. Cisco IOSv L2 and Cisco IOL L2 images exhibit similar behavior, probably for similar reasons.
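One way to convince yourself that the container really bridges between those two ports is to inspect its MAC address table; the MAC addresses of devices attached to both segments show up in VLAN 1 (a sketch; the output format varies across EOS releases):

r2>show mac address-table dynamic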
Reverse-Engineering the Mishap
Now we know what’s happening, but why did it happen only with some devices? The Ansible playbook netlab uses to configure lab devices happens to execute configuration tasks sorted by the device type, and one could observe the above behavior only if:
- The tested device was configured before the EOS switch (that’s why we observed this behavior on Aruba, Cumulus Linux 5.x, and Dell OS10)
- The configuration took long enough for the EOS switch to enable the access interfaces in VLAN 1 and start forwarding data between them.
- The tested device started sending IPv6 RA messages before the Ansible playbook managed to configure L3 ports on the EOS switch (that’s probably why Cisco CSR and Cumulus Linux 4.x worked).
Workaround
We fixed this anomaly in netlab release 1.9.4 by adding an extra normalization configuration step executed before the initial interface configuration. This step shuts down data-plane interfaces on Arista EOS and Cisco IOS L2 devices.
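The gist of the Arista EOS normalization step looks roughly like the following snippet (a simplified sketch, not the exact configuration template netlab deploys):

! keep the data-plane ports shut down until they get their final configuration
interface Ethernet1-2
   shutdown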
This Makes Me Sad
Years ago, I tried to persuade security-minded networking engineers that we don’t need separate (physical) inside and outside switches in a DMZ because VLANs provide sufficient isolation.
While doing that, I was also stupid enough to say, “I don’t believe any decent switch vendor would start their devices as stupid bridges, directly linking inside, DMZ, and outside VLAN if the switch has no initial configuration.”
According to the comment by Keanu, that assumption at least holds for Arista EOS switches: when they boot without a startup configuration, they put their interfaces in routed mode and start ZTP.
Unfortunately, Erik Auerswald claims to have had a much more disappointing experience, so the ancient recommendation is as valid as ever: don’t trust the documentation; test results are what really counts.
Revision History
- 2025-03-24: The observed behavior was caused by Arista EOS containers and VMs having minimal startup configuration that did not specify the default interface state.
Comments

Most switches I have seen through the years, when they do not have a configuration, start with all ports active and in the same VLAN (the default VLAN with ID 1). Notable exceptions were Dell OS9 and the Cisco Catalyst 6500, which started with all ports disabled and in L3 mode.
Thank you! I was hoping for something better, though 🤦♂️
... this might even be the case as a transient state during boot, until the configuration has been applied, depending on the specific implementation, of course. And it's not limited to switches per se; routers, firewalls, or any other devices with "switchports" could behave like this. I remember a situation where a customer had two Cisco 881 routers from its ISP connected to its LAN, plus a direct cross-link between the two routers. Restarting both routers at the same time would cause an L2 loop (BPDU guard kicked in, fortunately...) because the routers' switchports would act as a stupid L2 switch until the config was fully applied.
Hi Ivan,
Thanks for the article, as always!
For your information, on Arista EOS, if no startup-config is found on the switch during boot, the switch enters ZTP mode. By default, this mode sets all ports as "routed" ports to prevent the disaster you mentioned in your article.
This also applies to cEOS: when you boot a container without a startup-config or with an empty startup-config, ZTP mode will be activated.
Thank you.
To add to my previous answer, you could also consider including the following configuration knob in your Arista cEOS containers' startup-config to ensure that all ports are set as routed ports by default:
switchport default mode routed
And if you need a switch port, you can configure it directly by setting the desired port as a switchport:
interface Ethernet1
   switchport
Thanks a million for the feedback. I (probably) know what went wrong, will fix the blog post accordingly.
I've never tried ZTP mode on an Arista switch. This seems to be another notable exception to the rule. :-)
With some other vendor, ZTP attempts to get an IP address on either the management port or on the SVI for VLAN 1 (with all front ports in VLAN 1 and active, as before the introduction of ZTP).
Yet other products ship in a ZTP mode in which they do not act like a simple bridge, but provide a boot flag to activate "factory defaults" with all front ports active in VLAN 1 and without ZTP.
The introduction of ZTP modes made activating a switch for the first time more interesting, since every vendor (or business unit of a vendor) seems to create a different variant. ;-)