When I finally managed to get SR Linux running with netsim-tools, I wanted to test how it interacts with Cumulus VX and FRR in an OSPF+BGP lab… and failed. Jeroen Van Bemmel quickly identified the culprit: MTU. Yeah, it’s always the MTU (or DNS, or BGP).
I never experienced a similar problem, so of course I had to identify the root cause:
- There is no standard mechanism to pass network MTU to physical Ethernet NICs. Most everyone therefore uses 1500 bytes as the default MTU.
- Traditional network operating systems (Cisco IOS) have their opinions baked into the source code.
- Linux-based virtual devices might inherit the MTU from whatever Linux kernel thinks is a good idea.
- While the emulated “traditional” NICs (example: the venerable E1000 NIC used by Nexus OS) have no opinion about the correct MTU size, paravirtual drivers (at least virtio; see feature bits) can pass the MTU size from the hypervisor to the guest device driver (and thus the operating system). Libvirt also has an option to set the MTU size in the VM definition file (domain XML file). Is anyone using it? Is it possible to configure VM MTU size that way? I have no idea, comments welcome.
- Container-based solutions inherit the interfaces from the container orchestration system. Those interfaces have whatever MTU the container orchestration system found appropriate.
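If you want to see what MTU a Linux-based device or container actually ended up with, the kernel exposes it under sysfs. Here's a minimal sketch, assuming a Linux system with the usual `/sys/class/net/<interface>/mtu` files; the function name `read_interface_mtus` and its `sysfs_root` parameter are my own illustration, not part of any tool mentioned in this post.

```python
from pathlib import Path


def read_interface_mtus(sysfs_root="/sys/class/net"):
    """Return {interface_name: mtu} for every interface found under sysfs_root.

    sysfs_root is a parameter (an assumption made for this sketch) so the
    function can be pointed at a fake directory tree when testing.
    """
    mtus = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (for example bonding_masters)
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus
```

Calling `read_interface_mtus()` inside a container or VM returns a dictionary mapping interface names to the MTU values the kernel is using, which makes it easy to compare what two lab devices inherited.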
In our particular case, Cumulus VX and FRR running in containers had their MTU set to 9500 while SR Linux had its MTU set to 1500; Jeroen had to configure a lower MTU to get SR Linux to work with SR OS.
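Why would an MTU mismatch break an OSPF lab? I'm not claiming this is exactly what happened in every step of our lab, but one well-known mechanism is the Interface MTU check in OSPF Database Description (DBD) packets: RFC 2328 (section 10.6) tells a router to reject a DBD packet advertising an MTU larger than it can accept without fragmentation. A minimal sketch of that check, with hypothetical function names:

```python
def dbd_accepted(receiver_mtu, advertised_mtu):
    """RFC 2328 section 10.6 check: a router rejects a Database Description
    packet whose Interface MTU is larger than what it can receive on that
    interface without fragmentation."""
    return advertised_mtu <= receiver_mtu


def adjacency_can_form(mtu_a, mtu_b):
    """Both routers must accept each other's DBD packets, which in practice
    means the two interface MTUs have to match."""
    return dbd_accepted(mtu_a, mtu_b) and dbd_accepted(mtu_b, mtu_a)
```

With 1500 bytes on one side and 9500 on the other, the 1500-byte side rejects its neighbor's DBD packets, so `adjacency_can_form(1500, 9500)` is false and the adjacency typically stalls in ExStart or Exchange.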
Next question: why would SR Linux and SR OS have different default MTU sizes? SR Linux is a real container while SR OS is not available in container format (as of January 2022); what you’re running in containerlab is really a VM packaged in a container image together with QEMU (see vrnetlab for details). As a VM, SR OS uses the Ethernet default MTU size (1500) while SR Linux uses whatever the container orchestration system (containerlab in my case) sets up, and it happens to be 9500.
Finally, where did the high default MTU come from? It’s not the Linux default, and it’s definitely not set that way in other container orchestration systems like Docker. Turns out that containerlab sets the MTU to 9500 on vEth links. Mystery solved ;)
So how did we get Cumulus VX and SR Linux to work together? We had to implement an MTU parameter (at the interface, link, node, and lab levels) in netsim-tools, and set the default to 1500 for container-based network devices. The fix is available in release 1.1.2; until we push it out, use
pip3 install --upgrade 'netsim-tools>=1.1.2.dev' to grab a snapshot of the development branch.
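To illustrate what a four-level MTU parameter could mean in practice, here's a minimal sketch of a "most specific setting wins" lookup. This is not netsim-tools code: the function name `effective_mtu`, its arguments, and the precedence order (interface over link over node over lab) are assumptions I'm making for the example. Only the 1500-byte default for container-based devices comes from the text above.

```python
# Default described above for container-based network devices
CONTAINER_DEFAULT_MTU = 1500


def effective_mtu(interface=None, link=None, node=None, lab=None,
                  default=CONTAINER_DEFAULT_MTU):
    """Return the first MTU that is set, checking the most specific level
    first (assumed order: interface, link, node, lab), else the default."""
    for value in (interface, link, node, lab):
        if value is not None:
            return value
    return default
```

For example, `effective_mtu(node=9000, lab=1500)` picks the node-level value, while calling it with no arguments falls back to 1500.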
- The virtual networking fundamentals are covered in the Introduction to Virtualized Networking webinar.
- To learn how containers work behind the scenes, watch the Introduction to Docker and Docker Networking Deep Dive webinars. They do focus on Docker, but they also explain the namespace concepts and virtual Ethernet links used by almost any container orchestration system.
- Ready for a large glob of complexity? Enjoy the Kubernetes Networking Deep Dive.