… updated on Wednesday, March 9, 2022 07:44 UTC
Data Plane Quirks in Virtual Network Devices
Have you noticed an interesting twist in the ICMP Redirects saga? Operating systems of some network devices might install redirect entries and use them for control plane traffic, an interesting side effect of the architecture of most modern network devices.
A large majority of network devices run on some variant of Linux or *BSD, the only true exceptions being ancient operating systems like Cisco IOS[1]. The network daemons populate various routing protocol tables and compute the best routes, which somehow get merged into a single routing table that might still be just a data structure in some user-mode process.
What happens next depends on the network operating system implementation (you might watch the excellent Network Operating System Models webinar for more details). Some solutions (example: Cumulus Linux) copy the computed forwarding table (FIB) into the Linux routing table and use the Linux routing table as the source-of-truth to set up the ASIC (hardware forwarding) tables. Other solutions (example: Arista EOS) program the ASIC directly and insert just enough information into the Linux routing table to make the control plane work. There’s nothing wrong with either approach… until you’re trying to figure out how your network is going to behave by setting up its digital twin[2] in a virtual lab.
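The difference between the two approaches can be sketched with a toy model. Everything in it (the `Device` class, the function names, the sample prefixes) is hypothetical and only mirrors the description above; it is not how Cumulus Linux or Arista EOS is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy device with a kernel routing table and an ASIC table (prefix -> next hop)."""
    kernel_routes: dict[str, str] = field(default_factory=dict)
    asic_routes: dict[str, str] = field(default_factory=dict)


def program_kernel_first(device: Device, fib: dict[str, str]) -> None:
    """Approach 1: copy the whole FIB into the kernel table, then derive the ASIC table from it."""
    device.kernel_routes = dict(fib)
    device.asic_routes = dict(device.kernel_routes)


def program_asic_first(device: Device, fib: dict[str, str],
                       control_plane_prefixes: set[str]) -> None:
    """Approach 2: program the ASIC directly; the kernel gets only what the control plane needs."""
    device.asic_routes = dict(fib)
    device.kernel_routes = {p: nh for p, nh in fib.items() if p in control_plane_prefixes}


def kernel_view_is_complete(device: Device) -> bool:
    """Would forwarding based on the kernel table alone see the same routes as the ASIC?"""
    return device.kernel_routes == device.asic_routes


# Hypothetical sample FIB (documentation prefixes)
fib = {"10.0.0.0/24": "192.0.2.1", "10.0.1.0/24": "192.0.2.2", "192.0.2.0/30": "direct"}
a, b = Device(), Device()
program_kernel_first(a, fib)
program_asic_first(b, fib, control_plane_prefixes={"192.0.2.0/30"})
print(kernel_view_is_complete(a), kernel_view_is_complete(b))
```

In this model, a virtual image that forwarded packets using only the kernel table would match the hardware tables in the first approach but not in the second, which is one way a virtual lab could diverge from the physical network.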
Here’s what you might experience when doing that:
- Load balancers and firewalls are usually using software-based packet forwarding. There’s usually no difference between running them in a sheet metal envelope or as a VM.
- Low-end routers with CPU-based packet forwarding should behave the same way when run as a VM… unless they’re using hardware offload, in which case you might experience a difference between hardware and software implementations. I found absolutely no difference between Cisco IOS running on a low-end router and Cisco IOS running as a VM, and CSR 1000v was designed to be used as a VM anyway. Would CSR 1000v behave in exactly the same way as ISR 4000 routers? Probably not.
- IOS XR has two simulation images, one focusing on control plane functionality (route reflector), the other one with a full data plane that is nonetheless significantly different from the hardware implementation (source). I still have no idea what Juniper vMX is doing apart from the obvious fact that it’s implemented as two virtual machines (control-plane VM and data-plane VM). Insightful comments are most welcome.
- Data center switches run as virtual devices usually use the Linux kernel for packet forwarding; after all, these virtual machines are not meant to be a replacement for the actual switches. Some implementations might behave similarly to their physical counterparts, others show significant deviation from what you might expect. For example, I couldn’t get ECMP load balancing to work with vEOS or cEOS.
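A lab check for the ECMP behavior mentioned in the last bullet could look something like the sketch below: generate many flows, record which next hop each one took, and see whether all equal-cost paths are used. The hash in `pick_next_hop` is purely illustrative (real devices use their own hash functions and fields), all names are hypothetical, and collecting the observed next hops from a real lab is left out.

```python
from __future__ import annotations

import hashlib
from collections import Counter

# src IP, dst IP, protocol, src port, dst port
Flow = tuple[str, str, int, int, int]


def pick_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Illustrative ECMP: hash the 5-tuple and pick one of the equal-cost next hops."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def ecmp_in_use(observed_next_hops: list[str], expected_paths: int) -> bool:
    """True if the observed next hops cover at least the expected number of paths."""
    return len(set(observed_next_hops)) >= expected_paths


# Simulate what a working ECMP data plane might produce for 200 flows
next_hops = ["192.0.2.1", "192.0.2.5"]
flows = [("10.0.0.1", "10.0.1.1", 6, 1024 + i, 443) for i in range(200)]
observed = [pick_next_hop(f, next_hops) for f in flows]
print(Counter(observed))
print(ecmp_in_use(observed, expected_paths=len(next_hops)))
```

If every flow in a virtual lab lands on the same next hop, `ecmp_in_use` returns False, which is the kind of deviation from hardware behavior described above.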
Don’t get me wrong: there’s nothing wrong with virtual machines not implementing all the intricacies of the hardware data plane when they are supposed to be used for control-plane tests or validation… but you have to understand the limitations. You cannot expect to be able to fully validate network operation after a configuration change in a virtual lab if you cannot emulate all data plane functionality.
Containers Are Even More Interesting
Some vendors provide virtual versions of their network operating systems in container format. You SHOULD NOT use them for anything more than control-plane functionality. Here are just a few minor details we found so far:
- The last time I tested Juniper cRPD it couldn’t report its interfaces, making it impossible to use Ansible to configure it.
- Arista cEOS does not seem to have a working MPLS data plane. ping mpls and traceroute mpls work across a network of vEOS (VM) devices, but not cEOS containers (HT: Bogdan Golab).
- It’s pretty much impossible to load a kernel module within a container. For example, Cumulus implementation of MLAG does not work in a container due to a custom bridge kernel module. Michael Kashin solved that problem by packing a Firecracker VM into a container. While that approach solves the data plane issues, it loses most benefits we could get from containerized network devices like a single copy of Linux or shared code/memory.
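Because a container shares the kernel of its host, a lab script could at least check whether the host already has the kernel modules a containerized device depends on. The following is a minimal sketch that parses `/proc/modules`-style text on Linux; the module name `example_bridge_mod` is a made-up placeholder, not the name of the Cumulus module.

```python
from __future__ import annotations

from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Parse /proc/modules-style text; the module name is the first field on each line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required: set[str], proc_modules_text: str) -> set[str]:
    """Return the required modules that do not appear in the loaded-module list.

    Caveat: modules compiled into the kernel are not listed in /proc/modules,
    so a module reported as missing here might still be available as a built-in.
    """
    return required - loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    # "example_bridge_mod" is a hypothetical module name used for illustration
    print(missing_modules({"example_bridge_mod"}, text))
```

Such a check only tells you whether the module is present on the host; it does not make a custom module loadable from within the container.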
Revision History
- Added a few container quirks.
- Added the details provided by Béla Várkonyi.
[1] Cisco IOS XE is an interesting hybrid running Cisco IOS control plane as a single process on top of the Linux kernel. Packet forwarding is implemented in dedicated CPU cores (ISR 4000, CSR 1000v) or in ASICs (ASR 900, Catalyst).
[2] I know that’s another bullshit-bingo-winning nonsense, but you have to admit it sounds cool ;)
The single IOS process in IOS XE is not the same as the old IOS. This is only a subset, since all data plane forwarding functions are implemented separately with dedicated CPU cores in ISR 4000, but with ASICs in ASR900, Cat9000 or Cat8000. The original all-software IOS does not support such an architecture. A lot of things in IOS XE are also platform specific.
Because of that, an IOS XE simulation using a VM might be far away from real behavior. Whenever you have platform-specific limitations, optimizations, or bugs, the simulation will not behave the same as the real device.
Simulation is a good starting point, but not a proper validation even for your functionality. Real hardware might give you surprises...
For IOS XR you have two simulation images. One focuses on the control plane and is designed as an RR. The other IOS XR VM has a full data plane, but it is again significantly different from the real hardware implementation, especially since most IOS XR devices have a distributed CPU / line card architecture.
Here again, you have tons of hardware platform specific features, limitations, bugs, etc. Those you cannot experience in a simulation...
In a large Cisco shop, you might use a simulation for proof-of-concept and have a real hardware lab for acceptance tests, problem management, and troubleshooting. The real hardware lab is more difficult to access because many projects compete for resources, so it still makes sense to make your first checks in simulation, but that is not enough before you deploy something at the customer.
You were wondering about the data plane VM within Juniper's vMX. The second edition of Harry Reynolds' book "Juniper MX Series" has a short chapter on this.
The lookup portion of the Trio chipset is virtualized within the data plane VM. This is said to be the same microcode that they use on the physical ASIC just compiled for Intel x86 instead. The queueing features are rewritten using Intel DPDK.
Thanks a million!