netlab release 1.5.1 makes it easier to create topologies with lots of VRF or VLAN access links, or topologies with numerous similar links. It also includes support for the D2 diagram scripting language in case you prefer its diagrams over those generated by Graphviz.
Even if you don’t find those features interesting (more about them later), you might want to upgrade to fix a nasty container-related behavior I discovered in recently-upgraded Ubuntu servers.
DHCP Clients Spoil the Fun
netlab has always used Cumulus containers created by Michael Kashin to test containerlab installation. They are also a pleasure to work with due to their very low startup time, but when I tried to deploy a topology using them a few days ago, the initial configuration process crashed, complaining about another instance already running.
Fortunately, the bug was easy to reproduce, and after adding copious amounts of logging to the initialization script I was able to figure out it crashed when trying to run
Next step: figure out which processes are running at that time. I quickly found numerous firstboot processes, the scripts Cumulus Linux executes at boot time. No big deal; let’s wait for them to complete.
It worked… but it increased the container start time by over a minute. Not exactly what I was looking for. However, during a more detailed investigation I found a bunch of dhcpclient processes, and as containerlab assigns static IP addresses to containers, it made no sense to have them running. What if I killed them?
Bingo! The initialization script worked and ran at a decent speed. It’s a bit slower as it has to wait for the firstboot procedure to go through its motions and then kill the dhcpclient processes (otherwise they would interfere with later steps of the device configuration), but the additional delay can be measured in seconds. That’s good enough for me; you’re most welcome to create a better fix and submit a pull request ;)
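The wait-then-kill logic described above could be sketched roughly like this. It is not the actual netlab initialization script: the function names, the timeout values, and the use of pgrep and pkill (which would have to be available inside the container) are all assumptions made for illustration. Only the firstboot and dhcpclient process names come from the story above.

```python
import subprocess
import time
from typing import Callable


def is_process_running(name: str) -> bool:
    """Return True if any process command line matches `name`.

    Assumption: `pgrep` is available; it exits with 0 when it finds a match.
    """
    result = subprocess.run(["pgrep", "-f", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def kill_processes(name: str) -> None:
    """Send SIGTERM to every process whose command line matches `name`.

    Assumption: `pkill` is available in the same environment.
    """
    subprocess.run(["pkill", "-f", name])


def wait_then_kill(
    wait_for: str = "firstboot",      # process name taken from the text
    kill: str = "dhcpclient",         # process name taken from the text
    timeout: float = 120.0,           # hypothetical upper bound, in seconds
    poll: float = 1.0,                # hypothetical polling interval
    is_running: Callable[[str], bool] = is_process_running,
    kill_all: Callable[[str], None] = kill_processes,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait until `wait_for` processes are gone, then kill `kill` processes.

    Returns False (and kills nothing) if the wait exceeds `timeout`.
    """
    deadline = time.monotonic() + timeout
    while is_running(wait_for):
        if time.monotonic() >= deadline:
            return False
        sleep(poll)
    kill_all(kill)
    return True


if __name__ == "__main__":
    if not wait_then_kill():
        raise SystemExit("firstboot did not finish in time")
```

The process check, kill, and sleep functions are passed in as parameters so the logic can be exercised without touching real processes; in a container you would simply call wait_then_kill() before starting the device configuration.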
Naturally, I wanted to know what change caused that problem. DHCP clients have always been started in the Cumulus Linux containers (the container image I’m using hasn’t changed in months). The only reasonable explanation I could find was therefore that something broke (or disabled) a DHCP server running somewhere behind the scenes, and Cumulus DHCP clients blocked the container interface initialization scripts until they timed out (which happens to take a minute).
Unfortunately, I have no idea what caused that change in behavior; if you happen to know more, please leave a comment!
To get more details and learn about additional features included in release 1.5.1, read the release notes. To upgrade, execute pip3 install --upgrade networklab.