Blog Posts in June 2019
It’s high time for another summer break (I get closer and closer to burnout every year - either I’m working too hard or I’m getting older ;).
Of course we’ll do our best to reply to support (and sales ;) requests, but it might take us a bit longer than usual. I will publish an occasional worth reading or watch out blog post, but don’t expect anything deeply technical for the new two months.
In the meantime, try to get away from work (hint: automating stuff sometimes helps ;), turn off the Internet, and enjoy a few days in your favorite spot with your loved ones!
Daniel Teycheney attended the Spring 2019 Building Network Automation Solutions online course and sent me this feedback after completing it (and creating some interesting real-life solutions on the way):
I spent a bit of time the other day reflecting on how much I’ve learn’t from the course in terms of technical skills and the amount I’ve learned has been great. I literally no idea about things like Git, Jinja2, CI testing, reading YAML files and had only briefly seen Ansible before.
I’m not an expert now, but I understand these things and have real practical experience on these subjects which has given me great confidence to push on and keep getting better.
When I was still at university the fourth-generation programming languages were all the hype, prompting us to make jokes along the lines “fifth generation will implement do what I don’t know how”
Christoph Jaggi sent me this observation during one of our SD-WAN discussions:
The centralized controller is another shortcoming of SD-WAN that hasn’t been really addressed yet. In a global WAN it can and does happen that a region might be cut off due to a cut cable or an attack. Without connection to the central SD-WAN controller the part that is cut off cannot even communicate within itself as there is no control plane…
A controller (or management/provisioning) system is obviously the central point of failure in any network, but we have to go beyond that and ask a simple question: “What happens when the controller cluster fails and/or when nodes lose connectivity to the controller?”
SD-WAN is the best thing that could have happened to networking according to some industry “thought leaders” and $vendor marketers… but it seems there might be a tiny little gap between their rosy picture and reality.
This is what I got from someone blessed with hands-on SD-WAN experience:
One of my readers sent me this question:
How can I learn more about reading REST API information from network devices and storing the data into tables?
Long story short: it’s like learning how to drive (well) - you have to master multiple seemingly-unrelated tasks to get the job done.
One of the first things I realized when I started my Azure journey was that the Azure orchestration system is incredibly slow. For example, it takes almost 40 seconds to display six routes from per-VNIC routing table. Imagine trying to troubleshoot a problem and having to cope with 30-second delay on every single SHOW command. Cisco IGS/R was faster than that.
If you’re old enough you might remember working with VT100 terminals (or an equivalent) connected to 300 baud modems… where typing too fast risked getting the output out-of-sync resulting in painful screen repaints (here’s an exercise for the youngsters: how long does it take to redraw an 80x24 character screen over a 300 bps connection?). That’s exactly how I felt using Azure CLI - the slow responses I was getting were severely hampering my productivity.
I always love to hear from networking engineers who managed to start their network automation journey. Here’s what one of them wrote after watching Ansible for Networking Engineers webinar (part of paid ipSpace.net subscription, also available as an online course).
This webinar helped me a lot in understanding Ansible and the benefits we can gain. It is a big area to grasp for a non-coder and this webinar was exactly what I needed to get started (in a lab), including a lot of tips and tricks and how to think. It was more fun than I expected so started with Python just to get a better grasp of programing and Jinja.
In early 2019 we made the webinar even better with a series of live sessions covering new features added to recent Ansible releases, from core features (loops) to networking plugins and new declarative intent modules.
One of my subscribers sent me an interesting puzzle:
One of my colleagues configured a single-area OSPF process in a customer VRF customer, but instead of using area 0, he used area 123 nssa. Obviously it works, but I was thinking: “What the heck, a single OSPF area MUST be in Area 0”
Not really. OSPF behaves identically within an area (modulo stub/NSSA behavior) regardless of the area number. Even there, you could argue that the only difference between area 0 and other areas is that the standard (and all compliant implementations) doesn’t allow you to set stub or NSSA bit in area 0.
In my quest to understand how much buffer space we really need in high-speed switches I encountered an interesting phenomenon: we no longer have the gut feeling of what makes sense, sometimes going as far as assuming that 16 MB (or 32MB) of buffer space per 10GE/25GE data center ToR switch is another $vendor shenanigan focused on cutting cost. Time for another set of Fermi estimates.
Let’s take a recent data center switch using Trident II+ chipset and having 16 MB of buffer space (source: awesome packet buffers page by Jim Warner). Most of switches using this chipset have 48 10GE ports and 4-6 uplinks (40GE or 100GE).
For example, if we have multiple devices connected to the same subnet, why should we have to specify IP address and subnet mask for every device (literally begging the operators to make input errors). Wouldn’t it be better (assuming we don’t care about exact IP addresses on core links) to assign IP addresses automatically?
The No Scripting Required to Start Your Automation Journey blog post generated lively discussions (and a bit of trolling from the anonymous peanut gallery). One of the threads focused on “how does automation work in real life IT department where it might be challenging to simplify operations before automating them due to many exceptions, legacy support…”
Here’s a great answer provided by another reader:
Roy Chua (SDx Central) published a blog post titled “Where Have All the SDN Controllers Gone” a while ago describing the gradual disappearance of SDN controller hype.
No surprise there - some of us were pointing out the gap between marketing and reality years ago.
It was evident to anyone familiar with how networking actually works that in a generic environment the drawbacks of orthodox centralized control plane SDN approach far outweigh its benefits. There are special use cases like intelligent patch panels where a centralized control plane makes sense.
This blog post was initially sent to subscribers of my SDN and Network Automation mailing list. Subscribe here.
At the end of my vNIC 2018 keynote speech I made a statement along these lines:
The moment you start using GUI with an SDN product you’re back to square one.
That claim confused a few people – Mark left this comment on my blog:
Approximately two years ago I tried to figure out whether aggressive marketing of deep buffer data center switches makes sense, recorded a few podcasts on the topic and organized a webinar with JR Rivers.
Not surprisingly, the question keeps popping up, so it seems it’s time for another series of TL&DR articles. Let’s start with the basics:
Remember the avoid duplicate data in network automation data models challenge and the restructuring we did to represent a network as a graph.
Well, I was not happy with the end result - I hated the complexity of supporting Jinja2 templates that had to check left- and right nodes of a link, so I generalized the data structure a bit, and all of a sudden I could model stub interfaces, P2P links and multi-access networks.
A while ago I had an interesting consulting engagement: a multinational organization wanted to migrate off global Carrier Ethernet VPN (with routers at the edges) to MPLS/VPN.
While that sounds like the right thing to do (after all, L3 must be better than L2, right?) in that particular case they wanted to combine the provider VPN with Internet-based IPsec VPN… and doing that in parallel with MPLS/VPN tends to become an interesting exercise in “how convoluted can I make my design before I give up and migrate to BGP”.
Found a fantastic list of common logical fallacies. It's a must read for anyone having at least occasional interaction with non-Vulcans... and when you stop laughing (or screaming, or both) make sure you go through the companion web site to understand bugs in your wetware that sabotage your attempts at being perfectly logical.