Blog Posts in November 2017
Here’s another idea from the Building Network Automation Solutions online course: Ruben Tripiana decided to implement a latency measurement tool. His playbook takes a list of managed devices from Ansible inventory, generates a set of unique device pairs, measures latency between them, and produces a summary report (see also his description of the project).
A while ago I helped a large enterprise redesign their data center fabric. They did a wonderful job optimizing their infrastructure, so all they really needed were two switches in each location.
Some vendors couldn’t fathom that. One of them proposed to build a “future-proof” (and twice as expensive) leaf-and-spine fabric with two leaves and two spines. On top of that they proposed to use EBGP as the only routing protocol because draft-lapukhov-bgp-routing-large-dc – a clear case of missing the customer needs.
One of my readers was so delighted that something finally happened after I wrote about a NX-OS bug that he sent me a pointer to another one that has been pending for a long while, and is now officially terminated as FAD (Functions-as-Designed… even documented in the Further Problem Description).
Here’s what he wrote (slightly reworded)…
Netfortius made an interesting comment to my Ansible playbook as a bash script blog post:
Ivan - aren't we now moving the "CLI"[-like] approach, upstream (the one we are just trying to depart, via the more structured and robust approach of RESTAPI).
As I explained several times, I don’t know where the we must get rid of CLI ideas are coming from; the CLI is root of all evil mantra is just hype generated by startups selling alternative approaches (the best part: one of them was actually demonstrating their product using CLI).
You’ll need at least free ipSpace.net subscription to watch the video.
Diane Patton (Cumulus Networks) published a short overview of container networking design options, from traditional MLAG to running Quagga on Docker host.
If you want to learn more about individual designs described in that blog post, watch the Leaf-and-Spine Fabric Architectures and Docker Networking webinars, or join one of the data center online courses.
One of my friends wanted to design a nice-and-easy layer-3 leaf-and-spine fabric for a new data center, and got blindsided by a hyperconverged vendor. Here’s what he wrote:
We wanted to have a spine/leaf L3 topology for an NSX deployment but can’t do that because the Nutanix servers require L2 between their nodes so they can be in the same cluster.
I wanted to check his claims, but Nutanix doesn’t publish their documentation (I would consider that a red flag), so I’m assuming he’s right until someone proves otherwise (note: whitepaper is not a proof of anything ;).
Got this feedback on my Ansible for Networking Engineers webinar:
This webinar is very comprehensive compared to any other Ansible webinars available out there. Ivan does great job of mapping and using real life example which is directly related to daily tasks.
The Ansible online course is even better: it includes support, additional hands-on exercises, sample playbooks, case studies, and lab instructions.
However, Ansible is just a tool that shouldn’t be missing from your toolbox. If you need a bigger picture, consider the Building Network Automation Solutions online course (and register ASAP to save $700 with the Enthusiast ticket).
One of my readers sent me a question about his favorite annoyance:
During my long practice, I’ve never seen an Enterprise successfully managing the network device software upgrade/patching cycles. It seems like nothing changed in the last 20 years - despite technical progress, in still takes years (not months) to refresh software in your network.
There are two aspects to this:
I first met Pluribus Networks 2.5 years ago during their Networking Field Day 9 presentation, which turned controversial enough that I was advised not to wear the same sweater during NFD16 to avoid jinxing another presentation (I also admit to be a bit biased in those days based on marketing deja-moo from a Pluribus sales guy I’d been exposed to during a customer engagement).
Pluribus NFD16 presentations were better; here’s what I got from them:
I know that everyone learns in a slightly different way. Let me share the approach that usually works well for me when a tough topic I’m trying to master includes a practical (hands-on) component: running controlled experiments.
Sounds arcane and purely academic? How about a simple example?
A week ago I talked about this same concept in the Building Network Automation Solutions online course. The video is already online and you get immediate access to it (and the rest of the course) when you register for the next live session.
Arista’s OpenFlow implementation doesn’t support TLS encryption. Usually that’s not a big deal, as there aren’t that many customers using OpenFlow anyway, and those that do hopefully do it over a well-protected management network.
However, lack of OpenFlow TLS encryption might become an RFP showstopper… not because the customer would really need it but because the customer is in CYA mode (we don’t know what this feature is or why we’d use it, but it might be handy in a decade, so we must have it now) or because someone wants to eliminate certain vendors based on some obscure missing feature.
We’re slowly wrapping up the autumn 2017 Building Network Automation Solutions online course, so it’s time to schedule the next one. It will start on February 13th and you can already register (and save $700 over regular price as long as there are Enthusiast tickets left).
Do note that you get access to all course content (including the recordings of autumn 2017 sessions) the moment you register for the course. You can also start building your lab and working on hands-on exercises way before the course starts.
Found this Douglas Adams quote in The Signal and the Noise (a must-read book):
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair
I’ll leave to your imagination how this relates to stretched VLANs, ACI, NSX, VSAN, SD-WAN and a few other technologies.
After explaining the challenges of data center fabric deployments, Dinesh Dutt focused on a very important topic I cover in Week#3 of the Building Network Automation Solutions online course: how do you separate data (data model describing data center fabric) from code (Ansible playbooks and device configurations)
It’s always great to see students enrolled in Building Network Automation Solutions online course using ideas from my sample playbooks to implement a wonderful solution that solves a real-life problem.
One of my readers sent me this question:
I'm having an internal debate whether to use firewall-based VPNs or DMVPN to connect several sites if our MPLS connection goes down. How would you handle it? Do you have specific courses answering this question?
As always, the correct answer is it depends, in this case on:
Third vendor in this year’s series of data center switching updates: Cisco.
As expected, Cisco launched a number of new switches in 2017, and EOL’d older models … for pretty varying value of old. For example, most of the original Nexus 9300 models are gone.
Everyone knows that Service Providers and Enterprise networks diverged decades ago. More precisely, organizations that offer network connectivity as their core business usually (but not always) behave differently from organizations that use networking to support their core business.
Obviously, there are grey areas: from people claiming to be service providers who can’t get their act together, to departments (or whole organizations) who run enterprise networks that look a lot like traditional service provider networks because they’re effectively an internal service provider.
One of my readers coming from system development area asked a fundamental question about the role of automation in enterprise IT (somewhat paraphrased):
[In system development] we automate typical tasks from the pre-defined task repository, so I would like to understand broader context as the automation (I guess) is just a part of the change we want to do in the system. Someone needs to decide what to do, someone needs to accept the change and finally the automation is used.
Of course he’s absolutely right.