Blog Posts in October 2016
We did a podcast describing NAPALM, an open-source multi-vendor abstraction library, a while ago, and as the project made significant progress in the meantime, it was time for a short update.
NAPALM started as a library that abstracted the intricacies of network device configuration management. Initially it supported configuration replace and merge; since then, its developers have added support for diffs and rollbacks.
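To make the four operations concrete, here is a minimal sketch of replace, merge, diff and rollback on an in-memory configuration. It is a toy model only: the `ToyDevice` class and its method names are hypothetical and are not NAPALM's API, and a real device merges configuration hierarchically rather than line by line.

```python
import difflib


class ToyDevice:
    """In-memory stand-in for a network device (hypothetical, not NAPALM's API)."""

    def __init__(self, running):
        self.running = list(running)  # active configuration lines
        self.candidate = None         # staged configuration, not yet committed
        self.previous = None          # snapshot kept for rollback

    def load_replace(self, lines):
        # Replace: the candidate is the complete new configuration.
        self.candidate = list(lines)

    def load_merge(self, lines):
        # Merge: keep the running configuration and append lines it lacks.
        merged = list(self.running)
        for line in lines:
            if line not in merged:
                merged.append(line)
        self.candidate = merged

    def diff(self):
        # Unified diff of what a commit would change (empty if nothing staged).
        if self.candidate is None:
            return ""
        return "\n".join(difflib.unified_diff(
            self.running, self.candidate, "running", "candidate", lineterm=""))

    def commit(self):
        if self.candidate is None:
            raise RuntimeError("no candidate configuration loaded")
        self.previous, self.running = self.running, self.candidate
        self.candidate = None

    def rollback(self):
        if self.previous is None:
            raise RuntimeError("nothing to roll back to")
        self.running, self.previous = self.previous, None


device = ToyDevice(["hostname r1", "ntp server 10.0.0.1"])
device.load_merge(["ntp server 10.0.0.2"])
print(device.diff())   # review the change before committing
device.commit()
device.rollback()      # restores the configuration that was active before the commit
```

The point of the staged candidate is that the diff can be reviewed before anything changes, and the saved snapshot makes the last commit reversible.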
One of my readers left this comment (slightly rephrased) on my Network Automation RFP Requirements blog post:
Given that we look up to our *nix pioneers as standard bearers for system automation, why do we demand an API from network devices? The API requirement would make sense if the vendor OS is a closed system. If an open system vendor creates APIs for applications running on their system (say for BGP configs) - kudos to them, but I no longer think that should be mandated.
He’s right - an API is not a mandatory prerequisite for reliable network automation.
Continuing the Do Enterprises Need VRFs discussion, let’s see which enterprise networks might need MPLS.
Do you need VRFs?
Read the previous blog post. If the answer is NO, you can stop reading. Otherwise, carry on.
Do you need large buffers in data center switches or not? If you’re a vendor your take obviously depends on whether you have them or not, and then there are people saying “it’s bullshit” (mostly agree) and “look, I have a shinier toy” (get lost).
Unfortunately, it’s really hard to find someone who knows what they’re talking about and is relatively unbiased.
Here’s some mandatory reading in case you still believe redundant networking infrastructure cannot fail:
- The network is reliable – a fantastic collection of real-life failures, including all sorts of split-brain scenarios caused by hare-brained schemes to stretch a cluster just a bit too far;
- More stuff on impacts of network partitions from the same author;
- Notes on Distributed Systems for Young Bloods. A must-read for anyone who thinks that ignoring 40 years of hard-learned lessons and controlling a distributed system from a central controller makes perfect sense. Not that it would ever help.
Robert Graham wrote a great blog post explaining why so many IT certifications suck.
TL&DR: because they are trivial pursuits instead of knowledge assessment tests… but do read the whole post and compare it to your recent certification experience.
After explaining the basics of Linux containers, Dinesh Dutt moved on to the basics of Docker networking, starting with an in-depth explanation of how a container communicates with other containers on the same host, with containers residing on other hosts, and the outside world.
Using SSL over the Internet is a must when dealing with sensitive data. What about SSL between data center components (frontend load-balancers and backend web servers for example)? Does it make sense to you? Can the question be summarized as "do I trust my Datacenter network team"? Or is there more at stake?
In the ideal world in which you’d have a totally reliable transport infrastructure the answer would be “There’s no need for SSL across that infrastructure”.
The only answer I could give is “it depends” (it’s like asking “Do animals need wings?”), and here’s my attempt at building a decision tree:
You can use the decision tree to figure out whether you need VRFs in your data center or in your enterprise WAN.
Do you believe in vendor-supplied black box (regardless of whether you call it ACI or SDDC) or in building your own data center fabric using solid design principles?
It should be an easy choice if you believe a business should control its own destiny instead of being pulled around by vendor marketing (to paraphrase Russ White).
Every time I feel like I'm "out of touch" with the hip new thing, I take a weekend to look into it. I tend to discover that the core principles are the same [...]; or you can tell they didn't learn from the previous solution and this new one misses the mark, but it'll be three years before anyone notices (because those with experience probably aren't touching it yet, and those without experience will discover the shortcomings in time.)
Yep, that explains the whole centralized control plane ruckus ;) Read also a similar musing by Ethan Banks.
We did several podcasts describing how one could get stellar packet forwarding performance on x86 servers by reimplementing the whole forwarding stack outside of the kernel (Snabb Switch) or by bypassing the Linux kernel and moving the packet processing into userspace (PF_RING).
Now let’s see if it’s possible to improve the Linux kernel forwarding performance. Thomas Graf, one of the authors of Cilium, claims it can be done and explained the intricate details in Episode 64 of Software Gone Wild.
After finishing the network automation part of a recent SDN workshop I told the attendees “Vote with your wallet. If your current vendor doesn’t support the network automation functionality you need, move on.”
Not surprisingly, the next question was “And what shall we ask for?” Here’s a short list of ideas, please add yours in comments.
One of my readers sent me this question:
I often see designs involving several more than 2 DCs spread over different locations. I was actually wondering if that makes sense to bring high availability inside the DC while there's redundancy in place between the DCs. For example, is there a good reason to put a cluster of firewalls in a DC, when it is possible to quickly fail over to another available DC, as a redundant cluster increases costs, licenses and complexity.
Rule#1 of good engineering: Know Your Problem ;) In this particular case:
The featured webinar in October 2016 is the Designing Active-Active and Disaster Recovery Data Centers webinar, and the featured videos include the discussion of disaster avoidance challenges and the caveats you might encounter with long-distance vMotion. All ipSpace.net subscribers can view these videos; if you’re not one of them yet, start with the trial subscription.
As a trial subscriber you can also use this month's featured webinar discount to purchase the webinar.
In Cisco IOS, when a packet is marked by IOS for ICMP redirect to a better gateway, that packet is being punted to the CPU, right?
It depends on the platform, but it’s going to hurt no matter what.
We got pretty far in our Data Center optimization journey. We virtualized the workload, got rid of legacy technologies, reduced the number of server uplinks, and replaced storage arrays with a distributed file system.
Final step on the journey: replace physical firewalls and load balancers with virtual appliances.
@ioshints What’s your take on firewall rule sets & IP addresses vs. hostnames? (Matthias Luft, @uchi_mata, August 16, 2016)
All I could say in 140 characters was “it depends”. Here’s a longer answer.
If teaching coders isn’t going to solve the problem, then what do we do? We need to go to where the money is. Applications aren’t bought by coders, just like networks aren’t.
One of the attendees of my Building Next-Generation Data Center course asked this interesting question after listening to my description of the differences between Chef/Puppet and Ansible:
For Zero-Touch Provisioning to work, an agent gets installed on the box as a boot up process that would contact the master indicating the box is up and install necessary configuration. How does this work with agent-less approach such as Ansible?
Here’s the first glitch: many network devices don’t ship with a Puppet or Chef agent; you have to install it during the provisioning process.
One of my readers sent me interesting feedback after reading my explanation of why I’d try not to use OSPF as a routing protocol between hosts and ToR switches. He said:
Unfortunately we can’t use BGP because IBM mainframes support only OSPF or RIP, so we decided to use VRFs instead.
Here’s what they did:
Marco Canini from UC Louvain is working on an IXP research project focused on bringing privacy guarantees into the Internet routing context. They’re trying to understand the privacy considerations of network operators and have created a short survey to gather the initial data.
Researchers from UC Louvain have been involved in tons of really useful projects including BGP PIC, LFA, MP-TCP, Fibbing, Software-defined IXP and flow-based load balancing, so if you’re connected to an IXP, please take the time to fill in the survey.