Blog Posts in March 2018
Years ago Petr Lapukhov decided that it’s a waste of time to try to make OSPF or IS-IS work in large-scale data center leaf-and-spine fabrics and figured out how to use BGP as a better IGP.
In the meantime, old-time routing gurus started designing routing protocols targeting a specific environment: highly meshed leaf-and-spine fabrics. First in the list: Routing in Fat Trees (RIFT).
One of my readers found this Cumulus Networks article that explains why you can’t have more than a few hundred VXLAN-based VLAN segments on every port of a 48-port Trident-2 data center switch.
Expect to see similar limitations in most other chipsets. There’s a huge gap between the millions of segments enabled by the 24-bit VXLAN Network Identifier and the reality of switching silicon. Most switching hardware is also limited to 4K VLANs.
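To put that gap in numbers, here is a minimal Python sketch comparing the ID space of a 24-bit VNI with that of a 12-bit 802.1Q VLAN ID. The variable names are purely illustrative, and the sketch says nothing about the per-port limits of any particular chipset.

```python
# ID space of a VXLAN Network Identifier (24 bits) versus an 802.1Q VLAN ID (12 bits).
VNI_BITS = 24
VLAN_ID_BITS = 12

vni_space = 2 ** VNI_BITS        # number of distinct VNI values
vlan_space = 2 ** VLAN_ID_BITS   # number of distinct VLAN ID values (the "4K VLANs" limit;
                                 # values 0 and 4095 are reserved by 802.1Q)

ratio = vni_space // vlan_space  # how many times larger the VNI space is

print(f"VNI space:  {vni_space:,}")
print(f"VLAN space: {vlan_space:,}")
print(f"Ratio:      {ratio:,}x")
```

The encapsulation header could address thousands of times more segments than a 12-bit VLAN table can hold, which is why the hardware limit matters more than the protocol limit.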
Andy sent me this question:
I'm currently playing around with BGP & VXLANs and wondering: is there anything preventing from building a virtual IXP with VXLAN? This would be then a large layer 2 network - but why have nobody build this to now, or why do internet exchanges do not provide this?
There was at least one IXP that was running on top of VXLAN. I wanted to do a podcast about it with people who helped them build it in early 2015 but one of them got a gag order.
The pace of live webinar sessions will slow down a bit in April 2018 due to the onslaught of the European spring holiday season. Nonetheless, you’ll be able to enjoy:
- The second part of EVPN Technical Deep Dive series with Dinesh Dutt on April 5th;
- The planning and design part of NSX, ACI or EVPN webinar with Mitja Robas on April 24th;
Sitting in a taxi driving to CLEUR 2018 in Barcelona, we couldn’t help but complain about the stuff we’re seeing in real-life networks, resulting in someone exclaiming something along the lines of “I can’t understand how someone could do so many stupid things”.
Welcome to the wonderful world of the Dunning-Kruger effect.
The networking engineers attending the Building Network Automation Solutions online course created numerous amazing automation solutions, most of them already deployed in production networks.
I described some of them in my Troopers 2018 Real-Life Automation Wins talk. The presentation is online and the video was published on YouTube a few days ago. I hope you’ll find it as inspirational as the Troopers attendees did.
Did you create an awesome automation solution? I’d like to hear about it!
This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.
Alex was trying to figure out how to use Catalyst 3850 switches and sent me this question:
Is MLAG an alternative to use rather than physically creating a switch stack?
Let’s start with some terminology.
Link Aggregation Group (LAG) is the ability to bond multiple Ethernet links into a single virtual link. LAG (as defined in the IEEE 802.1AX standard) can be used between a pair of adjacent nodes. While that’s good enough if you need more bandwidth, it doesn’t help if you want to increase the redundancy of your solution by connecting your edge device to two switches while using all uplinks and avoiding the shortcomings of STP. Sounds a bit like trying to keep the cake while eating it.
When VMware launched the first version of NSX for vSphere more than four years ago, the NSBU team reached out to me and asked me to create a sponsored webinar describing NSX fundamentals, its architecture, and high-level deployment guidelines.
In the meantime we discussed updating the materials, but nothing ever happened. Time to fix that, this time from a vendor-neutral perspective. We’ll start with a day-long event on April 19th 2018 in Zurich, Switzerland.
I asked David Gee to review my streaming telemetry blog posts to make sure I didn’t make too many blunders, and he sent me a nice summary of his view on the topic in return.
The only thing I could do after reading it was to ask him for permission to do a copy-paste. Here it is:
One of the biggest challenges of network automation is getting usable information from network devices… or as asked by a student in my Building Network Automation Solutions online course in the course Slack team:
How do I get specific information from a specific command from a device without an Ansible Network Module? Is Python the only suggested approach?
I described how hard it is to get structured information from network devices in great detail in this section of the Ansible for Networking Engineers webinar and online course. Here are a few more thoughts on the topic:
Someone pointed me to this article by Dr. Paul Vixie (of DNS fame). The best part (as I’m not a security person):
The TCO of new technology products and services, including security-related products and services, should be fudge-factored by at least 3X to account for the cost of reduced understanding. That extra 2X is a source of new spending: on training, on auditing, on staff growth and retention, on in-house integration.
In case you didn’t get it: figure out how much you think the magic unicorn-based software-defined solution will cost, then multiply it by three. Of course nobody wants to admit that.
I was focused on network automation this week, starting with a 2-day workshop and continuing with an overview of real-life automation wins. Let’s end the week with another automation story: automated data center fabric deployment demonstrated by Dinesh Dutt during his part of Network Automation Use Cases webinar.
You’ll need at least a free ipSpace.net subscription to watch the video.
We managed to get another awesome lineup of speakers for the Spring 2018 Building Next-Generation Data Center online course.
Russ White, one of the authors of the CCDE and CCAr programs and a highly respected book author, will start the course with a topic everyone should always consider when designing new infrastructure: how do you identify tradeoffs and manage complexity, making sure you meet the customer requirements while at the same time having an easy-to-operate infrastructure.
One of my readers sent me a question along these lines after reading the anti-automation blog post:
Your blog post has me worried as we're currently reviewing offers for NGFW solution... I understand the need to keep the lid on the details rather than name and shame, but is it possible to get the details off the record?
I always believed in giving my readers enough information to solve their challenges on their own (you know, the Teach a man to fish idea).
Continuing the Streaming Telemetry saga, let’s focus on presentation formats and transport mechanisms.
I already mentioned three presentation formats: XML (used by NETCONF), JSON (used by RESTCONF) and Protocol Buffers (used by gRPC). Two of them are text-based; the third one (Protocol Buffers) is a binary encoding not unlike ASN.1 BER used by SNMP. That can’t be good in a JSON-hyped world, right?
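To make the text-versus-binary difference tangible, here is a rough Python sketch that encodes the same counter sample as JSON and as hand-rolled Protocol Buffers wire format (base-128 varints plus length-delimited strings). The sample record, its field names, and the field numbers 1 to 3 are hypothetical and not taken from any real telemetry schema; a real deployment would use classes generated from a .proto file instead of manual encoding.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set the continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Encode an unsigned integer field (wire type 0 = varint)."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field (wire type 2 = length-delimited)."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(data)) + data


# Hypothetical interface counter sample (names and values are made up).
sample = {"name": "Ethernet1", "in_octets": 123456789, "out_octets": 987654321}

json_bytes = json.dumps(sample).encode("utf-8")
binary_bytes = (
    encode_string_field(1, sample["name"])           # assumed field number 1
    + encode_uint_field(2, sample["in_octets"])      # assumed field number 2
    + encode_uint_field(3, sample["out_octets"])     # assumed field number 3
)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"Binary: {len(binary_bytes)} bytes")
```

The binary form is smaller mostly because it replaces field names with numeric tags and stores integers as varints instead of decimal text. The price is that you can’t read it without knowing the schema.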
One of the most important aspects of the introductory part of my Building Network Automation Solutions online course is the question “should I buy a solution or build my own?”
I already described the arguments against buying a reassuringly-expensive single-blob-of-complexity solution from a $vendor, but what about using point tools?
Ever wondered who manages to produce deja-moo like this one and why they’d do it?
We unveiled a vision to create an intuitive system that anticipates actions, stops security threats in their tracks, and continues to evolve and learn. It will help businesses to unlock new opportunities and solve previously unsolvable challenges in an era of increasing connectivity and distributed technology.
As Erik Dietrich explains in his blog post, it’s usually nothing more than a lame attempt to pretend there are some clothes hanging on the emperor.
Just in case you’re interested: we discussed the state of Intent-Based Majesty’s wardrobe in Network Automation Use Cases webinar.
We started with simple questions like “what is an interface” and “how do they get such weird names in some Linux distributions” which quickly turned into a complex discussion about kernel objects and udev, and details of implementing logical interfaces that are associated with ASIC front-panel physical ports.
Some of my readers got annoyed when I mentioned Google’s BeyondCorp and RFC 1925 in the same sentence (to be perfectly clear, I had Rule 11 in mind). I totally understand that sentiment – reading the reactions from the industry press, it seems to be the best thing that happened to Enterprise IT in decades.
Let me explain in simple terms why I think it’s not such a big deal and definitely not something new, let alone revolutionary.
I cannot understand the usefulness of L2 services. I think that the preference for L2 services has its origin in the enterprise world (pushed by well known $vendors) while ISPs tend to work at Layer 3 (L3) only, even if they are urged to offer L2 services by their customers.
Some (but not all) ISPs are really good at offering IP transport services with fixed endpoints. Some Service Providers are good at offering per-tenant IP routing services required by MPLS/VPN, but unfortunately many of them simply don’t have the skills needed to integrate with enterprise routing environments.
During the Campus Evolution with Cat9K presentation (I hope I got it right - the whole event was an absolute overload) the presenter mentioned the benefits of brand-new model-driven telemetry, which immediately caused me to put my academic hat on and state that we’ve had model-driven telemetry for at least 30 years.
Don’t believe me? Have you ever looked at an SNMP MIB description? Did it look like random prose to you or did it seem to have some internal structure?
In the Business Impact of Network Automation podcast Ethan Banks asked an interesting question: “what will happen with older networking engineers who are not willing to embrace automation?”
The response somewhat surprised me: Alejandro Salisas said something along the lines of “they’ll be just fine” (for a while).
Let me recap his argument and add a few twists of my own:
There are companies that consider the network an asset, and companies that consider the network a necessary evil.
On a tangential topic: Russ will talk about network complexity in the Building Next-Generation Data Center online course starting on April 25th.
You’ll need at least a free ipSpace.net subscription to watch the video.
Want to know more about VMware NSX? We’ll run an NSX-focused event and an NSX Deep Dive workshop in Zurich on April 19th 2018, an overview webinar comparing NSX, ACI and EVPN on March 1st, and a deep dive in VMware NSX architecture later in 2018.
One of my readers sent me this question:
I'm in the process of researching SD-WAN solutions and have hit upon what I believe is a consistent deficiency across most of the current SD-WAN/SDx offerings. The standard "best practice" seems to be 60/180 BGP timers between the SD-WAN hub and the network core or WAN edge.
Needless to say, he wasn’t able to find BFD in these products either.
Does that matter? My reader thinks it does: