Most of the public cloud training seems focused on developers. No surprise there, they are the usual beachhead public cloud services need to get into large organizations. Unfortunately, once the production applications start getting deployed into public cloud infrastructure, someone has to take over operations, and that’s where the fun starts.
For whatever reason, there aren’t that many resources helping the infrastructure operations teams understand how to deal with this weird new world, at least according to the feedback Jawed left on Azure Networking webinar:
Even though you need plenty of traditional networking constructs to deploy a complex application stack in a public cloud (packet filters, firewalls, load balancers, VPN, BGP…), once you start digging deep into the bowels of public cloud virtual networking, you’ll find out it’s significantly different from the traditional Ethernet+IP implementations common in enterprise data centers.
For an overview of the differences watch the Public Cloud Networking Is Different video (part of Introduction to Cloud Computing webinar), for more details start with AWS Networking 101 and Azure Networking 101 blog posts, and continue with corresponding cloud networking webinars.
Boris Lazarov sent me an excellent question:
Does it make sense and are there any inherent problems from design perspective to use the underlay not only for transport of overlay packets, but also for some services. For example: VMWare cluster, vMotion, VXLAN traffic, and some basic infrastructure services that are prerequisite for the rest (DNS).
Before answering it, let’s define some terminology which will inevitably lead us to the it’s tunnels all the way down endstate.
Bill Dagy sent me an annoying ISR gotcha. In his own words:
Since you have a large audience I thought I would throw this out here. Maybe it will help someone avoid spending 80 man hours troubleshooting network slowdowns.
Here’s the root cause of that behavior:
Cisco is now shipping routers that have some specified maximum throughput, but you have to buy a “boost license” to run them unthrottled. Maybe everyone already knew this but it sure took us by surprise.
Don’t believe it? Here’s a snapshot from Cisco 4000 Family Integrated Services Router Data Sheet:
In the Non-Stop Forwarding (NSF) article, I mentioned that the routers adjacent to the device using NSF have to play along to make the idea work. That capability is called Graceful Restart. Today we’ll explore its intricate details, be diplomatic, and leave the shortcomings and tradeoffs for the next blog post.
Imagine an access (provider edge) router providing connectivity services to its clients and running a routing protocol with one or more upstream devices.
I think we have a global problem with code quality. Both from a security perspective, and from a less problematic but still annoying bugs-everywhere perspective. I’m not sure if the issue is largely ignored, or we’ve given up on it (see also: Cloud Complexity Lies or Cisco ACI Complexity).
Here’s another masterpiece by Charity Majors: Why I hate the phrase “breaking down silos”. A teaser in case you can’t decide whether to click the link:
When someone says they are “breaking down silos”, whether in an interview, a panel, or casual conversation, it tells me jack shit about what they actually did.
One of my subscribers has to build a small data center fabric that’s just a tad too big for two switch design.
For my datacenter I would need two 48 ports 10GBASE-T switches and two 48 port 10/25G fibber switches. So I was watching the Small Fabrics and Lower-Speed Interfaces part of Physical Fabric Design to make up my mind. There you talk about the possibility to do a leaf and spine with 4 switches and connect servers to the spine.
A picture is worth a thousand words, so here’s the diagram of what I had in mind:
Last week I published an unrolled version of Peter Paluch’s explanation of flooding differences between OSPF and IS-IS. Here’s the second part of the saga: IS-IS flooding details (yet again, reposted in a more traditional format with Peter’s permission).
In IS-IS, DIS1 is best described as a “baseline benchmark” – a reference point that other routers compare themselves to, but it does not sit in the middle of the flow of updates (Link State PDUs, LSPs).
A quick and simplified refresher on packet types in IS-IS: A LSP carries topological information about its originating router – its System ID, its links to other routers and its attached prefixes. It is similar to an OSPF LSU containing one or more LSAs of different types.
Christoph Jaggi sent me a link to an interesting article describing security vulnerabilities pentesters found in Cisco SD-WAN admin/management code.
I’m positive the bugs have been fixed in the meantime, but what riled me most was the root cause: Little Bobby Tables (aka SQL injection) dropped by. Come on, it’s 2021, SD-WAN is supposed to be about building secure replacements for MPLS/VPN networks, and they couldn’t get someone who could write SQL-injection-safe code (the top web application security risk)?
Someone using my netsim-tools sent me an intriguing question: “Would it be possible to get network topology graphs out of the tool?”
I did something similar a long while ago for a simple network automation project (and numerous networking engineers built really interesting stuff while attending the Building Network Automation Solutions course), so it seemed like a no-brainer. As always, things aren’t as easy as they look.
I loved the Time Dilation blog post by Seth Godin. It explains so much, including why I won’t accept a “quick conf call to touch base and hash out ideas” from someone coming out of the blue sky – why should I be interested if they can’t invest the time to organize their thoughts and pour them into an email.
The concept of “creation-to-consumption” ratio is also interesting. Now I understand why I hate unedited opinionated chinwagging (many podcasts sadly fall into this category) or videos where someone blabbers into a camera while visibly trying to organize their thoughts.
Just FYI, these are some of the typical ratios I had to deal in the past:
Peter Paluch loves blogging in microchunks on Twitter ;) This time, he described the differences between OSPF and IS-IS, and gracefully allowed me to repost the explanation in a more traditional format.
My friends, I happen to have a different opinion. It will take a while to explain it and I will have to seemingly go off on a tangent. Please have patience. As a teaser, though: The 2Way state between DRothers does not improve flooding efficiency – in fact, it worsens it.
In early September, I started yet another project that’s been on the back burner for over a year: ipSpace.net Design Clinic (aka Ask Me Anything Reasonable in a more structured format). Instead of collecting questions and answering them in a podcast (example: Deep Questions podcast), I decided to make it more interactive with a live audience and real-time discussions. I also wanted to keep it valuable to anyone interested in watching the recordings, so we won’t discuss obscure failures of broken designs or dirty tricks that should have remained in CCIE lab exams.
Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a
broken non-redundant network design. Here’s how it’s supposed to work:
- A network device runs two copies of the control plane (primary and backup);
- Primary control plane continuously synchronizes its state with the backup control plane;
- When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.
Delighted? You might be disappointed once you start digging into the details.