Networking, Engineering and Safety
You might remember my occasional rants about lack of engineering in networking. A long while ago David Barroso nicely summarized the situation in a tweet responding to my BGP and Car Safety blog post:
If we were in a proper engineering we’d be discussing how to regulate and add safeties to an important tech that is unsafe and hard to operate. Instead, we blog about how to do crazy shit to it or how it’s a hot mess. Let’s be honest, if BGP was a car it’d be one pulled by horses.
MUST READ: Lessons from load balancers and multicast
Justin Pietsch published another must-read article, this time dealing with operational complexity of load balancers and IP multicast. Here are just a few choice quotes to get you started:
- A critical lesson I learned is that running out of capacity is the worst thing you can do in networking
- You can prevent a lot of problems if you can deep dive into an architecture and understand it’s tradeoffs and limitations
- Magic infrastructure is often extremely hard to troubleshoot and debug
You might find what he learned useful the next time you’re facing a unicorn-colored slide deck from your favorite software-defined or intent-based vendor ;))
Video: Define the Problem Before Searching for a Solution
In December 2019 I finally turned my focus on business challenges first presentation into a short webinar session (part of Business Aspects of Networking Technologies webinar) starting with defining the problem before searching for a solution including three simple questions:
- What BUSINESS problem are you trying to solve?
- Are there good-enough alternatives, or should you invest in new technology and/or equipment?
- Is the problem worth solving?
Considerations for Host-based Firewalls (Part 1)
This is a guest blog post by Matthias Luft, Principal Platform Security Engineer @ Salesforce, and a regular ipSpace.net guest speaker.
Having spent my career in various roles in IT security, Ivan and I always bounced thoughts on the overlap between networking and security (and, more recently, Cloud/Container) around. One of the hot challenges on that boundary that regularly comes up in network/security discussions is the topic of this blog post: microsegmentation and host-based firewalls (HBFs).
Example: Securing AWS Deployment
Nadeem Lughmani created an excellent solution for the securing your cloud deployment hands-on exercise in our public cloud online course. His Terraform-based solution includes:
- Security groups to restrict access to web server and SSH bastion host;
- An IAM policy and associated user that has read-only access to EC2 and VPC resources (used for monitoring)
- An IAM policy that has full access to as single S3 bucket (used to modify static content hosted on S3)
- An IAM role for AWS CloudWatch logs
- Logging SSH events from the SSH bastion host into CloudWatch logs.
Docker Services 101
Last week I published an overview of how complex (networking-wise) Docker Swarm services can get. This time let’s focus on something that should have been way simpler: running container-based services on a single Linux host.
In the first part of this article I’m focusing on the basics, including exposed ports, and published ports. The behind-the-scenes details are coming in a week or so; in the meantime you can enjoy (most of them) in the Docker Networking Deep Dive webinar.
Redundant Server Connectivity in Layer-3-Only Fabrics (Part 2)
In June 2020 I published the first part of Redundant Server Connectivity in Layer-3-Only Fabrics article describing the target design and application-layer requirements.
During the summer I added the details of multi-subnet server and client connectivity and a few conclusions.
Video: Networks Are Not Homogenous
The last Fallacy of Distributed Computing I addressed in the introductory part of How Networks Really Work webinar was The Network Is Homogenous. No, it’s not and it never was… for more details watch this video.
Review Questions: Switching, Bridging and Routing
One of the most annoying part in every training content development project was the ubiquitous question somewhere at the end of the process: “and now we’d need a few review questions”. I’m positive anyone ever involved in a similar project can feel the pain that question causes…
Writing good review questions requires a particularly devious state of mind, sometimes combined with “I would really like to get the answer to this one” (obviously you’d mark such questions as “needs further research”, and if you’re Donald Knuth the question would be “prove that P != NP”).
Docker Swarm Services behind the Scenes
Remember the claim that networking is becoming obsolete and that everyone else will simply bypass the networking teams (source)?
Good news for you – there are many fast growing overlay solutions that are adopted by apps and security teams and bypass the networking teams altogether.
That sounds awesome in a VC pitch deck. Let’s see how well that concept works out in reality using Docker Swarm as an example (Kubernetes is probably even worse).
Worth Reading: Hardware Packet Capture Failures
Greg Ferro is back with some great technical content, this time explaining why hardware-based packet capture might return unexpected results.
MUST READ: What I've learned about scaling OSPF in Datacenters
Justin Pietsch published a fantastic recap of his experience running OSPF in AWS infrastructure. You MUST read what he wrote, here’s the TL&DR summary:
- Contrary to popular myths, OSPF works well on very large leaf-and-spine networks.
- OSPF nuances are really hard to grasp intuitively, and the only way to know what will happen is to run tests with the same codebase you plan to use in a production environment.
Dinesh Dutt made similar claims on one of our podcasts, and I wrote numerous blog posts on the same topic. Not that anyone would care or listen; it’s so much better to watch vendor slide decks full of the latest unicorn dust… but in the end, it’s usually not the protocol that’s broken, but the network design.
Podcast: Trusting Routing Protocols
The can we trust routing protocols series of blog posts I wrote in April 2020 (part 1, part 2, response from Jeff Tantsura) culminated in an interesting discussion with Russ White and Nick Russo now published as The Hedge Episode 43.
Which Public Cloud Should I Master First?
I got a question along these lines from a friend of mine:
Google recently announced a huge data center build in country to open new GCP regions. Does that mean I should invest into mastering GCP or should I focus on some other public cloud platform?
As always, the right answer is “it depends”, for example:
Worth Reading: NetDevOps Concepts - Minimum Viable Product
Brett Lykins published an excellent description of what an automation Minimum Viable Product could be.
Not surprisingly, he’s almost perfectly in sync with what we’ve been telling networking engineers in ipSpace.net Network Automation online course:
- Start small
- Go for quick wins
- Do read-only stuff before modifying device configurations
- Test, test, test…