high availability
ChatGPT explaining application high availability to a high school kid
Before going into the details, it’s worth figuring out what the application (or system) users need as opposed to what they think they need:
- Fifty Shades of High Availability (2020)
- Figure Out What the Customer Really Needs (2017)
- Are Business Needs Just Excuses for Vendor Shenanigans? (2020)
- Redundancy Does Not Result in Resiliency (2017)
- High Availability Planning: Identify the Weakest Link (2016)
- Meaningful Availability (2020)
- Differential Availability (2020)
Not surprisingly, IT vendors sell magic infrastructure solutions as the high-availability panacea based on the assumption that redundant infrastructure cannot fail. Nothing could be further from the truth:
- High Availability Fallacies (2011)
- If Something Can Fail, It Will (2012)
- How Hard Is It to Think about Failures? (2016)
- This Is What Makes Networking So Complex (2013)
- Decide How Badly You Want to Fail (2019)
- Sometimes You Have to Decide How You Want to Fail (2015)
- Some People Don’t Get It: It Will Eventually Fail (2016)
- The Network Is Reliable and Other Stories (2016)
- Circular Dependencies Considered Harmful (2021)
High Availability Concepts, Technologies, and Solutions
You can use a plethora of approaches depending on your availability targets:
- Disaster recovery is the right tool for the job if you’re OK with the system being down for a few hours.
- Automatic restart of application instances combined with disaster recovery is acceptable if you can tolerate the system being down ~0.1% of the time (99.9% availability).
- Availability targets higher than 99.9% can only be reached reliably with proper application design supported by well-designed infrastructure.
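To put those percentages into perspective, here is a minimal sketch that converts an availability target into a yearly downtime budget. It assumes a 365-day year and counts every hour equally; the function name `allowed_downtime_hours` is hypothetical and used only for illustration.

```python
# Convert an availability target (in percent) into an allowed-downtime budget.
# Assumption: a 365-day year (8760 hours), with planned and unplanned downtime
# counted the same way.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget in hours for the given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

A 99.9% target works out to roughly 8.76 hours of downtime per year, which is why a single multi-hour disaster recovery event can consume the whole budget.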
I wrote over 130 blog posts on these topics. It would be impossible to list all of them on a single page; major high-availability technologies or concepts thus have dedicated pages:
- Disaster recovery and avoidance
- High availability clusters
- Public and private cloud deployments
- Global and local load balancing with IP anycast
One of the prerequisites for highly available services is also redundant networking infrastructure:
- Redundant Data Center Internet Connectivity – Problem Overview (2013)
- Redundant Data Center Internet Connectivity – High-Level Design (2013)
- Coping with Byzantine Routing Failures (2014)
- Site and Host Multihoming (2023)
- High Availability Switching (2024)
Regardless of your approach, the only sustainable way to get highly available services is the correct design of the application stack. For more details, watch the Designing Active-Active and Disaster Recovery Data Centers webinar; I also wrote a few blog posts on the topic:
- Swimlanes, Read-Write Transactions and Session State (2017)
- Solving the Problem in the Right Place (2017)
- Moving Complexity to Application Layer? (2017)
- Optimizing the Time-to-First-Byte (2021)
Notable Outages
Finally, here are a few notable outages. TL;DR: it can happen to the big guys and will eventually happen to you.
Other High Availability Blog Posts
- 2015
- 2014
- 2013
- 2012
video
We published hundreds of public videos covering dozens of technologies on ipSpace.net. Networking technologies covered in free videos include:
Artificial Intelligence and Machine Learning
- Introduction to AI/ML Hype (2021)
- Machine Learning 101 (2021)
- Machine Learning Techniques (2022)
- Use Cases for AI/ML in Networking (2022)
- The Long Tail of AI/ML Problems (2022)
- Ugly Challenges of Using AI/ML in Networking (2022)
- Language Models in AI/ML Landscape (2023)
- Language Model Basics (2023)
More in the AI/ML in Networking: The Good, the Bad and the Ugly webinar (with more videos coming soon).
Border Gateway Protocol (BGP)
- Simplify BGP Configurations (2017)
- History of BGP Route Leaks (2023)
- Hacking BGP for Fun and Profit (2023)
- Outages Caused by Bugs in BGP Implementations (2023)
More in the Network Security Fallacies part of the How Networks Really Work webinar and the Internet Routing Security webinar.
Business Aspects of Networking Technologies
- Define the Problem Before Searching for a Solution (2020)
- Know Your Users' Needs (2020)
- Should You Build or Buy a Solution? (2020)
- High-Level Technology Guidelines (2021)
- Lessons Learned: Technology Still Matters (2021)
- Lessons Learned: Fundamentals Haven't Changed (2021)
- Lessons Learned: Complexity Will Kill Your System (2021)
- Some Services Are Not Worth Delivering (2021)
- The Way Forward (2022)
More in the Business Aspects of Networking Technologies webinar.
Cloud Networking
- Cloud Models, Layers and Responsibilities (2019)
- Public Cloud Networking Overview (2020)
- We Still Need Networking in Public Clouds (2021)
- Public Cloud Networking Is Different (2021)
- How Can You Master Public Cloud Networking? (2021)
- Cloud Services Hierarchy (2022)
- Functions-as-a-Service Demo (2022)
- Cloud-Native Environments (2022)
- Cloud Infrastructure-as-Code (2022)
- Migrating into a Cloud (2023)
Cumulus Linux
- What Is Cumulus Linux All About? (2015)
- Cumulus Linux Base Technologies (2015)
- Cumulus Linux Architecture (2015)
- What is Cumulus Linux All About (2020)
- Simplify Device Configurations with Cumulus Linux (2020)
- NetQ and Cumulus Linux Data Models (2020)
Ethernet VPN (EVPN)
- EVPN Multihoming Taxonomy and Overview (2022)
- EVPN Multihoming Deep Dive (2022)
- MLAG with EVPN (2023)
- vPC Fabric Peering with EVPN Multihoming (2023)
- Advantages and Drawbacks of EVPN-based Multihoming (2023)
FRRouting
- FRRouting Overview (2019)
- FRRouting Architecture (2020)
- FRRouting Configuration and Performance Optimizations (2020)
- FRRouting Usability Enhancements (2020)
- FRRouting Deployment Guidelines (2020)
IPv6 Security
- Reconnaissance in IPv6 (2012)
- IPv6 Secure Neighbor Discovery (SEND) (2013)
- IPv6 Source Address Validation Improvement (2013)
- IPv6 uRPF and Neighbor Discovery Throttling (2013)
- IPv6 Address Assignment and Tracking (2013)
- Dual-Stack Security Exposures (2013)
- IPv6 Security Overview (2020)
- IPv6 Trust Model (2022)
- Practical Aspects of IPv6 Security (2022)
- Rogue IPv6 RA Challenges (2022)
- IPv6 RA Guard and Extension Headers (2022)
- Testing IPv6 RA Guard (2022)
- Traffic Filtering in the Age of IPv6 (2022)
- IPv6 Traffic Filtering Details (2022)
More in the IPv6 Security webinar.
Kubernetes
- Why Do We Need Kubernetes? (2021)
- Kubernetes Principles (2021)
- Kubernetes Architecture (2022)
- Kubernetes Networking Model (2022)
- Understanding Kubernetes Pods (2022)
- Typical Kubernetes Inter-Pod Traffic Walk (2022)
- Kubernetes Services Overview (2022)
- Kubernetes Services Types (2022)
- Exposing Kubernetes Services to External Clients (2022)
- Kubernetes SDN Architecture (2023)
- Sample Kubernetes SDN Implementations (2023)
- Kubernetes Container Networking Interface (CNI) (2023)
- Kubernetes Calico Plugin (2023)
More in the Kubernetes Networking Deep Dive webinar (with more videos coming soon).
Leaf-and-Spine Fabrics
- Multi-Stage Clos Fabrics (2013)
- Building a L3-Only Data Center with Cumulus Linux (2016)
- SPB Deep Dive (2017)
- Overlays in Data Center Fabrics (2017)
- Routing on Hosts Deep Dive (2017)
- Challenges of Data Center Fabric Deployments (2017)
- Building Data Center Fabrics with SPB (2017)
- Building a Pure Layer-3 Data Center with Cumulus Linux (2017)
- Data Center Fabric Validation (2017)
- Separate Data from Code (2017)
Networking Fundamentals
- Networking Challenges (2019)
- Introducing Transmission Technologies (2019)
- Beyond Two Nodes (2019)
- The Need for Network Layers (2019)
- Retransmissions and Flow Control in Computer Networks (2019)
- Putting the Networking Layers Together (2019)
- Breaking the End-to-End Principle (2019)
- Fallacies of Distributed Computing (2020)
- The Network Is Not Reliable (2020)
- End-to-End Latency Is Not Zero (2020)
- Bandwidth Is Neither Infinite Nor Cheap (2020)
- Networks Are (Not) Secure (2020)
- Internet Has More than One Administrator (2020)
- Networks Are Not Homogenous (2020)
- Bridging, Routing, Switching (2020)
- Getting a Packet Across a Network (2020)
- Finding Paths Across the Network (2021)
- Path Discovery in Transparent Bridging and Routing (2021)
- Transparent Bridging Fundamentals (2021)
- IP Routing Fundamentals (2021)
- Comparing Routing and Bridging (2021)
- Typical Large-Scale Bridging Use Cases (2021)
- Introduction to Network Addressing (2021)
- Theoretical View of Network Addressing (2021)
- Early Data-Link-Layer Addressing (2021)
- Local Area Network Addressing (2022)
- Network Layer Addressing (2022)
- Comparing TCP/IP and CLNP (2022)
- Combining Data-Link- and Network Layer Addresses (2022)
- Network Address Assignments (2022)
- Network Address Scopes (2022)
- The Basics of Network Address Translation (NAT) (2022)
- Bridging Beyond Spanning Tree (2022)
- Routing Protocols Overview (2022)
- Link State Routing Protocol Basics (2023)
More in the How Networks Really Work webinar (with more videos coming soon).
Networking Labs
- Could I Use netlab instead of GNS3? (2022)
- What Can Netlab Do? (2022)
- Getting Started with netlab (2023)
- netlab Topology File (2023)
- netlab IP Address Management (IPAM) (2023)
More in the Network Automation Tools webinar (with more videos coming soon).
Software-Defined WAN (SD-WAN)
- What Is SD-WAN? (2018)
- SD-WAN Reference Design (2018)
- Going Beneath the Cisco SD-WAN Surface (2020)
- Cisco SD-WAN Fundamentals and Definitions (2020)
- Cisco SD-WAN Solution Architecture and Components (2020)
- Cisco SD-WAN Routing Goodness (2020)
- Cisco SD-WAN Onboarding Process (2020)
- Cisco SD-WAN Policies and Centralized Magic (2021)
- Cisco SD-WAN Policies Review (2021)
- Cisco SD-WAN Routing Design (2021)
- Cisco SD-WAN Site Design (2021)
- Cisco SD-WAN Policy Design (2021)
- Managed SD-WAN Services (2022)
- Challenges of Managed SD-WAN Services (2022)
- SD-WAN Backend Architecture (2023)
- SD-WAN CPE Architecture (2023)
- Security Aspects of SD-WAN (2023)
More in the Software-Defined WAN (SD-WAN) Overview, Cisco SD-WAN, and Business Aspects of Networking Technologies webinars (with more videos coming soon).
Switching and ASICs
- Switch Buffer Architectures (2017)
- Big- or Small-Buffer Switches (2018)
- Tools and Knobs to Use when Tweaking TCP Performance (2018)
- ASICs 101 (2020)
- Packet Buffers in Data Center ASICs (2023)
- Chassis Switch Architectures (2023)
- Types of Switching ASICs (2023)
Other Videos or Video-Related Blog Posts
- 2024
- 2023
- 2021
- 2020
- 2019
- 2018
- Video: What Problem Are We Solving with SDDC?
- Real-Life Network Automation: How It All Started
- Making Sense of Software-Defined World
- Video: SPB Fabric Use Cases
- Video: Automatic Diagramming with PowerNSX
- Presentation and Video: Real-Life Automation Wins
- Video: Automated Data Center Fabric Deployment Demo
- Video: Create an NSX Logical Switch with PowerNSX
- [Video] Configure Data Center Devices with PowerShell
- Video: What Is PowerNSX?
- 2017
- 2016
- 2015
- 2014
- 2013
SD-WAN
Software-Defined WAN (SD-WAN) is the second “software-defined” marketing attempt (after the original SDN) to dress a conglomerate of old technologies in shiny new clothes. Even the Wikipedia article promotes some of the usual software-defined hype, quoting a Network World claim that:
SD-WAN simplifies the management and operation of a WAN by decoupling the networking hardware from its control mechanism. This concept is similar to how software-defined networking implements virtualization technology to improve data center management and operation.
Is It Real?
Want to know how real those claims are? Start the journey with this series of myth-busting blog posts:
- Software-Defined WAN: Well-Orchestrated Duct Tape? (2015)
- Routing Protocols and SD-WAN: Apples and Furbies (2015)
- Do Enterprises Need MPLS? (2016)
- Lack of Fast Convergence in SD-WAN Products (2018)
- Lock-In and SD-WAN: a Match Made in Heaven (2019)
- Impact of Controller Failures in Software-Defined Networks (2019)
- Fast Failover in SD-WAN Networks (2020)
Does SD-WAN make sense? Sure:
Need More Details?
I covered the basics of SD-WAN in the Choose the Optimal VPN Service and SDN Use Cases webinars.
Pradosh Mohapatra described the basics of SD-WAN and its typical components and architectures:
- What Is SD-WAN?
- SD-WAN Reference Design
- SD-WAN Backend Architecture
- SD-WAN CPE Architecture
- Security Aspects of SD-WAN
Want to know more about Cisco’s SD-WAN solution (formerly known as Viptela)? Enjoy David Peñaloza Seijas’ deep dive into its architecture and implementation details:
- Going Beneath the Cisco SD-WAN Surface
- Cisco SD-WAN Fundamentals and Definitions
- Cisco SD-WAN Solution Architecture and Components
- Cisco SD-WAN Routing Goodness
- Cisco SD-WAN Onboarding Process
- Cisco SD-WAN Policies and Centralized Magic
- Cisco SD-WAN Policies Review
- Cisco SD-WAN Routing Design
- Cisco SD-WAN Site Design
- Cisco SD-WAN Policy Design
Real-Life SD-WAN
SD-WAN sounds great, but does it work as expected? Maybe not:
- SDN, SD-WAN and FCoE on Gartner Networking Hype Cycle (2015)
- SD-WAN Reality Gap (2019)
- Real-Life SD-WAN Experience (2019)
- Worth Reading: SD-WAN Scalability Challenges (2020)
- Feedback from Another SD-WAN Fan (2020)
Is it secure? Some products seem to be nothing more than a bunch of open-source components glued together with clueless Python code:
- Security Aspects of SD-WAN Solutions (2018)
- SD-WAN Security Under the Hood (2019)
- SD-WAN Security: A Product Liability Insurance Law Would Certainly Help (2020)
- Another SD-WAN Security SNAFU: SQL Injections in Cisco SD-WAN Admin Interface (2021)
Some service providers want to use SD-WAN to offer managed services. Not surprisingly, some people[1] don’t find that a good idea:
- SD-WAN: A Service Provider Perspective (2020)
- Managed SD-WAN Services (2022)
- Challenges of Managed SD-WAN Services (2022)
Then there are some technical details vendors love to gloss over:
- Does Unequal-Cost Multipathing Make Sense? (2021)
- Topology- and Congestion-Driven Load Balancing (2021)
Does it work within a public cloud? Yeah, sort of… with a few challenges:
Want Even More?
Love marketing-related rants? Here are a few:
- Some Ridiculous SD-WAN Claims (2015)
- What Is Software-Defined Security? (2016)
- This Is Why I’m Not Doing SD-WAN Webinars (2016)
- The Ever-Increasing Complexity (2017)
- SD-WAN Vendor Landscape (2019)
Last, but definitely not least, you might enjoy these (more esoteric) solutions:
- DLSP – QoS-Aware Routing Protocol on Software Gone Wild (2015)
- Changing Cisco IOS BGP Policies Based on IP SLA Measurements (2019)
- Overlay Networking with Ouroboros on Software Gone Wild (2020)
- Scalable Policy Routing (2021)
Blog Posts I Forgot to Categorize
[1] Including those working for said service providers or their customers.