Is your argument that the technology works as designed and any issues with it are a people problem?
A polite question like that deserves more than a 280-character reply, but I tried my best:
BGP definitely works even better than designed. Is that good enough? Probably, and we could politely argue about that… but the root cause of most of the problems we see today (and people love to yammer about) is not the protocol or how it was designed but how sloppily it’s used.
Laura somewhat disagreed with my way of handling the issue:
I def agree with that take - how it’s used is the problem. I disagree with the reaction that “hot mess” comment has gotten tho. If a product is causing outages, the customer often doesn’t care a developer pushed bad code, but that it caused them an issue.
… and that’s where I had to disagree. If we accept blatant claims that “X is a hot mess” combined with vague clickbait-style spraying of guilt, we’ll never get anywhere. We have to do better and figure out whether the problem we’re experiencing is caused by (A) the technology, (B) a particular implementation of that technology, (C) how the technology is being used, or (D) how incompetent the users are allowed to be.
I wrote about these aspects in my Some Internet Service Providers Should Really Know Better rant (and I’m still amazed at how some fellow networking engineers tried to defend blatant errors of a Tier-1 ISP), and addressed the “let’s blame some random technology” behavior in Stretched VLANs and Failing Firewall Clusters… but since this was a Twitter conversation, the best I could do was…
It’s time we stop blaming technology for user stupidity. It’s like blaming cars or roads because an incompetent idiot without a driver’s license crashed into your house.
… to which Laura replied:
Good one. Sticking with the car analogy, are we trying to solve accidents by training the drivers more or are we trying to automate cars and roads? The tech might not be at fault but finding ways to minimize user error will probably be the most efficient solution.
… and that’s the point that had me thinking for at least a week. A quick Google search turned up an infographic claiming that road fatalities in Europe decreased by 57% in 16 years. While car technology did improve drastically in that period, we had major safety features like seatbelts and airbags long before 2001… but some of those safety features were not mandatory, or were not enforced as rigorously as they are today. Also, public opinion made car safety a high-priority item to consider when buying a new car.
How about BGP? We’ve had tons of safety features in BGP for ages (AS-path filters, maximum-prefix limits…), but even though it took us years to document them in a BCP, they are still often not used. The Verizon SNAFU could have been stopped by rigorous application of safety measures that were already built into BGP when I was still teaching BGP courses in Cisco TAC in Brussels (hint: in the late 1990s).
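To make the two safety features mentioned above a bit more concrete, here is a minimal sketch of what a provider-side EBGP session toward a single-homed customer could do with every incoming update. It assumes a hypothetical `CustomerSession` class and `Route` record (not any vendor’s implementation): an AS-path filter that accepts only paths consisting of the customer’s own ASN (prepending allowed), and a maximum-prefix limit that tears the session down when exceeded. Real implementations differ in details (some only log a warning, some count received rather than accepted prefixes), so treat this as an illustration of the idea, not a reference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # leftmost ASN = neighbor, rightmost ASN = origin


def as_path_from_customer_only(route: Route, customer_as: int) -> bool:
    """Roughly what an AS-path filter like ^(65001_)+$ does: accept only
    paths made up solely of the customer's ASN (prepending allowed)."""
    return len(route.as_path) > 0 and all(asn == customer_as for asn in route.as_path)


@dataclass
class CustomerSession:
    """Hypothetical provider-side EBGP session toward a single-homed customer."""
    customer_as: int
    max_prefixes: int
    accepted: dict = field(default_factory=dict)
    shut_down: bool = False

    def receive(self, route: Route) -> bool:
        """Process one update; return True if the route ends up installed."""
        if self.shut_down:
            return False
        if not as_path_from_customer_only(route, self.customer_as):
            return False  # leaked or transit path: filtered, never installed
        self.accepted[route.prefix] = route  # re-announcements replace, not add
        if len(self.accepted) > self.max_prefixes:
            # maximum-prefix limit exceeded: tear the session down and
            # drop everything learned over it
            self.shut_down = True
            self.accepted.clear()
            return False
        return True
```

With `CustomerSession(customer_as=65001, max_prefixes=2)`, a route with AS path `(65001,)` or `(65001, 65001)` is accepted, a route with AS path `(65001, 64500)` (something the customer learned from another upstream) is dropped, and a third accepted prefix shuts the session down instead of letting a fat-fingered customer flood the provider’s routing table.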
Then there’s the totally incomprehensible lack of common sense. The Default EBGP Route Propagation Behavior RFC (RFC 8212) was published in 2017, 23 years after the first BGP-4 RFC and at least 20 years after I started repeating “as a customer you have to take precautions not to become a transit AS” in my BGP course. We have no idea how many fat-finger SNAFUs could have been stopped if only this simple idea had been implemented decades ago.
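The difference between the traditional behavior of many BGP implementations (advertise everything to every EBGP neighbor unless told otherwise) and the RFC 8212 default (no explicitly configured policy, no routes exchanged) can be sketched in a few lines. The function names and the `Route` record below are hypothetical and only model the decision logic; the `only_own_prefixes` policy illustrates the classic “don’t become a transit AS” customer precaution, roughly the equivalent of an `^$` AS-path filter on outbound updates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # empty tuple = originated by this AS


Policy = Callable[[Route], bool]


def legacy_export(routes: List[Route]) -> List[Route]:
    # Traditional default in many implementations: with no policy configured,
    # every best route is advertised to every EBGP neighbor
    return list(routes)


def ebgp_export(routes: List[Route], export_policy: Optional[Policy]) -> List[Route]:
    # RFC 8212 default: no explicit export policy means nothing is advertised
    if export_policy is None:
        return []
    return [r for r in routes if export_policy(r)]


def ebgp_import(routes: List[Route], import_policy: Optional[Policy]) -> List[Route]:
    # RFC 8212 default: no explicit import policy means nothing is accepted
    if import_policy is None:
        return []
    return [r for r in routes if import_policy(r)]


def only_own_prefixes(route: Route) -> bool:
    # Customer-side precaution: advertise only locally originated routes
    return route.as_path == ()
```

Given a table containing one locally originated prefix and one prefix learned from ISP A (AS path `(64500,)`), `legacy_export` would happily hand both to ISP B and turn the customer into a transit AS, `ebgp_export(table, None)` advertises nothing, and `ebgp_export(table, only_own_prefixes)` advertises just the customer’s own prefix.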
Oh, and there’s one last minor detail: road traffic is somewhat regulated, and the rules are occasionally enforced. Also, in most countries you are not allowed to drive without a driver’s license, and professional driver’s licenses (= major ISPs) have more stringent requirements. Renesys wrote about reckless driving on the Internet in 2009 (almost exactly a decade ago), but of course nothing changed.
Which brings me to the end of my chat with Laura. She concluded with…
Just to make sure I don’t go off on a tangent my thought is that BGP has a lot of bolts that can help avoid many of the outages we see today and many orgs just don’t use them, so I can’t act outraged when someone complains about it with a simplified view on a podcast.
… and while I can relate to that, I still think (continuing the car analogy) we should stop saying that “cars kill pedestrians”. They don’t; the drivers do… and if someone wants to engage in a public blame-and-shame rant, they should get the basic facts right (like CloudFlare did a while ago).