How Important is BGP RPKI?

Corey Quinn mentioned me in a tweet linking to an AWS announcement claiming they are the biggest user of BGP RPKI worldwide (by the size of signed address space). Good for them – I’m sure it got their marketing excited. It’s also trivial to do once you have the infrastructure in place. Just saying…

On a more serious front: how important is RPKI and what misuses can it stop?

If you’ve never heard of RPKI: the AWS blog post is not too bad, Nick Matthews wrote a “look grandma, this is how it works” version in 280-character installments, and you should definitely spend some time exploring MANRS resources. Here’s a short version for the differently-attentive ;))

What is RPKI? RPKI (Resource Public Key Infrastructure) is a framework that provides origin AS validation: when you receive an IP prefix claiming to originate in AS X, you can use ROA (Route Origin Authorization) records to check whether AS X is allowed to originate that prefix. A ROA can specify the exact prefix the AS may originate, or a maximum prefix length if the AS should also be allowed to originate more-specific prefixes.
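To make the valid/invalid/not-found logic concrete, here’s a minimal Python sketch of origin validation, loosely modeled on the route states defined in RFC 6811. The `Roa` class, the `validate_origin` function, and the example ROA (documentation prefix 198.51.100.0/22, AS 64500, maximum length /24) are made up for illustration; real validators get their data from an RPKI cache and keep it in a prefix tree instead of scanning a list.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Union

Prefix = Union[IPv4Network, IPv6Network]


@dataclass(frozen=True)
class Roa:
    """Hypothetical in-memory ROA: which AS may originate which prefixes."""
    prefix: Prefix
    max_length: int
    origin_as: int


def validate_origin(prefix: Prefix, origin_as: int, roas: Iterable[Roa]) -> str:
    """Return "valid", "invalid" or "not-found" for a (prefix, origin AS) pair."""
    covered = False
    for roa in roas:
        # Skip ROAs of the other address family and ROAs that don't cover the prefix.
        if prefix.version != roa.prefix.version or not prefix.subnet_of(roa.prefix):
            continue
        covered = True
        # Matching ROA: same origin AS and the prefix is not longer than max_length.
        if roa.origin_as == origin_as and prefix.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none -> invalid.
    return "invalid" if covered else "not-found"


roas = [Roa(ip_network("198.51.100.0/22"), 24, 64500)]
print(validate_origin(ip_network("198.51.100.0/22"), 64500, roas))  # valid
print(validate_origin(ip_network("198.51.101.0/24"), 64500, roas))  # valid (within max length)
print(validate_origin(ip_network("198.51.101.0/25"), 64500, roas))  # invalid (too specific)
print(validate_origin(ip_network("198.51.100.0/22"), 64666, roas))  # invalid (wrong origin AS)
print(validate_origin(ip_network("203.0.113.0/24"), 64500, roas))   # not-found
```

Note that a wrong origin AS and an overly specific prefix end up in the same "invalid" bucket; the ROA doesn’t care which of the two mistakes you made.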

What happens to invalid prefixes? That depends on local policy. ISPs adhering to MANRS best practices should ignore invalid prefixes and prefer signed over unsigned prefixes… but, like in real life, you cannot force anyone else on the Internet to stop listening to fake news.
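In practice, “ignore invalid prefixes” is a filter on the validation state. Here’s a minimal sketch of just that part of such a policy; the `Route` class and its precomputed `rov_state` field are hypothetical, and a real router would express this in its own routing-policy language rather than in Python.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    rov_state: str  # "valid", "invalid" or "not-found", computed by origin validation


def drop_invalid(routes: List[Route]) -> List[Route]:
    """Keep valid and not-found routes; silently discard RPKI-invalid ones."""
    return [r for r in routes if r.rov_state != "invalid"]


received = [
    Route("198.51.100.0/24", (64501, 64500), "valid"),
    Route("198.51.100.0/24", (64502, 64666), "invalid"),  # wrong origin AS
    Route("203.0.113.0/24", (64503,), "not-found"),       # no ROA at all
]
for route in drop_invalid(received):
    print(route.prefix, route.as_path, route.rov_state)
```

Not-found routes are deliberately kept: dropping them would also drop every prefix whose owner hasn’t bothered to create a ROA.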

What is RPKI protecting against? RPKI validates the correctness of the origin AS, stopping stupid fat-finger mistakes like two-way BGP-OSPF-BGP redistribution, misconfigured BGP optimizers bringing down third-party services thanks to a clueless tier-1 provider, or the spillover effects of third-world countries trying to stop their population from watching unorthodox videos. Off-topic: that spillover was caused by another clueless tier-1 provider… do you see a pattern here?

What is RPKI not? RPKI cannot be used to validate the path between your network and the origin. The bad guys can still spoof the AS path.

What could the bad guys do? Do I really have to spell it out? I’m pretty sure I’m not spilling any beans, so here goes:

  • Get a BGP announcement with the RPKI-signed prefix you’re interested in;
  • Transport it halfway across the globe to a place far enough from the origin;
  • Replace the original AS path with a shorter AS path claiming the origin AS is directly connected to your AS;
  • Advertise the shorter AS path (with the IP prefix correctly signed by the origin AS) to an ignorant third party;
  • Profit.
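The recipe above works because origin validation looks only at the last (origin) AS in the AS path and ignores everything in front of it. Here’s a minimal sketch, assuming a single hypothetical ROA for 203.0.113.0/24 originated by AS 64500 and a hypothetical attacker in AS 64511 (all numbers taken from the documentation ranges):

```python
from ipaddress import ip_network
from typing import List

# Hypothetical ROA: only AS 64500 may originate 203.0.113.0/24 (max length /24).
ROAS = [(ip_network("203.0.113.0/24"), 24, 64500)]


def origin_state(prefix: str, as_path: List[int]) -> str:
    """Origin validation: only the rightmost (origin) AS of the path is checked."""
    net = ip_network(prefix)
    origin = as_path[-1]
    covered = False
    for roa_net, max_len, roa_as in ROAS:
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if roa_as == origin and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"


legit_path = [64502, 64501, 64500]  # real path: two transit ASes, then the origin
forged_path = [64511, 64500]        # attacker claims a direct link to the origin

for name, path in (("legitimate", legit_path), ("forged", forged_path)):
    print(f"{name}: {origin_state('203.0.113.0/24', path)}, AS path length {len(path)}")
# legitimate: valid, AS path length 3
# forged: valid, AS path length 2
```

Both routes pass origin validation, and the forged one is shorter; nothing in RPKI origin validation can tell them apart.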

Can we stop the bad guys doing that? Yes, but not with RPKI. The means to stop most shenanigans have been known for decades, and many of them are documented in the BGP Operations and Security RFC (RFC 7454). MANRS best practices go way beyond that, so make sure you read them, understand them, and implement them.

So what’s the big deal with RPKI? Keeping fat fingers from bringing down parts of the global Internet is a big win. Trust me, I know all about that. I had fat fingers a long while ago… although the blast radius was only a single country.

Stopping bad guys with a single silver bullet belongs in vendor slide-deck fairy tales, so don’t bother. A single tool will never be enough, so use whatever tools are at your disposal; RPKI is not a bad one.

Also, keep in mind that large web properties like Microsoft, AWS, or Cloudflare peer with thousands of networks at numerous exchange points, so it’s pretty hard to create an AS path shorter than theirs.
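The AS path length argument fits in a few lines of code. This is a deliberate simplification: real BGP best-path selection compares LOCAL_PREF (among other things) before AS path length, so the sketch only models the case where everything before the AS path comparison is equal. AS 64500 stands in for a large content provider and AS 64511 for an attacker; all AS numbers are hypothetical.

```python
from typing import Dict, List


def shortest_as_path(routes: Dict[str, List[int]]) -> str:
    """Return the name of the candidate route with the fewest AS hops.

    Assumes every attribute BGP compares before AS path length is equal.
    """
    return min(routes, key=lambda name: len(routes[name]))


# A network peering directly with the content provider (AS 64500): a forged
# path always contains at least the attacker's AS plus the origin AS.
peer_view = {
    "direct peering": [64500],
    "forged via attacker": [64511, 64500],
}
print(shortest_as_path(peer_view))  # direct peering

# A network reaching the provider only through two transit ASes: here the
# forged two-hop path is shorter and wins.
remote_view = {
    "legitimate via transit": [64496, 64497, 64500],
    "forged via attacker": [64511, 64500],
}
print(shortest_as_path(remote_view))  # forged via attacker
```

The more networks a content provider peers with directly, the fewer places are left where the second case applies.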

I mentioned MANRS best practices several times. You OUGHT TO read them.

Peerlock is another tool that can be used to detect and stop fat-finger mistakes. Not surprisingly, it’s another brilliant idea by Job Snijders.

Anything else? Write a comment.


7 comments:

  1. > RPKI cannot be used to validate the path between your network and the origin. The bad guys can still spoof the AS path.
    > Can we stop the bad guys doing that? Yes, but not with RPKI.

    This is true, but not the full story. While you can't stop the bad guys with RPKI, RPKI can often limit the reach of the bad guys.

    Let's say that I am announcing 2001:db8:1200::/40 in Sweden, and have a ROA saying that it can only be announced by AS 65001, with /40 as the longest allowed prefix length. A hijacker then cannot use more-specific announcements (e.g. 2001:db8:1200::/41) to win over my legitimate announcement. Instead, they need to announce the exact same prefix I do, but with a shorter AS path. And getting a shorter AS path can actually be difficult.

    To get the announcement out, the hijacker needs to be a customer of some transit provider and have an AS number of their own, e.g. 65002. The transit provider will not accept an announcement from them with an AS path consisting of only AS 65001; they need to send the AS path "65002 65001", which immediately puts them at a disadvantage, since my legitimate announcement consists of only AS 65001. If the hijacker and I use the same transit provider, they lose immediately.

    If the hijacker is further away, e.g. located in America, there is a good chance that they can trick others in America into using their announcement. But over here in Europe, there is a reasonable chance that their announcement will lose against mine, since theirs is likely to have passed through more ASes than mine, and they have that one-AS disadvantage to start with. Within Sweden, they are even less likely to win.

    So RPKI can help a little bit, making life slightly more difficult for hijackers, but it is certainly not a panacea. You should absolutely not trust it to protect you against hostile hijacks.

    Ivan, besides ROA-based route origin validation, S-BGP also specifies Path Validation/Route Attestation. This one can deal with AS path spoofing, but it comes with considerable overhead, both in UPDATE message size and in CPU utilization, so while path validation is possible, it looks like major vendors such as Cisco have opted not to implement it.

    And speaking of the performance overhead associated with S-BGP, it looks to me like it's often omitted from S-BGP discussions, but it can be a significant issue given the dynamics of the Internet. The Internet always has a high rate of BGP updates/churn due to, say, hot-potato routing changes, and that is exacerbated by the density of connections in the lower tiers of today's Internet (basically, the Internet is getting both flatter and denser). Doesn't processing S-BGP require considerable CPU power at this level of churn? Also, with a flatter and denser Internet topology, path hunting also tends to increase, adding even more updates to be processed.

    It would be very good to understand the performance implications as more and more prefixes make use of S-BGP.

    Also, S-BGP is more effective when there's widespread deployment. Looking at the current chart on the AWS blog post, we have 1.5m prefixes on the map as of now, which is good but still very small. So there's still a long way to go before S-BGP becomes truly effective at preventing prefix hijacking, among other things.

    And just like you said, since BGP is such a complex ecosystem, one tool is never enough. As of now, the state of BGP security, for both the control and the data plane, is still very much incomplete and fragmented. The performance angle also needs to be worked out as more and more prefixes and ASes are added if S-BGP is to have a chance of widespread adoption; otherwise it could go the way of large-scale QoS, multicast, or even LISP.

  3. @Minh Ha: Tried to find anything about real-life S-BGP deployment and failed. Do you have anything you could point me to?

    While the computational complexity of S-BGP could potentially reduce the churn (just like using bitcoins instead of credit cards would probably reduce the number of e-commerce transactions ;-), keep in mind the minor inconvenience of bringing a new BGP session up after a link failure or node restart. Do you really want to wait for minutes or hours to have the full BGP feed properly validated while the customers annoyed with suboptimal performance are screaming at you?

  4. > ignore invalid prefixes and prefer signed over unsigned prefixes…

    This seems quite strange: for a given prefix, you can't be both signed and unsigned, and almost no vendor allows you to override prefix specificity to select a route (Linux would be able to do that by chaining two routing tables, one with signed prefixes and one with unsigned ones).

  5. To correct my previous comment: it does not even make sense to prefer a less specific signed prefix over a more specific unsigned prefix, since this is not a possible configuration: either the more specific prefix is also signed or it is invalid.

  6. @Ivan, none at all :/. I've also tried digging around, but came up dry. It looks to me like S-BGP simply hasn't gained much traction over the decades: it offers an incomplete solution at a potentially considerable performance cost, it gives first movers no competitive (economic) advantage, and it needs Internet-wide deployment to work effectively, which won't happen anytime soon given the decentralized, democratic Internet model.

    So S-BGP does indeed look like large-scale QoS, multicast, LISP, and IPv6 multihoming. I don't want to sound like a permabear in a raging bull market after Uncle Jeff's announcement of AWS adoption of S-BGP, but it appears that every effort requiring total-coverage deployment to work has failed to gain widespread acceptance in the democratic Internet to date.

    I did find something that looks like a survey of S-BGP ROA coverage here:

    https://ripe69.ripe.net/presentations/103-route-origin-validation.pdf

    It's from Nov 2014, so it should be recent enough to still be relevant. You might want to read it, Ivan, as Randy Bush is also part of it :)).

    The same people also wrote an accompanying report/paper covering the same topic in greater detail, which answers at least part of Vincent's question above, from what I can see:

    https://www.semanticscholar.org/paper/Measuring-BGP-Route-Origin-Registration-and-Iamartino-Pelsser/f5e3b51727b164962794b3e3cf523f4dc86cd31d

    According to them, it looks like invalid prefixes that get rejected based on ROAs can be rescued by covering summary prefixes. That raises questions about S-BGP's effectiveness at preventing prefix hijacking, even if it gets deployed more widely.

    As for the prefix hijacking/spoofing problem alone, which is one of the things S-BGP proposes to solve with considerable pain and suffering, IMO it's best to leave the complexity at the edge and build the intelligence into the endpoints, so they can verify the destination's integrity, instead of trying to turn the network into a kitchen sink, and a dirty one at that. But given the tendency to turn the network into a be-all, end-all, omnipotent god, I don't see that happening in a hurry :p.

  7. @Vincent: while what I wrote is (probably) technically correct (at least the initial idea was to set lower LOCPREF for valid signed prefixes), an unsigned prefix for which a ROA exists would be marked invalid, so it's a purely hypothetical scenario. Thanks for pointing it out!
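The first comment and Vincent’s follow-ups hinge on one property of origin validation: once a ROA covers a prefix, every route for that prefix or any of its more-specifics is either valid or invalid, and "not-found" applies only where no ROA covers the address space at all. Here’s a minimal sketch using the ROA from the first comment (2001:db8:1200::/40, AS 65001, maximum length /40); the `check_origin` function is a hypothetical name, not any router’s API.

```python
from ipaddress import ip_network

# ROA from the first comment: AS 65001 may originate 2001:db8:1200::/40, max length /40.
ROAS = [(ip_network("2001:db8:1200::/40"), 40, 65001)]


def check_origin(prefix: str, origin_as: int) -> str:
    """Return "valid", "invalid" or "not-found" for a (prefix, origin AS) pair."""
    net = ip_network(prefix)
    covering = [r for r in ROAS if r[0].version == net.version and net.subnet_of(r[0])]
    if not covering:
        return "not-found"  # no ROA covers this address space at all
    for roa_net, max_len, roa_as in covering:
        if roa_as == origin_as and net.prefixlen <= max_len:
            return "valid"
    return "invalid"


# The covering ROA turns an "unsigned" more-specific into an invalid route:
print(check_origin("2001:db8:1200::/41", 65001))  # invalid (longer than max length)
print(check_origin("2001:db8:1280::/41", 65002))  # invalid (more-specific and wrong origin)
print(check_origin("2001:db8:1200::/40", 65002))  # invalid (wrong origin AS)
print(check_origin("2001:db8:1200::/40", 65001))  # valid
print(check_origin("2001:db8:ff00::/40", 65002))  # not-found (no covering ROA)
```

A hijacker’s /41 is therefore dropped by anyone rejecting invalid routes, which is why the only option left is the exact /40 with a forged (and longer) AS path.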
