The Impact of Changed NHRP Behavior in DMVPN Networks

Two years ago I wrote another Fermatish post: I described how NHRP behavior changed in DMVPN networks using NAT and claimed that it might be a huge problem, without ever explaining what the problem is.

Fabrice quickly identified the problem, but it seems the description was not explicit enough as I’m still getting queries about that post, so here’s a step-by-step description of what’s going on.

A single DMVPN network has two hubs and two spokes. The spokes are behind NAT boxes (e.g. cable/DSL modems) that prevent an IPsec session between the spokes from being established.

When Spoke A tries to establish communication with Spoke B, the following events take place:

  1. Spoke A sends an NHRP request for Spoke B to one of the hub routers (H1).
  2. At the same time, Spoke A creates a fake NHRP entry for Spoke B pointing to H1 (this is the crux of the changes introduced in 15.0M).
  3. H1 forwards the NHRP request to Spoke B.
  4. Spoke B tries to establish an IPsec tunnel to Spoke A to send the NHRP reply back to Spoke A.
  5. If the B-to-A IPsec tunnel establishment fails, the NHRP reply never arrives at Spoke A, and Spoke A uses the fake NHRP entry pointing to H1 until it expires (3 minutes).
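
The steps above can be sketched as a toy model of Spoke A's NHRP cache. This is only an illustration in Python, not how IOS implements NHRP: the names (`NhrpEntry`, `NhrpCache`, `resolve_spoke`) are made up for this sketch, the 180-second lifetime is the three-minute figure from step 5, and the lifetime of the direct entry is an arbitrary assumption.

```python
from dataclasses import dataclass, field

TEMP_ENTRY_LIFETIME = 180    # seconds; the 3-minute expiry from step 5
DIRECT_ENTRY_LIFETIME = 600  # seconds; arbitrary value assumed for this sketch


@dataclass
class NhrpEntry:
    next_hop: str      # where traffic for the destination is sent
    expires_at: float  # simulated time at which the entry is removed
    temporary: bool    # True for the fake entry pointing to the hub


@dataclass
class NhrpCache:
    entries: dict = field(default_factory=dict)

    def lookup(self, dest, now):
        """Return the live entry for dest, or None if absent or expired."""
        entry = self.entries.get(dest)
        if entry is not None and now >= entry.expires_at:
            del self.entries[dest]
            return None
        return entry


def resolve_spoke(cache, dest, hub, now, ipsec_b_to_a_ok):
    """Toy version of steps 1-5 as seen from Spoke A."""
    # Steps 1-2: the request goes to the hub and a fake entry is installed.
    cache.entries[dest] = NhrpEntry(hub, now + TEMP_ENTRY_LIFETIME, True)
    # Steps 3-4: the hub forwards the request; the reply can only come back
    # if the destination spoke manages to build an IPsec tunnel to Spoke A.
    if ipsec_b_to_a_ok:
        cache.entries[dest] = NhrpEntry(dest, now + DIRECT_ENTRY_LIFETIME, False)
    # Step 5: without a reply, the fake entry stays until it expires.
    return cache.entries[dest]
```

With `ipsec_b_to_a_ok=False` (the NAT scenario described here), every lookup for Spoke B during the next three minutes returns the entry pointing to H1.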

And now for the gotcha: Spoke A continues sending traffic toward H1 until the fake NHRP entry expires, regardless of whether H1 fails in the meantime or not. Only after the fake NHRP entry expires will Spoke A send another NHRP request to the hub router(s) alive at that time (H2). End result: traffic between Spoke A and Spoke B will be interrupted for up to three minutes even though you have redundant hubs in the DMVPN network.
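
To put a number on the interruption, here is a back-of-the-envelope sketch in Python. The function name `blackhole_seconds` is hypothetical, the 180-second lifetime is the three-minute expiry mentioned above, and the model deliberately ignores the extra time Spoke A needs to get an answer through H2 after the entry expires.

```python
TEMP_ENTRY_LIFETIME = 180  # seconds; the 3-minute expiry of the fake entry


def blackhole_seconds(entry_created_at, hub_failed_at,
                      lifetime=TEMP_ENTRY_LIFETIME):
    """How long Spoke A keeps sending traffic to a dead H1.

    Both arguments are timestamps in seconds on the same clock.
    """
    expires_at = entry_created_at + lifetime
    if hub_failed_at >= expires_at:
        # The fake entry is already gone; the next request goes to a live hub.
        return 0
    # Traffic is lost from the failure (or from entry creation, if the hub
    # died earlier) until the fake entry finally expires.
    return expires_at - max(hub_failed_at, entry_created_at)


if __name__ == "__main__":
    for failed_at in (0, 10, 90, 170, 200):
        print(failed_at, blackhole_seconds(0, failed_at))
```

The worst case is H1 failing right after the fake entry is created, which gives the full three minutes; the later H1 fails within the entry's lifetime, the shorter the interruption.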

More information

Still using DMVPN? Check out my DMVPN webinars.

6 comments:

  1. The procedure is not exactly the same with a Phase 3 config, although the net result of not being able to establish an IPsec session will be the same.
    Just figured it's worth a note ;-)
  2. AFAIR, dual-hub-single-cloud implementations are discouraged because of slow convergence issues on H1 failure, regardless of NAT behavior, precisely because of NHRP expiration timers.

    I remember reading Cisco recommending redundant single-hub clouds and letting an IGP take control.
  3. "AFAIR, dual-hub-single-cloud implementations are discouraged" ... and how exactly are we supposed to build large-scale Phase 3 clouds? ;)

    Single-hub multi-cloud recommendation was definitely made (and it makes a lot of sense, I use it wherever possible), but then they had to deviate from it to support larger environments.
  4. The first thought that went through my mind is "Why do the spokes have diodes on them".
    Replies
    1. Because I was too lazy to draw proper NAT/firewall boxes ;)
  5. Direction of Anode and Cathode