The Impact of Changed NHRP Behavior in DMVPN Networks
Two years ago I wrote another Fermatish post: I described how NHRP behavior changed in DMVPN networks using NAT and claimed that it might be a huge problem, without ever explaining what the problem is.
Fabrice quickly identified the problem, but apparently the explanation wasn't explicit enough, as I'm still getting queries about that post. Here's a step-by-step description of what's going on.
Consider a single DMVPN network with two hubs (H1 and H2) and two spokes (Spoke A and Spoke B). The spokes sit behind NAT boxes (e.g. cable/DSL modems) that prevent IPsec sessions between the spokes from being established.
When Spoke A tries to communicate with Spoke B, the following events take place:
- Spoke A sends an NHRP request for Spoke B to one of the hub routers (H1).
- At the same time, Spoke A creates a fake NHRP entry for Spoke B pointing to H1 (this is the crux of the changes introduced in IOS 15.0M).
- H1 forwards the NHRP request to Spoke B.
- Spoke B tries to establish an IPsec tunnel to Spoke A to send the NHRP reply back to Spoke A.
- If the B-to-A IPsec tunnel establishment fails, the NHRP reply never arrives at Spoke A, and Spoke A uses the fake NHRP entry pointing to H1 until it expires (3 minutes).
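The steps above can be sketched as a tiny model of Spoke A's NHRP cache. This is a minimal Python illustration, not Cisco IOS code: the class and method names are hypothetical, and the 180-second lifetime of the fake entry is simply the "3 minutes" from the text. It shows the key point: once the fake entry pointing to H1 is installed and no reply arrives, Spoke A keeps using H1 until the entry expires.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Lifetime of the fake entry, taken from the "3 minutes" in the text (assumption).
FAKE_ENTRY_LIFETIME = 180  # seconds


@dataclass
class NhrpEntry:
    next_hop: str      # where traffic for the destination is sent
    expires_at: float  # absolute time (seconds) when the entry becomes invalid
    fake: bool         # True for the temporary entry pointing to the hub


class SpokeNhrpCache:
    """Hypothetical model of a spoke's NHRP cache (illustration only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, NhrpEntry] = {}

    def send_request(self, dest: str, hub: str, now: float) -> None:
        # Steps 1 and 2: send the NHRP request to the hub and, at the same
        # time, install a fake entry pointing to that hub.
        self.entries[dest] = NhrpEntry(hub, now + FAKE_ENTRY_LIFETIME, fake=True)

    def receive_reply(self, dest: str, nbma: str, now: float, hold: float) -> None:
        # Success case: a real NHRP reply replaces the fake entry.
        self.entries[dest] = NhrpEntry(nbma, now + hold, fake=False)

    def next_hop(self, dest: str, now: float) -> Optional[str]:
        entry = self.entries.get(dest)
        if entry is None or now >= entry.expires_at:
            # No valid entry left: a new NHRP request would be needed.
            self.entries.pop(dest, None)
            return None
        return entry.next_hop


cache = SpokeNhrpCache()
cache.send_request("spoke-b", hub="H1", now=0)
# The B-to-A IPsec tunnel fails, so receive_reply() is never called.
print(cache.next_hop("spoke-b", now=60))   # H1 (even if H1 is dead by now)
print(cache.next_hop("spoke-b", now=180))  # None: the fake entry has expired
```

Note that `next_hop()` never checks whether the hub is still alive; that omission is deliberate, because it mirrors the behavior described in the last step.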
And now for the gotcha: Spoke A continues sending traffic toward H1 until the fake NHRP entry expires, regardless of whether H1 fails in the meantime. Only after the fake NHRP entry expires will Spoke A send another NHRP request to the hub router(s) alive at that time (H2). End result: traffic between Spoke A and Spoke B will be interrupted for up to three minutes even though you have redundant hubs in the DMVPN network.
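The arithmetic behind "up to three minutes" can be sketched as follows. This is a minimal Python illustration under stated assumptions: the fake entry pointing to H1 is created at t=0, Spoke B's reply never arrives, and the function name `blackhole_seconds` is hypothetical. It also ignores the time needed to complete the new NHRP request through H2.

```python
# Lifetime of the fake entry, taken from the "3 minutes" in the text (assumption).
FAKE_ENTRY_LIFETIME = 180  # seconds


def blackhole_seconds(h1_fail_time: int, lifetime: int = FAKE_ENTRY_LIFETIME) -> int:
    """Return how long Spoke A keeps sending Spoke B's traffic to a dead H1.

    Assumes the fake entry was created at t=0, no NHRP reply ever arrives,
    and H1 fails at h1_fail_time while the fake entry is still valid.
    """
    if not 0 <= h1_fail_time < lifetime:
        raise ValueError("H1 must fail while the fake entry is still valid")
    # Traffic is lost from the moment H1 fails until the fake entry expires;
    # only then does Spoke A send a new NHRP request to a live hub (H2).
    return lifetime - h1_fail_time


for fail_time in (0, 60, 170):
    print(f"H1 fails at t={fail_time}s -> {blackhole_seconds(fail_time)}s outage")
```

The worst case is H1 failing right after Spoke A installs the fake entry: the outage then lasts the full lifetime of the entry, regardless of how quickly H2 could take over.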
More information
Still using DMVPN? Check out my DMVPN webinars.
Just figured it's worth a note ;-)
I remember reading Cisco's recommendation to use redundant single-hub clouds and let an IGP take control.
The single-hub multi-cloud recommendation was definitely made (and it makes a lot of sense; I use it wherever possible), but then they had to deviate from it to support larger environments.