MPLS/VPN-over-GRE-over-IPSec: Does It Really Work?
Short answer: yes, it does.
During the geeky chat we had just after we'd finished recording the Data Center Fabric Packet Pushers podcast, Kurt (@networkjanitor) Bales asked me whether the MPLS/VPN-over-DMVPN scenarios I'm describing in my Enterprise MPLS/VPN Deployment webinar really work (they do seem a bit complex).
I always test the router configurations I use in my webinars, and I usually share them with the attendees. The Enterprise MPLS/VPN Deployment webinar includes complete sets of router configurations covering 10 scenarios, including five different MPLS/VPN-over-DMVPN designs, so you can easily test them in your lab and verify that they do work. But what about a live deployment?
To be honest, we don’t have a large-scale MPLS/VPN-over-DMVPN live deployment yet (if you do, please share as much as you can in the comments), but Phase 1 DMVPN is not much different from MPLS/VPN-over-(P2P)GRE-over-IPSec ... and we’ve built a 1500+ site network using that solution for one of our customers.
It all started pretty innocently: the customer wanted to reduce costs by replacing their Frame Relay/ATM core with MPLS/VPN WAN services offered by the local service providers. They have to keep different departments using their network strictly separate and MPLS/VPN was the only scalable solution (you don’t want to hear about the design I did for them 15+ years ago).
None of the service providers they could use was able to provide Carrier's Carrier services; some of them were severely limited in their routing options (BGP? What BGP? How do you spell that?). As the customer wanted to be totally provider-independent, GRE tunnels were the only option: if you use connected interfaces as tunnel sources, you don't have to exchange any routing information with the WAN connectivity provider. By building multiple parallel GRE infrastructures (one over each SP network), our customer got total WAN independence and can mix and match service providers on an as-needed basis (they only have to make sure critical sites are connected to at least two providers for redundancy). It's amazing how much that ability helps in the negotiation process.
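A minimal sketch of the "one GRE infrastructure per SP" idea, in Python, assuming a simple hub-and-spoke layout with one hub per SP (all site names, interface names and addresses are hypothetical): every site gets one tunnel per SP it's connected to, sourced from the connected WAN interface, and any critical site that ends up with fewer than two tunnels is flagged.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    critical: bool = False
    # SP name -> connected WAN interface used as the GRE tunnel source (hypothetical names)
    uplinks: dict = field(default_factory=dict)


def plan_tunnels(sites, hubs):
    """Build one GRE tunnel per (site, SP) pair, terminated on that SP's hub.

    `hubs` maps an SP name to the hub's tunnel endpoint address on that SP's network.
    Returns (tunnels, problems); each tunnel is (site, sp, source_interface, hub_address).
    """
    tunnels, problems = [], []
    for site in sites:
        built = 0
        for sp, source_if in sorted(site.uplinks.items()):
            if sp not in hubs:
                problems.append(f"{site.name}: no hub reachable over {sp}")
                continue
            tunnels.append((site.name, sp, source_if, hubs[sp]))
            built += 1
        # Critical sites must survive the loss of any single SP
        if site.critical and built < 2:
            problems.append(f"{site.name}: critical site has {built} SP tunnel(s), needs 2")
    return tunnels, problems


if __name__ == "__main__":
    sites = [
        Site("branch1", critical=True, uplinks={"SP-A": "Gi0/0", "SP-B": "Gi0/1"}),
        Site("branch2", critical=True, uplinks={"SP-A": "Gi0/0"}),
    ]
    hubs = {"SP-A": "192.0.2.1", "SP-B": "198.51.100.1"}
    tunnels, problems = plan_tunnels(sites, hubs)
    print(tunnels)
    print(problems)
```

Because each SP gets its own independent set of tunnels, adding or dropping a provider only changes the per-site `uplinks` entries; nothing about the other providers' tunnels has to change.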
Transporting sensitive data across IP infrastructure operated by the service providers was never an option, so we had to add IPsec to the mix, resulting in the stack mentioned in the article title.
Was it easy? Definitely not. Most of the problems were caused by the scale of the project: if you want to run IPsec at gigabit speeds, you need hardware encryption. When we were building the network, Catalyst 6500 was the only reasonable option ... but while it can easily handle MPLS/VPN or GRE or IPsec, it hiccups when you try to do all three on a single packet. In the end, we had to deploy a dual-tier architecture similar to this design.
Device configuration was also a challenge: when adding a new site, you have to add bits and pieces of configuration to multiple boxes (including the firewalls I haven't even mentioned yet), and relying on a manual configuration process would quickly result in a total mess. Solution: a configuration builder, a custom-developed tool that accepts a few parameters describing a new site (or modified parameters of an already deployed site) and generates the configuration snippets that are then downloaded to the network devices.
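The post doesn't describe the configuration builder in detail, so the following is only a rough illustration of the idea, sketched in Python: take a few per-site parameters and render matching snippets for both ends of one site-to-hub tunnel. The parameter names, tunnel numbering scheme and IPsec profile name are hypothetical, the output uses IOS-style syntax for illustration only, and a real tool would also have to generate the VRF, routing and firewall bits.

```python
import ipaddress


def tunnel_snippet(number, description, address, netmask, source, destination, profile):
    """Render one P2P GRE tunnel interface in IOS-style syntax (illustrative only)."""
    lines = [
        f"interface Tunnel{number}",
        f" description {description}",
        f" ip address {address} {netmask}",
        " mpls ip",                                    # run LDP across the tunnel
        f" tunnel source {source}",                    # connected WAN interface
        f" tunnel destination {destination}",
        f" tunnel protection ipsec profile {profile}",
    ]
    return "\n".join(lines) + "\n"


def build_site(*, site, site_id, sp, transit_subnet, spoke_tunnel,
               spoke_if, spoke_wan_ip, hub, hub_if, hub_wan_ip):
    """Return {device name: config snippet} for both ends of one site-to-hub tunnel."""
    net = ipaddress.ip_network(transit_subnet)
    hub_ip, spoke_ip = list(net.hosts())[:2]   # first usable address goes to the hub
    profile = f"WAN-{sp}"                      # hypothetical per-SP IPsec profile name
    return {
        # Hypothetical numbering scheme: hub tunnel number derived from the site ID
        hub: tunnel_snippet(1000 + site_id, f"{site} via {sp}", hub_ip, net.netmask,
                            hub_if, spoke_wan_ip, profile),
        site: tunnel_snippet(spoke_tunnel, f"{hub} via {sp}", spoke_ip, net.netmask,
                             spoke_if, hub_wan_ip, profile),
    }


if __name__ == "__main__":
    snippets = build_site(site="branch42", site_id=42, sp="SP-A",
                          transit_subnet="10.255.0.0/30", spoke_tunnel=101,
                          spoke_if="GigabitEthernet0/1", spoke_wan_ip="203.0.113.10",
                          hub="hub-a", hub_if="GigabitEthernet1/1", hub_wan_ip="192.0.2.1")
    for device, text in snippets.items():
        print(f"! --- {device} ---\n{text}")
```

Generating both ends from the same parameter set is the point of the exercise: the hub and spoke snippets can't drift apart, which is exactly what tends to happen when people edit multiple boxes by hand.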
But unfortunately, the general perception is that the more complex you get, the more expert you are. :( I'd agree with the "expert" part, but not with "intelligent expert". 8-)
(A) Using IP connectivity (MPLS/VPN services or otherwise) from multiple SPs and being completely independent from them and their (in)capabilities of supporting customer's routing and convergence requirements;
(B) Encrypting sensitive traffic;
(C) Maintaining strict isolation between departments.
These conditions, "completely independent from ISP" and "(in)capabilities ...", make it sound like someone thinks of himself as the greatest expert and just doesn't believe in anyone else's capabilities. I understand that there are incapabilities when dealing with ISPs, but as I said, we have to work with people (to get things resolved) rather than going around them.
For instance, if I think that my organization's network team is not capable enough to handle STP/L3 routing issues, I can just configure a workaround with Flex Links or EEM to do the job. I am sure somebody can come up with a working solution, but the real solution is to get the right people or train the existing ones.
Unfortunately, the reality of MPLS/VPN services (as offered by some SPs) is that they simply cannot satisfy the customers' needs. If the only routing option an SP offers is "OSPF or static routes" and you want to use two SPs, you're (almost) stuck.
It's not that the engineers working for that particular SP are bad. They are usually pretty good engineers and some of them are great people. However, they have to live with the business reality (read: service definition) of their organization and can't help you even when they know how to.
Anyhow, thanks for the nudges - you gave me food for at least 3 additional blog posts on this topic.
Now there is this:
http://www.cisco.com/en/US/docs/ios/interface/configuration/guide/ir_mplsvpnomgre.html#wp1074480
NHRP and IGP are no longer needed, and the NBMA address is gleaned from BGP. This, combined with GET VPN, really gives me hope for MPLSomGREoIPSEC.
I have labbed it up and it seems to work fantastically. The only thing that is hard to deal with is MTU: you have to ensure that encryption is always done in the fast path by not fragmenting after GRE encapsulation or after encryption.
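To put some numbers on the MTU problem, here's a back-of-the-envelope calculation in Python. It assumes IPv4 without options, a plain 4-byte GRE header (no key, checksum or sequence number), two MPLS labels, and ESP with AES-CBC (16-byte IV and block) plus HMAC-SHA1-96 (12-byte ICV) and no NAT-T; other transforms or label stacks change the result. It finds the largest customer IP packet that still fits into a 1500-byte WAN MTU without fragmentation.

```python
IPV4_HDR = 20      # IPv4 header without options
GRE_HDR = 4        # basic GRE header, no key/checksum/sequence number
MPLS_LABEL = 4     # one label stack entry


def esp_bytes(plaintext, block=16, iv=16, icv=12):
    """ESP header + IV + padded ciphertext + ICV for `plaintext` bytes (AES-CBC/HMAC-SHA1-96 assumed)."""
    # plaintext + pad length (1) + next header (1) is padded to a multiple of the cipher block
    padded = -(-(plaintext + 2) // block) * block
    return 8 + iv + padded + icv   # 8 = SPI + sequence number


def wire_size(inner, labels=2, tunnel_mode=True):
    """Size on the WAN link of an `inner`-byte customer IP packet after MPLS, GRE and ESP."""
    gre_payload = GRE_HDR + labels * MPLS_LABEL + inner
    if tunnel_mode:
        # ESP encrypts the whole GRE/IP packet and adds a new outer IP header
        return IPV4_HDR + esp_bytes(IPV4_HDR + gre_payload)
    # Transport mode: ESP sits between the GRE delivery IP header and the GRE header
    return IPV4_HDR + esp_bytes(gre_payload)


def max_inner(mtu=1500, **kwargs):
    """Largest customer IP packet that fits into `mtu` without fragmentation."""
    inner = mtu
    while inner > 0 and wire_size(inner, **kwargs) > mtu:
        inner -= 1
    return inner


print(max_inner(1500))                     # → 1406
print(max_inner(1500, tunnel_mode=False))  # → 1426
```

Note that the cipher padding makes the overhead grow in 16-byte steps, so a few bytes of extra headers can suddenly cost a whole block. The result is the kind of number you'd use when setting the tunnel MTU (and clamping TCP MSS) so that neither the GRE nor the ESP step has to fragment; verifying it on the actual platform and label stack is still a lab exercise.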
MPLS over mGRE is another great solution (also covered in my webinar 8-) ) that works best when you have only MPLS traffic. If you have to add a few VPNs on top of an existing DMVPN network, it's hard to justify re-engineering the whole network. Also, GETVPN doesn't work on Catalyst 6500 (at least it didn't when I last checked), which many people use as the hub encryption platform.
But we did get it going in the lab.
We would be looking to deploy with ASR 1000 hubs and 3900/ASR 1000 series spokes.
That said, I haven't checked the latest XE to see if it has support.
What do you think is the best way to tackle MTU? We may be fortunate in that we have a core layer behind the spoke PE where we can reduce the MTU, as we cannot fragment at the same time as label imposition with the above feature.
Are you saying this does not work for you? If so, what's the problem?
This is a very impressive combination of technologies used to create a cool solution, but would this, for example, be the way to go for a Tier 2 or 3 ISP that doesn't have deep enough pockets to run its own Layer 1 connectivity (DWDM/SDH etc.) and instead chose to get IP services from Tier 1 providers in the form of MPLS pseudowires?
Thanks,
Any comments/suggestions will be really helpful. Many thanks in advance.