The road to complex designs is paved with great recipes

A while ago someone asked me to help him troubleshoot his Internet connectivity. He was experiencing totally weird symptoms that turned out to be a mix of MTU problems, asymmetric routing (probably combined with RPF checks on the ISP side) and non-routable PE-CE subnets. While trying to figure out from the router configurations what might be wrong, I was surprised by the amount of complexity he’d managed to introduce into his DMZ design by following the recipes and best practices we all dole out in blog posts, textbooks and training materials.

His DMZ was a typical redundant DMZ design: two routers connected to two ISPs and running BGP with them, and a redundant pair of firewalls, as illustrated in the following RFC-ready diagram:

PE-ISP-A     PE-ISP-B
   |            |
  CE-A         CE-B
   |            |
=======PUB-SUB=======
   |            |
  FW-A         FW-B

EBGP sessions were established between CE-A and PE-ISP-A and between CE-B and PE-ISP-B (perfect). There was an IBGP session between CE-A and CE-B (perfect), but it was running between loopback interfaces.

OSPF was running in the DMZ to propagate loopback interface addresses between the CE-routers (otherwise the IBGP session would not start) and a default route to the firewalls. He was also redistributing PE-CE subnets into OSPF to fix BGP next-hop issues.

Both CE-routers had network statements to advertise the public IP subnet (PUB-SUB) to the Internet (perfect) and a static route to null 0 to ensure the PUB-SUB would always be advertised.
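To make the description concrete, here is a rough sketch of what the relevant parts of CE-A might have looked like. It is a reconstruction from the description above, not his actual configuration: the AS numbers (64500 for the customer, 64501 for ISP-A), the addresses (documentation ranges), the interface names and the exact OSPF commands are all assumptions.

```
! CE-A as described above (every number and name is made up for illustration)
interface Loopback0
 ip address 10.0.0.1 255.255.255.255
!
interface GigabitEthernet0/0
 description Uplink to PE-ISP-A
 ip address 192.0.2.2 255.255.255.252
!
interface GigabitEthernet0/1
 description PUB-SUB
 ip address 198.51.100.2 255.255.255.0
!
router ospf 1
 ! PE-CE subnet redistributed so that CE-B can resolve the BGP next hops
 redistribute connected subnets
 ! Loopback and PUB-SUB run OSPF natively
 network 10.0.0.1 0.0.0.0 area 0
 network 198.51.100.0 0.0.0.255 area 0
 ! Default route toward the firewalls
 default-information originate
!
router bgp 64500
 network 198.51.100.0 mask 255.255.255.0
 neighbor 192.0.2.1 remote-as 64501
 ! IBGP session to CE-B between loopback interfaces
 neighbor 10.0.0.2 remote-as 64500
 neighbor 10.0.0.2 update-source Loopback0
!
! Static route meant to keep PUB-SUB in the routing table at all times
ip route 198.51.100.0 255.255.255.0 Null0
```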

I could easily recognize each and every design choice he made; the whole DMZ was a perfect implementation of various BGP recipes that I can trace back at least 15 years (to when I put them in the BGP course we developed for Cisco Europe in the mid-1990s). Note: I don’t think I could claim to be the author of any one of them; they were always considered (at least by some) best practices.

While most of the recipes made his design more complex than necessary, the last one (static route to null 0) was actually harmful (as the academics say: the proof is left as an exercise for the reader – post it in the comments).

There are numerous changes one could make to simplify this design, for example:

Run the IBGP session over the directly connected interface (PUB-SUB). If you have two routers connected with a single link, it makes no sense to run the IBGP session between loopback interfaces; loopbacks are useful if you have multiple alternate paths between the IBGP neighbors or if the IBGP neighbors are not directly connected.

Use a static default route on the firewalls and HSRP on the CE-routers. This design is almost equivalent to the OSPF-in-the-DMZ design from the firewall perspective; track objects and HSRP priorities can get pretty close to whatever OSPF default route manipulation you can do on the CE-routers.

Use next-hop-self on the IBGP session. When IBGP routers advertise themselves as the BGP next-hop, the redistribution of PE-CE subnets into OSPF is no longer needed.

Remove the static route to null0. The IP subnet the CE routers have to advertise to the Internet is directly connected, so there’s no need to create an artificial IP prefix in the IP routing table to support the BGP network statement.

Last but definitely not least, remove OSPF from the DMZ, as all the reasons for using it are gone.
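Taken together, the changes listed above might look roughly like this on CE-A. Treat it as a sketch under stated assumptions rather than a tested configuration: the AS numbers (64500 for the customer, 64501 for ISP-A), the addresses (documentation ranges), the interface names and the HSRP group, priority and decrement values are all made up. The track object watches only the uplink's line protocol, which is the simplest possible choice; a real deployment would probably track something closer to the state of the EBGP session.

```
! CE-A, simplified (every number and name is made up for illustration)
track 10 interface GigabitEthernet0/0 line-protocol
!
interface GigabitEthernet0/0
 description Uplink to PE-ISP-A
 ip address 192.0.2.2 255.255.255.252
!
interface GigabitEthernet0/1
 description PUB-SUB
 ip address 198.51.100.2 255.255.255.0
 ! The firewalls point their static default route at 198.51.100.1
 standby 1 ip 198.51.100.1
 standby 1 priority 110
 standby 1 preempt
 ! Priority drops to 90 when the uplink fails, below CE-B's default of 100
 standby 1 track 10 decrement 20
!
router bgp 64500
 ! PUB-SUB is directly connected, so no static route to Null0 is needed
 network 198.51.100.0 mask 255.255.255.0
 neighbor 192.0.2.1 remote-as 64501
 ! IBGP runs straight over PUB-SUB: no loopbacks, no OSPF
 neighbor 198.51.100.3 remote-as 64500
 neighbor 198.51.100.3 next-hop-self
```

CE-B would mirror this with its own addresses, the default HSRP priority and "standby 1 preempt", so that it takes over the virtual address once CE-A's priority is decremented.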

Anything else? Please write a comment! And while speaking of misapplied recipes, two blog posts come to mind: Knowledge or recipes and Knowledge + Experience = Wisdom.

14 comments:

  1. I see some potential issues with the static to Null0. Even if the connected interface goes down on CE-A, the network will still be announced to ISP-A. So traffic from the Internet will always be entering ISP-A even if firewalls route towards ISP-B. This will lead to asymmetric routing and firewalls don't like that so they would probably drop all the sessions.

    But traffic probably won't even make it that far. If the connection to PUB-SUB is down on CE-A, then traffic will be blackholed (assuming only one interface to PUB-SUB).

  2. Ivan Pepelnjak, 15 August 2011 09:41

    Perfect answer. The route to "null 0" is dangerous because it can introduce a black hole if a CE router loses connection to PUB-SUB.

  3. “Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away” – Antoine de Saint-Exupery

  4. We are running almost the same setup, but I will say...without the "bad" stuff. We are running HSRP between CE routers, iBGP between CE routers (direct connect) and HA firewalls. We aren't running null0 insertion so no problem there.

    One thing I will throw out there (in case it may help anyone) is that we are also running two types of tracking...

    1) If the CE router cannot see the firewalls, shut down the BGP neighborship to the ISP

    2) If the BGP neighborship fails, decrement HSRP

  5. Ivan Pepelnjak, 15 August 2011 16:23

    #2 is nice, #1 is awesome.

    An exercise for the reader (to continue the academic lingo): do #1 in a way that does not require changes to router configuration.

  6. If you have MetroEthernet links, then implement BFD for fast failover, and also apply it to the BGP neighbor with "neighbor 10.1.1.1 fall-over bfd" to reduce failover times.

    To do the HSRP decrementation, I would use a track object with an IP SLA pinging the PE, or... introduce a dummy route on the PE (local, not redistributed to the ISP backbone; agree on a custom BGP community like 1111:0 so you can find it all the time), and use a track object to check if the route is in the routing table. With the BFD it is fine, and you can easily assign a track object to the HSRP monitoring.

    If you are really tricky, you can use the EEM to do the LAN BGP monitoring, but HSRP is much faster and easier.

  7. I agree with all points except the HSRP/static route one. USE BGP. That means getting a firewall that supports it. Bye bye, ASA!

  8. Ivan Pepelnjak, 15 August 2011 20:27

    HSRP/static might actually be simpler than BGP. If you don't want to send lots of routes to the firewall, you have to tweak the default route metrics in BGP.

  9. "neighbor ... fall-over route-map ..." maybe combined with static routes and some tracking (IP SLA or whatever...), depending on the design...

    BTW: That was an easy one: http://www.nil.com/ipcorner/DesigningBGPNetworks ;)

  10. Our edge network is way overcomplicated, but it was designed to be very flexible. We have 2 primary data centers, 2 backup sites, and rent space at a local colo. We currently have 4 upstream Internet connections, each of which peers with its own edge router. At each primary data center we have a pair of ASRs that aggregate those routes. They also form a loop between the 2 sites and the colo facility (where one of our circuits comes in), where we have a 3750X. That loop runs an IGP to supply reachability for each edge router to peer via iBGP with each of the ASRs. The ASRs run a second IGP instance to advertise a default route to our firewalls, which readvertise it into the core.

    As for external routing, we're splitting our netblock up between the two sites, advertising half at each, plus the whole netblock to ensure reachability to our entire address space if one of the sites goes down. The ASRs advertise those netblocks to the edge routers that peer with our upstream ISPs.

  11. Ivan Pepelnjak, 17 August 2011 07:12

    Now this is an interesting idea. If you use a static host route for EBGP peer, it would probably work.

  12. I've never configured it and I've only skimmed the configuration guide (http://www.cisco.com/en/US/docs/ios/12_4t/ip_route/configuration/guide/brbpeer.html), but from what I read the session is shut down as soon as the route defined in the route-map disappears. So if you have a setup like the one described in the blog post, the following CE configuration could maybe work:

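    ===============================
    ! Probe the firewall with ICMP echo every 10 seconds (1-second timeout)
    ip sla monitor 10
     type echo protocol ipIcmpEcho FW-IP
     timeout 1000
     frequency 10
    ip sla monitor schedule 10 life forever start-time now

    ! Track object follows the state of the probe
    track 10 rtr 10

    router bgp ASN
     neighbor PE fall-over route-map TRACK-FW

    ! A route-map matches prefixes through a prefix list, not directly
    ip prefix-list TRACK-FW permit 1.1.1.1/32

    route-map TRACK-FW permit 10
     match ip address prefix-list TRACK-FW

    ! Dummy host route that exists only while the firewall answers the probe
    ip route 1.1.1.1 255.255.255.255 Null0 track 10
    ===============================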

    But as already said - I've never used the feature and even if it would work, I would not be very happy to use a configuration like above...

  13. As an alternative design, we run eBGP through our firewalls by creating an internal AS on the inside of our network. From the inside we distribute the appropriate routes to the edge routers to advertise externally. If a firewall pair fails for any reason or the router becomes an island, the BGP connection inside will fail and the advertisements will drop from the edge router. It's set up in a square: two edge routers running iBGP, two internal routers running iBGP with a private AS, and two sets of firewalls that we run eBGP through. Then you just need some local-prefs/MED to control the traffic flow, and remove-private-as, and you're good to go.

    It's a little complicated, but if you don't trust your firewalls and have diverse locations, this is an option.

  14. Following on from what Daniel mentioned: "If the CE router cannot see the firewalls"

    In the event the CE router loses connectivity to the firewall, would we really need to shut down the neighbor to the ISP if we were to deploy a separate dedicated iBGP link between both CE routers? If CE-A lost connectivity to the firewall, wouldn't the inbound traffic take an alternate path via the iBGP link and reach the firewall via CE-B?



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.