ipSpace.net blog

End-to-End Connectivity Test

After you’ve successfully implemented the tracking of the primary next-hop router’s availability, you might be tempted to improve the solution to track end-to-end connectivity through ISP A and switch to the backup ISP whenever your central site is not reachable through the primary ISP. In theory, the required configuration change should be minimal – you only have to change the destination IP address in the IP SLA definition:

Pinging a remote host

hostname GW
!
ip sla 100
 icmp-echo 172.18.0.6 source-interface GigabitEthernet0/2
 threshold 500
 timeout 1000
 frequency 3
ip sla schedule 100 life forever start-time now

Unfortunately, there’s a serious problem with this setup when the path between GW and PE_A fails in a way that is not detected by the GW router (for example, there’s a problem in an intermediate layer-2 switch):

IP SLA fails, and the default route to PE_A is removed
GW installs the default route to PE_B
Pings are now sent from an IP address belonging to the ISP-A uplink onto a path going through ISP-B.
The redirected pings go through a NAT translation.
You’ll get an oscillating default route if the reverse NAT works and the ICMP responses reach the IP SLA measurement code. Otherwise, the IP SLA test will keep failing, and the default route to ISP A will not be installed even when the connectivity with PE A is restored.
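
One way to watch this failure mode in a lab is to follow the probe, the track object, and the default route side by side. The exec commands below are a minimal sketch. Track object 100 is an assumption borrowed from the configuration later in this section, as the listing above defines only the IP SLA operation, and no printouts are shown because they depend on the software release.

```
GW# show ip sla statistics 100
GW# show track 100
GW# show ip route 0.0.0.0
```

In the oscillating case, you would expect the track object to keep changing state while the next hop of the default route alternates between the two PE routers.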

To fix this problem, configure local policy routing. The IP SLA packets originate within the router, so they are affected only by the ip local policy, not by interface-level policy routing. The policy matches the ICMP packets sent from the GigabitEthernet0/2 interface (based on their source IP address; see the PingISP_A access list) and forces them out through the correct interface toward the expected next hop:

Fix the oscillating routing with local PBR policy

ip local policy route-map LocalPolicy
!
ip access-list extended PingISP_A
 permit icmp host 172.16.1.1 host 172.18.0.6
!
route-map LocalPolicy permit 10
 match ip address PingISP_A
 set ip next-hop 172.16.1.2
 set interface GigabitEthernet0/2
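
To check that the probes really follow the local policy, you could inspect the policy and its match counters. This is a sketch using standard exec commands; the GW prompt matches the hostname used above, and no output is shown because it varies between software releases.

```
GW# show ip local policy
GW# show route-map LocalPolicy
GW# show ip access-lists PingISP_A
```

The policy routing match counters in the route map printout should keep increasing while the IP SLA probe is running.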

The complete configuration of the gateway router is available on GitHub.

Coping with the Central Site Failure

You can extend the concepts presented in this section even further if you want to. For example, if the central site is not reachable through either ISP (it might be down), retaining ISP A as the primary ISP could make more sense. You would thus need to:

Track the central site’s availability through both ISPs.
Configure a reliable static default route for both ISPs (using a higher administrative distance on the backup one).
Add a third (last-resort) default route pointing to ISP A.
Add an even worse default route pointing to ISP B (in case the interface toward ISP A fails).

The relevant parts of the router configuration are included in the following listing (the complete configuration is on GitHub), and its interpretation is left as an exercise for the reader.

GW router tracking central site availability through both ISPs

hostname gw
!
ip dhcp pool LAN
 network 192.168.0.0 255.255.255.0
 default-router 192.168.0.1 
!
track 100 ip sla 100 reachability
 delay down 10 up 20
!
track 101 ip sla 101 reachability
 delay down 10 up 20
!
interface GigabitEthernet0/1
 description gw -> [h1,h2] [stub]
 ip address 192.168.0.1 255.255.255.0
 ip nat inside
!
interface GigabitEthernet0/2
 description gw -> pe_a
 ip address 172.16.1.1 255.255.255.252
 ip nat outside
!
interface GigabitEthernet0/3
 description gw -> pe_b
 ip address 172.17.3.1 255.255.255.252
 ip nat outside
!
ip local policy route-map LocalPolicy
!
ip nat inside source route-map ISP_A interface GigabitEthernet0/2 overload
ip nat inside source route-map ISP_B interface GigabitEthernet0/3 overload
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 10 name ISP_A track 100
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 11 name ISP_B track 101
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 250 name ISP_A_FB
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 251 name ISP_B_FB
!
ip access-list extended PingISP_A
 permit icmp host 172.16.1.1 any
ip access-list extended PingISP_B
 permit icmp host 172.17.3.1 any
!
ip sla 100
 icmp-echo 172.18.0.6 source-interface GigabitEthernet0/2
 threshold 500
 timeout 1000
 frequency 3
ip sla schedule 100 life forever start-time now
ip sla 101
 icmp-echo 172.18.0.6 source-interface GigabitEthernet0/3
 threshold 500
 timeout 1000
 frequency 3
ip sla schedule 101 life forever start-time now
!
route-map ISP_A permit 10
 match interface GigabitEthernet0/2
!
route-map ISP_B permit 10
 match interface GigabitEthernet0/3
!
route-map LocalPolicy permit 10
 match ip address PingISP_A
 set ip next-hop 172.16.1.2
!
route-map LocalPolicy permit 20
 match ip address PingISP_B
 set ip next-hop 172.17.3.2

Surviving Multiple Failures

The above solution survives the failure of:

A single uplink (one of the IP SLA measurements fails)
The central site (both IP SLA measurements fail, but we have backup floating static routes)
The physical link between GW and PE-A (the interface on GW goes down, and even the floating static route is removed)

It does not, however, survive the failure of the central site and the path between GW and PE-A. If you want to make your solution even more reliable¹, you could combine IP SLA checks of central site reachability with IP SLA checks of next-hop reachability and use the following hierarchy of default routes:

Route to PE-A based on reachability of the central site
Route to PE-B based on reachability of the central site
Route to PE-A based on next-hop reachability (using IP SLA or BFD)
Route to PE-B based on next-hop reachability (using IP SLA or BFD)
Floating static route to PE-A based on interface state
Floating static route to PE-B as the absolute last resort
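
The hierarchy above could be sketched as the following configuration fragment. It reuses IP SLA operations 100 and 101 and the floating static routes from the previous listing; the next-hop probes (IP SLA 110 and 111), their track objects, the route names ending in _NH, and the administrative distances 20 and 21 are assumptions made for illustration, not part of the original configuration. Whether your software release accepts several static routes toward the same next hop that differ only in administrative distance and tracking is something to verify in a lab.

```
! Assumed next-hop probes; they use the same timers as the central-site probes
ip sla 110
 icmp-echo 172.16.1.2 source-interface GigabitEthernet0/2
 threshold 500
 timeout 1000
 frequency 3
ip sla schedule 110 life forever start-time now
ip sla 111
 icmp-echo 172.17.3.2 source-interface GigabitEthernet0/3
 threshold 500
 timeout 1000
 frequency 3
ip sla schedule 111 life forever start-time now
!
track 110 ip sla 110 reachability
 delay down 10 up 20
!
track 111 ip sla 111 reachability
 delay down 10 up 20
!
! Routes 1-2: central site reachable through the ISP
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 10 name ISP_A track 100
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 11 name ISP_B track 101
! Routes 3-4: next hop reachable (assumed distances)
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 20 name ISP_A_NH track 110
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 21 name ISP_B_NH track 111
! Routes 5-6: floating static routes that depend only on the interface state
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 250 name ISP_A_FB
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 251 name ISP_B_FB
```

The PingISP_A and PingISP_B access lists in the previous listing match any ICMP destination, so the LocalPolicy route map would also cover the next-hop probes.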

Revision History

2025-03-31

Recreated the router configurations and printouts with IOSv release 15.6(1)T.
Added the surviving multiple failures scenario at the end of this section.

¹ Keeping in mind we’re leaving the sane world and slowly entering the Job Security territory.


Configuring Intra-Site Routing

The static default route configured on GW-A and GW-B has to be propagated between them to ensure that both routers have the same view of the Internet connectivity. The easiest way to implement this requirement is to redistribute the static default route into a dynamic routing protocol configured between the two routers, as shown in the next listing:

Redistributing the static default route

router ospf 1
 redistribute static metric 10
 default-information originate
!
interface Ethernet0/1
 ip ospf 1 area 0

OSPF will not announce the redistributed default route until you configure default-information originate within the OSPF process.
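
A quick way to confirm that the default route is really advertised is to look for the external LSA on the originating router and for the resulting route on its neighbor. This is a sketch with illustrative prompts (GW-A and GW-B are assumed hostnames), and the printouts are omitted.

```
GW-A# show ip ospf database external 0.0.0.0
GW-B# show ip route 0.0.0.0
```

The external LSA should be present only while GW-A has the static default route in its routing table.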

If no workstations are attached to the LAN between GW-A and GW-B, we’re finished; all routers attached to that LAN will get the default route pointing to the currently active gateway router through a dynamic routing protocol (Figure 4).

Intra-site routing with workstations attached to the same LAN as GW-A and GW-B is a bit more complex. You can usually configure only a single default gateway on the workstations, so you have to provide a dynamic switchover of the default gateway with a first-hop redundancy protocol (FHRP), for example, Virtual Router Redundancy Protocol (VRRP) or Hot Standby Router Protocol (HSRP). The configuration is straightforward because the track object you need to adjust the router’s HSRP priority based on the state of the upstream link has already been configured (see the following two listings; GW A has the higher default HSRP priority and decrements it when the track object goes down).

HSRP configuration on GW A

interface Ethernet0/1
 ip address 192.168.0.3 255.255.255.0
 standby 1 priority 100
 standby 1 ip 192.168.0.1
 standby 1 preempt
 standby 1 track 17 decrement 20

HSRP configuration on GW B

interface Ethernet0/1
 ip address 192.168.0.4 255.255.255.0
 standby 1 priority 90
 standby 1 ip 192.168.0.1
 standby 1 preempt
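
The failover arithmetic behind these two listings is simple: when track object 17 goes down, the HSRP priority of GW A drops from 100 to 80 (100 - 20), which is lower than the priority 90 configured on GW B. Because preemption is enabled, GW B takes over the virtual IP address 192.168.0.1. When the track object comes back up, GW A returns to priority 100 and preempts GW B. A sketch of exec commands you could use to watch the switchover (the prompts are illustrative):

```
GW-A# show track 17
GW-A# show standby brief
GW-B# show standby brief
```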


Configuring Internet Routing

The gateway routers’ configuration follows the principles explained in the Small Site Multihoming article. IP addressing and NAT are configured on both gateway routers, as shown in the following listing (only the GW-A configuration is included in most examples; the complete router configurations are on GitHub).

IP addressing, DHCP and NAT configuration on GW-A

ip dhcp pool LAN
   network 192.168.0.0 255.255.255.0
   default-router 192.168.0.1
exit
!
ip dhcp excluded-address 192.168.0.1 192.168.0.10
ip dhcp excluded-address 192.168.0.128 192.168.0.255
!
interface Ethernet0/1
 description gw_a -> [h1,h2,gw_b]
 ip address 192.168.0.1 255.255.255.0
 ip nat enable
!
interface Ethernet0/2
 description gw_a -> pe_a
 ip address 172.16.1.1 255.255.255.252
 ip nat enable
!
ip access-list standard Site
 10 permit 192.168.0.0 0.0.0.255
!
route-map Internet_Exit permit 10
 match ip address Site
!
ip nat inside source route-map Internet_Exit interface Ethernet0/2 overload

Notes:

The DHCP server runs on both gateway routers to increase the overall reliability. Use the ip dhcp excluded-address configuration command to ensure the routers allocate addresses from non-overlapping pools.
The NAT configuration uses the NAT Virtual Interface.
The NAT translation must use a route map to match the outgoing interface. Otherwise, GW-A would translate the host-originated packets routed to GW-B.
The gateway router configuration was recreated on a netlab topology using IOS-on-Linux (IOL) as the gateway router and a pair of FRRouting nodes acting as PE-routers.
All nodes in netlab-powered labs use the first LAN interface as a management interface, which is why the first data plane (inside) interface is Ethernet0/1.
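
To illustrate the non-overlapping DHCP pools: the exclusions in the GW-A listing leave addresses 192.168.0.11 through 192.168.0.127 for GW-A to allocate. A complementary configuration on GW-B might look like the following sketch. The excluded range is an assumption derived from the GW-A listing; the actual GW-B configuration on GitHub may differ.

```
! Assumed GW-B counterpart: GW-B allocates only from the upper half of the subnet
ip dhcp excluded-address 192.168.0.1 192.168.0.127
!
ip dhcp pool LAN
 network 192.168.0.0 255.255.255.0
 default-router 192.168.0.1
```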

To implement reliable static routes on both gateway routers, you have to configure:

An IP SLA object to track end-to-end connectivity to an IP address that is “far enough” (at least within the ISP network’s core; tracking an upstream ISP server is even better).
A track object that monitors the state of the IP SLA object.
A local routing policy ensuring the IP SLA measurements always use the Internet interface (otherwise, a gateway router with a failed upstream link might use the default path the other gateway router provided for its SLA measurements).
A static default route based on the state of the track object.

The relevant parts of GW-A configuration are included in the following listing (the detailed description of the configuration and monitoring commands related to reliable static routing is available in the Small Site Multihoming article).

Reliable static default route using IP SLA on GW-A

ip access-list extended Ping_probe
 10 permit icmp host 172.16.1.1 host 172.29.0.1
!
route-map LocalPolicy permit 10
 match ip address Ping_probe
 set ip next-hop 172.16.1.2
 set interface Ethernet0/2
!
ip local policy route-map LocalPolicy
!
ip sla 15
 icmp-echo 172.29.0.1 source-interface Ethernet0/2
  threshold 100
  timeout 200
  frequency 1
ip sla schedule 15 life forever start-time now
!
track 17 ip sla 15 state
 delay down 5 up 20
!
ip route 0.0.0.0 0.0.0.0 172.16.1.2 name ISP_A track 17

The setup on GW-B is much more straightforward, as we’re using it just as a backup router. It has a floating static default route with an administrative distance of 250; if Internet connectivity on GW-A is operational, the default route received through the routing protocol (OSPF routes have a default administrative distance of 110) overrides the static default route.

Floating static default route on GW-B

ip route 0.0.0.0 0.0.0.0 172.17.3.2 name ISP_B 250


Basic Small Site Multi-Homing

Connecting a small site to multiple service providers can be extremely easy – you get two upstream links and two provider-assigned (PA) IPv4 addresses (static or dynamically assigned). Since each ISP will give you only a single IPv4 address, you have to use private IPv4 addresses on the LAN side of the router (Figure 1).

Most ISPs are unwilling to run a dynamic routing protocol with small sites, so you must configure static default routing on your router. Since you would almost always prefer one provider over the other, you would create a primary and a backup default route.

With careful configuration, it’s also possible to achieve rudimentary load sharing with two equally good default routes.
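
A minimal sketch of the load-sharing variant is a pair of static default routes with the same (default) administrative distance. The interface names and next-hop addresses below are assumptions for illustration, and the sketch leaves out the NAT and tracking configuration that a working design would also need.

```
! Two equal-cost default routes; interfaces and next hops are illustrative
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/2 172.16.1.2 name ISP_A
ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/3 172.17.3.2 name ISP_B
```

With the default per-destination CEF load sharing, all packets of a source-destination pair use the same uplink, so a session would keep using a single outside address.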

The router on the remote site would also have to perform two independent NAT translations, one for packets sent to ISP A (where local addresses get translated to the IP address assigned by ISP A) and another for packets sent to ISP B (Figure 3).

NAT translation in small multi-homed site

One of the significant issues in multi-homed site design is the proper handling of the return traffic. It’s not uncommon to experience performance problems if the outbound and return traffic flow over different links (also known as asymmetrical routing), while IP multicast and stateful packet inspection (part of the Cisco IOS firewall feature set) almost always break under these conditions. Fortunately, asymmetrical routing is never a problem in a dual NAT design (see the above diagram), as the translated source address of the outbound packet indicates the link that has been used to send it, and the return traffic is sent to that address.
