Router reload after 15 minutes of failed pings

Jeroen sent me an interesting challenge: he would like to reload the router when the 3G WAN interface gets stuck (I thought my Nokia phone was the only device exhibiting this problem, but obviously I was wrong). The reload-on-failed-ping EEM applet I’ve published would be a perfect solution, but it uses track delay, and the maximum delay timeout is three minutes, while Jeroen would like to wait 15 minutes before reloading the router.

I had two off-the-cuff ideas: execute the "reload in X" command when the SLA fails and "reload cancel" when it recovers, or use a second EEM applet with an event timer watchdog that is started (and stopped) by the SLA-tracking applets. Both options are pretty messy, so I wasn’t really happy with either one ... and then Jeroen managed to find a third, totally unexpected solution.

He decided to use the SNMP value event detector to detect SLA failure (each SLA measurement has its own MIB variables) and combined it with a trigger saying “execute this applet if the OID value is below the threshold X times in X sampling intervals”, that is, only when every sample in the window indicates a failure. Here’s his SLA definition (he gets extra bonus points for starting SLA measurements 30 minutes after power-up) ...

ip sla 10
icmp-echo 10.255.251.64 source-interface Loopback0
request-data-size 16384
frequency 10
ip sla schedule 10 life forever start-time after 00:30:00

... and the EEM applet (the last number in the OID string has to match the IP SLA entry number, and the poll interval should match the IP SLA frequency):

event manager applet vodafone_down_RELOAD 
event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10
trigger occurs 179 period 1790
action 01.0 syslog msg "No ping response last 30 min."
action 02.0 syslog msg "Reloading now to see if things get better..."
action 03.0 reload
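The numbers in this applet hang together through simple arithmetic. This Python sketch is only a model for sanity-checking values before pasting them into a router (it does not run on IOS), and the helper names are hypothetical. It assumes, based on my reading of CISCO-RTTMON-MIB, that the OID prefix is rttMonCtrlOperTimeoutOccurred, a TruthValue where 1 means true (a timeout occurred) and 2 means false, which would explain the "entry-op lt entry-val 2" test. It also approximates EEM's occurs/period counting as a simple sliding window.

```python
from collections import deque
from typing import Iterable, Optional, Tuple

# Assumption: prefix of rttMonCtrlOperTimeoutOccurred in CISCO-RTTMON-MIB;
# the table index appended to it is the IP SLA entry number.
TIMEOUT_OCCURRED_OID = "1.3.6.1.4.1.9.9.42.1.2.9.1.6"
TRUTH_TRUE, TRUTH_FALSE = 1, 2  # SNMPv2-TC TruthValue encoding


def sla_oid(entry: int) -> str:
    """Build the per-SLA OID polled by 'event snmp oid'."""
    return f"{TIMEOUT_OCCURRED_OID}.{entry}"


def trigger_values(wait_s: int, poll_s: int) -> Tuple[int, int]:
    """Hypothetical helper: (occurs, period) requiring every poll in the window
    to fail. Subtracting one poll reproduces Jeroen's 179/1790 for 30 minutes;
    his reason for that margin isn't stated."""
    occurs = wait_s // poll_s - 1
    return occurs, occurs * poll_s


def first_fire_time(samples: Iterable[int], poll_s: int, occurs: int,
                    period: int, entry_val: int = TRUTH_FALSE) -> Optional[int]:
    """Replay polled values (one every poll_s seconds, starting at t=0) and
    return the first time at which `occurs` matches (value < entry_val) fall
    inside the trailing `period` seconds, or None if that never happens."""
    hits = deque()
    for i, value in enumerate(samples):
        t = i * poll_s
        if value < entry_val:  # entry-op lt
            hits.append(t)
        while hits and hits[0] <= t - period:  # drop matches older than the window
            hits.popleft()
        if len(hits) >= occurs:
            return t
    return None


if __name__ == "__main__":
    print(sla_oid(10))                 # 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10
    print(trigger_values(1800, 10))    # (179, 1790): Jeroen's 30-minute values
    print(trigger_values(900, 10))     # (89, 890): a 15-minute variant
    outage = [TRUTH_TRUE] * 200        # every poll reports a timeout
    print(first_fire_time(outage, 10, 179, 1790))  # 1780
```

In this model the applet fires on the 179th consecutive timeout, roughly 30 minutes after the pings start failing, which matches the applet's syslog text. A single successful poll inside the window postpones it, because fewer than 179 matches then remain in the window.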

16 comments:

  1. Awesome!!!!!!!!!
  2. Just a thought, but in my experience it's usually enough to do a shut/no shut on the cellular interface to get the 3G link back up and running.
    I got the same request a while ago: reload the router if 3G has been down for a few minutes.
    (This was based on the customer's experience with other 3G solutions, so it seems common that 3G users have to reload their equipment...)
    But this ended up using EEM/Tcl to do a shut/no shut on the cellular interface before reloading the router (with different timers). So if shut/no shut fixed the problem, the SLA recovered and the router didn't have to reload. (And we preserve the logging buffers, the recovery is quicker, etc.)

    There's also another issue regarding 3G.
    Most 3G equipment can fall back to GPRS/EDGE if the 3G signal is too weak or unavailable, and this can happen automatically.
    However, from what I've heard*, the 3G equipment will not try to go back to 3G even if the 3G signal is available, if there is any data flowing. It will wait until there's no data transfer going on before going from GPRS/EDGE back to 3G.
    (* I've not verified this myself, but I heard this from someone who's more familiar with 3G equipment than I am.)
  3. You can also just reboot the cellular modem using "test cellular 0 2 modem-power-cycle".
  4. A provider we hired to configure our 3G DMVPN OOB routers had this problem as well. He contacted TAC, and after some troubleshooting they provided him with a working IOS image. Don't know about a public release, though...
  5. Ivan,

    I have written similar EEM scripts in my role, but instead of reloading the router I only reload the 3G HWIC, and I do it after missing 8 consecutive IP SLA pings at 1-minute intervals with the default ping timeout of 5 s.

    I can share my config if you wish, let me know.

    Cheers,
    Joe.
  6. That would be fantastic. Just paste it as a comment or post a link to it somewhere.

    Thank you! Ivan
  7. Is it necessary to have this in your config:

    snmp-server enable traps ipsla
  8. Joe,
    please share
  9. I would appreciate any help with this one:

    I have an IP SLA that pings a host.
    If the syslog message "%TRACKING-5-STATE: 222 ip sla 223 reachability Up->Down" appears 2 times in 3 minutes, an applet installs a null route.

    What I would like to know is how to make sure this null route is removed only after 30 minutes have passed since the last "%TRACKING-5-STATE: 222 ip sla 223 reachability Down->Up" syslog message.

    The point is that I need to know I can rely on the backup link, with a mechanism to verify it (the 30-minute safe period).

    track 222 ip sla 223 reachability
    ip sla 223
    icmp-echo x.x.x.x source-ip y.y.y.y
    threshold 500
    frequency 5
    ip sla schedule 223 life forever start-time now
    ip sla reaction-configuration 223 react timeout threshold-type xOfy 2 5 action-type trapOnly
    !
    event manager applet IPSEC_TUNNEL_2_FAIL
    event syslog pattern "%TRACKING-5-STATE: 222 ip sla 223 reachability Up->Down"
    trigger occurs 2 period 180
    action 1.0 cli command "enable"
    action 2.0 cli command "config t"
    action 3.0 cli command "ip route 192.168.255.5 255.255.255.255 Null0 name NULL_WHEN_IPSLA223_FAIL"
    action 3.1 cli command "exit"
    action 4.0 syslog msg "IPSEC_VPN_TUNNEL2 TIMEOUT - MOVING TO IPSEC_TUNNEL1"

    I was thinking of using a watchdog timer, but I understand it counts down from the time of a trigger. That's great, but if the SLA is flapping and I get two "Down->Up" messages, I think it would start the EEM applet multiple times, no? If so, continuous flapping will get me into trouble...

    Thank you
  10. Until Joe C shares his complete IP SLA config, here are two useful EEM applets to reload the cellular module.

    ! 1) Manual reload, using "event manager run reload.3g.module"
    ! You can use this if you still have access to the router.

    event manager applet reload.3g.module
    event none
    action 1.0 cli command "enable"
    action 1.1 cli command "configure terminal"
    action 1.2 cli command "service internal"
    action 1.3 cli command "end"
    action 2.0 cli command "test cellular 0 modem-power-cycle"
    action 3.1 cli command "configure terminal"
    action 3.2 cli command "no service internal"
    action 3.3 cli command "end"
    action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."


    ! 2) Automatic reload (based on a syslog message; the HWIC usually logs an error when it is faulty).
    ! You can adapt this to a tracked object and execute it.

    event manager applet auto.reload.3g.module
    event syslog pattern "CISCO800-2-MODEM_REMOVAL_DETECTED: Cellular0 modem is now REMOVED"
    action 1.0 cli command "enable"
    action 1.1 cli command "configure terminal"
    action 1.2 cli command "service internal"
    action 1.3 cli command "end"
    action 2.0 cli command "test cellular 0 modem-power-cycle"
    action 3.1 cli command "configure terminal"
    action 3.2 cli command "no service internal"
    action 3.3 cli command "end"
    action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."
    !

    All the best,
    CR
  11. Can I ask which IOS version you are running, so that you can configure "trigger" in the event manager applet, please?

    In the command string that starts with "event", my IOS 12.4.15 does not have this option, so I can try either "event snmp..." or "event syslog...", but neither gives me the option to configure "trigger occurs".


    many thanks.

    Replies
    1. Those options probably work only in IOS 12.4T or even 15.x.
    2. Thanks for the reply. I'll see if I can upgrade to either of those and whether it works.
      I'll let you know how I get on.

      regards.
      Dave S.
    3. No, there's definitely something else missing: I can use 12.4.24.T8 or even 15.1.4 and I get exactly the same results.

      For example:
      Router(config)#event manager applet Link_Down_Reload
      Router(config-applet)#event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10 ?

      average-factor Period used for rate based calculations
      entry-type Entry comparison type
      exit-comb Exit combination operator
      exit-event Raise an exit event upon exit
      exit-op Exit operator
      exit-time Time before event monitoring is reenabled
      exit-type Exit comparison type
      exit-val Exit comparison value
      maxrun Maximum runtime of applet


      As you can see, there is no option in the rest of the command to set the "trigger" values.

      There is another example in the post that uses syslog; it also sets a trigger value that I do not get an option for.

      event manager applet Link_Down_Reload
      event syslog pattern "%LINK-3-UPDOWN: Interface ATM0, changed state to down" ?

      maxrun Maximum runtime of applet
      occurs Number of occurrences before raising event
      period Occurrence period
      priority Screen messages that have specified priority
      severity-critical Critical conditions, immediate attention needed
      severity-debugging Debugging messages
      severity-fatal System is unusable
      severity-major Major conditions
      severity-minor Minor conditions
      severity-normal Normal event, signifying returning to normal state
      severity-notification Basic notification, informational messages
      severity-warning Warning conditions


      Again, there is no option to set the trigger values.


      Any ideas?

      many thanks.

      Dave S.
    4. Trigger is a separate EEM applet command, entered on its own line after the event line (as in the applet in the post), and it's available in 12.4(20)T or later.
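The two-stage recovery described in the second comment (bounce the cellular interface first, reload only if that doesn't help) can be modeled as two thresholds on the same outage timer. A minimal Python sketch; the 5- and 15-minute values and the action strings are hypothetical, since the commenter didn't post the actual timers or Tcl code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationPolicy:
    """Hypothetical two-stage recovery driven by how long the SLA has been down."""
    bounce_after_s: int = 5 * 60    # assumed: shut/no shut Cellular0 after 5 minutes
    reload_after_s: int = 15 * 60   # assumed: reload the router after 15 minutes

    def __post_init__(self) -> None:
        # The interface bounce must come first, otherwise it never gets a chance.
        if not 0 < self.bounce_after_s < self.reload_after_s:
            raise ValueError("bounce_after_s must be positive and below reload_after_s")

    def actions(self, outage_s: int) -> List[str]:
        """Actions taken during an outage lasting outage_s seconds. If the bounce
        fixes the link, the outage ends early and the reload threshold is never hit."""
        taken = []
        if outage_s >= self.bounce_after_s:
            taken.append("shutdown/no shutdown Cellular0")
        if outage_s >= self.reload_after_s:
            taken.append("reload")
        return taken
```

For example, an outage that the interface bounce cures after 6 minutes yields only the bounce, while one that persists for 20 minutes yields both actions, with the reload last.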
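Joe's rule in the fifth comment counts consecutive misses instead of matches in a time window, so a single answered ping resets it. A minimal Python sketch of that counting rule, plus the detection delay it implies; Joe hasn't posted his config, so the function names are hypothetical and the delay estimate ignores any EEM or tracking processing time.

```python
from typing import Iterable, Optional, Tuple


def consecutive_miss_index(results: Iterable[bool], threshold: int = 8) -> Optional[int]:
    """Return the index of the probe that completes `threshold` consecutive
    misses (True = ping answered), or None if that never happens."""
    misses = 0
    for i, answered in enumerate(results):
        misses = 0 if answered else misses + 1  # any answer resets the count
        if misses >= threshold:
            return i
    return None


def detection_delay_s(misses: int = 8, interval_s: int = 60, timeout_s: int = 5) -> Tuple[int, int]:
    """Best and worst case seconds from link failure to the last counted timeout.
    Best: the link dies just before a probe is sent. Worst: just after a probe
    succeeded, so the first miss comes one full interval later."""
    best = (misses - 1) * interval_s + timeout_s
    worst = misses * interval_s + timeout_s
    return best, worst
```

With Joe's numbers that works out to roughly 7 to 8 minutes (425 to 485 seconds) before the HWIC reload would start.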
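The behavior asked for in the ninth comment is essentially a hold-down timer: each Down->Up transition (re)starts one 30-minute timer, and any Up->Down transition cancels it, so flapping keeps postponing the removal instead of launching several overlapping timers. A minimal Python model of that logic (not EEM code; the function name and event format are hypothetical):

```python
from typing import Iterable, Optional, Tuple

HOLD_DOWN_S = 30 * 60  # the 30-minute safe period from the question


def null_route_removal_time(events: Iterable[Tuple[int, str]],
                            hold_down_s: int = HOLD_DOWN_S) -> Optional[int]:
    """events: time-ordered (timestamp_s, "up" | "down") transitions of the
    tracked SLA. Return the time at which the null route would be removed,
    or None if no removal is pending (for example, the link is still down)."""
    deadline: Optional[int] = None
    for t, state in events:
        if deadline is not None and t >= deadline:
            return deadline  # the timer expired before this transition arrived
        if state == "up":
            deadline = t + hold_down_s  # restart the single hold-down timer
        elif state == "down":
            deadline = None  # link flapped: cancel the pending removal
        else:
            raise ValueError(f"unknown state: {state!r}")
    return deadline
```

For a link that comes back at t=60 s, flaps down at 600 s and up again at 660 s, the removal moves from 1860 s to 2460 s; the design choice is to keep exactly one pending timer and restart it, rather than counting Down->Up messages.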