Router reload after 15 minutes of failed pings

Thursday, May 19, 2011 06:21 +0200

Jeroen sent me an interesting challenge: he would like to reload the router when the 3G WAN interface gets stuck (I thought my Nokia phone was the only device exhibiting this problem, but obviously I was wrong). The reload-on-failed-ping EEM applet I’ve published would be a perfect solution, but it uses track delay, and the maximum delay is three minutes, while Jeroen would like to wait 15 minutes before reloading the router.

I had two off-the-cuff ideas: execute the reload in X command when the SLA fails and reload cancel when it recovers, or use a second EEM applet with an event timer watchdog that is started (and stopped) by the SLA-tracking applets. Both options are pretty messy, so I was not really happy with either one ... and then Jeroen managed to find a third, totally unexpected solution.

He decided to use the SNMP value event detector to detect SLA failure (each SLA measurement has its own MIB variables) and combined it with a trigger saying “execute this applet if the OID value is below the threshold N times within a period of M seconds.” Here’s his SLA definition (he gets extra bonus points for starting SLA measurements 30 minutes after power-up) ...

ip sla 10
 icmp-echo 10.255.251.64 source-interface Loopback0
 request-data-size 16384
 frequency 10
ip sla schedule 10 life forever start-time after 00:30:00

... and the EEM applet (the last number in the OID string has to match the ip sla entry number, and the poll interval should match the ip sla frequency):

event manager applet vodafone_down_RELOAD 
 event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 »
  get-type exact entry-op lt entry-val "2" poll-interval 10
 trigger occurs 179 period 1790
 action 01.0 syslog msg "No ping response last 30 min."
 action 02.0 syslog msg "Reloading now to see if things get better..."
 action 03.0 reload
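
To make the timing concrete, here is a small Python model of how the SNMP event detector and the trigger interact. It is a sketch, not how IOS implements it, and it assumes that (with no exit condition configured) the detector raises one event on every 10-second poll that finds the value below 2, and that the trigger runs the applet once `occurs` events fall inside a sliding window of `period` seconds. It also assumes the OID column is rttMonCtrlOperTimeoutOccurred from CISCO-RTTMON-MIB, a TruthValue where 1 means true, which would explain the "less than 2" test; check the MIB before relying on that. The function names are hypothetical, and `trigger_params` merely reproduces Jeroen's 179/1790 pair. Note that with these values the applet waits about 30 minutes (179 × 10 s = 1,790 s), as its syslog message says, so a 15-minute variant would use roughly trigger occurs 89 period 890.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# Assumed: rttMonCtrlOperTimeoutOccurred (CISCO-RTTMON-MIB), a TruthValue where
# 1 = the last probe timed out and 2 = it did not, hence "entry-op lt entry-val 2".
TIMEOUT_OCCURRED_OID = "1.3.6.1.4.1.9.9.42.1.2.9.1.6"


def sla_oid(entry: int) -> str:
    """Append the ip sla entry number (10 in Jeroen's configuration)."""
    return f"{TIMEOUT_OCCURRED_OID}.{entry}"


def trigger_params(wait_seconds: int, poll_interval: int) -> Tuple[int, int]:
    """Hypothetical helper reproducing the 179/1790 choice for a given wait.

    occurs is one less than the number of polls in wait_seconds,
    and period is occurs * poll_interval.
    """
    occurs = wait_seconds // poll_interval - 1
    return occurs, occurs * poll_interval


def first_fire(poll_failed: List[bool], poll_interval: int,
               occurs: int, period: int) -> Optional[int]:
    """Return the time (seconds) of the poll that would run the applet, or None.

    poll_failed[i] is True when the poll at t = i * poll_interval saw a timeout.
    Each failed poll counts as one event; the applet runs as soon as `occurs`
    events fall within a sliding window of `period` seconds.
    """
    window: Deque[int] = deque()  # timestamps of recent failed polls
    for i, failed in enumerate(poll_failed):
        if not failed:
            continue
        t = i * poll_interval
        window.append(t)
        while t - window[0] > period:  # forget events older than the period
            window.popleft()
        if len(window) >= occurs:
            return t
    return None


occurs_30, period_30 = trigger_params(30 * 60, 10)
print(sla_oid(10), occurs_30, period_30)
print(first_fire([True] * 400, 10, occurs_30, period_30))  # prints 1780
```

In this model a single successful ping inside the window does not prevent the reload, but intermittent failures (for example every other poll failing) never accumulate 179 events within 1,790 seconds, so the router reloads only when the link has been essentially dead for the whole period.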

16 comments:

Mkhan9 19 May 2011 07:36

Awesome!!!!!!!!!

Jónatan Natti 19 May 2011 12:26

Just a thought. But in my experience it's usually enough to do a shut/no shut on the cellular interface to get the 3G back up and running.
I got the same request a while ago: reload the router if 3G has been down for a few minutes.
(This was based on the customer's experience with other 3G solutions, so it seems common that 3G users have to reload their equipment...)
We ended up using EEM/Tcl and doing a shut/no shut on the cellular interface before reloading the router (with different timers). If the shut/no shut fixed the problem, the SLA recovered and the router didn't have to reload. (And we preserve the logging buffers, the recovery is quicker, etc.)

There's also another issue regarding 3G.
Most 3G equipment can fall back to GPRS/EDGE automatically if the 3G signal is too weak or unavailable.
However, from what I've heard*, the 3G equipment will not try to switch back to 3G, even when the 3G signal is available again, as long as there is data flowing. It waits until there's no data transfer going on before moving from GPRS/EDGE back to 3G.
(* I've not verified this myself, but I heard this from someone who's more familiar with 3G equipment than I am.)
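The escalation described above (bounce the cellular interface first, reload only if that does not help) can be sketched as a small Python state machine. The timer values, the Cellular0 interface name and the function name are assumptions for illustration, not the actual EEM/Tcl configuration; the model measures the outage from the first failed probe and lets any successful probe reset it.

```python
from typing import List, Optional, Tuple


def escalate(probe_failed: List[bool], poll_interval: int,
             bounce_after: int, reload_after: int) -> List[Tuple[int, str]]:
    """Return the (time, action) pairs a two-stage recovery would take.

    probe_failed[i] is True when the probe at t = i * poll_interval failed.
    After `bounce_after` seconds of continuous failure the interface is bounced
    once; after `reload_after` seconds the router is reloaded.
    """
    actions: List[Tuple[int, str]] = []
    outage_start: Optional[int] = None
    bounced = False
    for i, failed in enumerate(probe_failed):
        t = i * poll_interval
        if not failed:
            # SLA recovered: reset the outage clock and re-enable the bounce
            outage_start, bounced = None, False
            continue
        if outage_start is None:
            outage_start = t
        down_for = t - outage_start
        if not bounced and down_for >= bounce_after:
            # hypothetical interface name
            actions.append((t, "shutdown / no shutdown on Cellular0"))
            bounced = True
        if down_for >= reload_after:
            actions.append((t, "reload"))
            break  # nothing runs after a reload
    return actions


# Hypothetical timers: bounce after 3 minutes, reload after 15 minutes
print(escalate([True] * 31 + [False] * 10, 10, 180, 900))  # the bounce fixed it
print(escalate([True] * 200, 10, 180, 900))                # it did not
```

Because the bounce happens well before the reload timer expires, a successful shut/no shut lets the SLA recover and resets the clock, so the reload (and the loss of the logging buffer) is avoided.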

Marshall 19 May 2011 18:44

You can also just reboot the cellular modem using "test cellular 0 2 modem-power-cycle".

DavidB 19 May 2011 22:26

A provider we hired to configure our 3G DMVPN OOB routers had this problem as well. He contacted TAC, and after some troubleshooting they provided him with a working IOS image. I don't know whether it has been publicly released, though...

Joe C 05 June 2011 14:04

Ivan,

I have written similar EEM scripts in my role. But I don't reload the router; I only reload the 3G HWIC, and I do it after missing eight consecutive IP SLA pings sent at 1-minute intervals with the default 5-second ping timeout.

I can share my config if you wish, let me know.

Cheers,
Joe.
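Joe's criterion differs from Jeroen's trigger occurs/period approach: it counts consecutive misses, so a single reply resets it. Here is a minimal Python sketch, assuming a probe is declared lost when its 5-second timeout expires (the function name is hypothetical):

```python
from typing import List, Optional


def consecutive_trigger(probe_failed: List[bool], poll_interval: int,
                        misses: int, timeout: int) -> Optional[int]:
    """Return when `misses` consecutive probes have timed out, or None."""
    streak = 0
    for i, failed in enumerate(probe_failed):
        streak = streak + 1 if failed else 0  # any reply resets the streak
        if streak >= misses:
            # the last probe was sent at i * poll_interval and times out `timeout` s later
            return i * poll_interval + timeout
    return None


# Eight consecutive misses, probes every 60 s, 5 s timeout
print(consecutive_trigger([True] * 20, 60, 8, 5))  # 425
```

So the HWIC reload kicks in roughly seven minutes after the first lost probe was sent, but a link that answers at least one of every eight probes never triggers it.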

Ivan Pepelnjak 05 June 2011 18:52

That would be fantastic. Just paste it as a comment or post a link to it somewhere.

Thank you! Ivan

andrew 27 June 2011 11:41

Is it necessary to have this in your configuration:

snmp-server enable traps ipsla

andrew 27 June 2011 11:42

Joe,
Please share.

Moris 26 October 2011 18:29

I would appreciate any help with this one:

I have an IP SLA probe that pings a host.
If the syslog message "%TRACKING-5-STATE: 222 ip sla 223 reachability Up->Down" appears twice within 3 minutes, an EEM applet installs a null route.

What I would like to know is how I can make sure this null route is removed only after 30 minutes have passed since the last "%TRACKING-5-STATE: 222 ip sla 223 reachability Down->Up" syslog message.

The thing is, I need a reliable backup link with a mechanism to verify it [the 30-minute safe period].

track 222 ip sla 223 reachability
ip sla 223
 icmp-echo x.x.x.x source-ip y.y.y.y
 threshold 500
 frequency 5
ip sla schedule 223 life forever start-time now
ip sla reaction-configuration 223 react timeout threshold-type xOfy 2 5 action-type trapOnly
!
event manager applet IPSEC_TUNNEL_2_FAIL
 event syslog pattern "%TRACKING-5-STATE: 222 ip sla 223 reachability Up->Down"
 trigger occurs 2 period 180
 action 1.0 cli command "enable"
 action 2.0 cli command "config t"
 action 3.0 cli command "ip route 192.168.255.5 255.255.255.255 Null0 name NULL_WHEN_IPSLA223_FAIL"
 action 3.1 cli command "exit"
 action 4.0 syslog msg "IPSEC_VPN_TUNNEL2 TIMEOUT - MOVING TO IPSEC_TUNNEL1"

I was thinking of using a watchdog timer, but I understand it counts down from the time it is triggered. That's great, but if the SLA is flapping and I get two "Down->Up" messages, I think it would start that applet multiple times, no? If so, continuous flapping would get me into trouble...

Thank you
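The behavior asked for here is a hold-down timer that restarts on every Down->Up transition and is cancelled by every Up->Down transition, so flapping never stacks multiple timers: there is only one pending deadline, and each transition overwrites it. The Python model below illustrates the logic only (it is not an EEM configuration, and the function name is hypothetical); it assumes the null route has already been installed.

```python
from typing import List, Optional, Tuple


def route_removal_time(transitions: List[Tuple[int, str]],
                       hold_down: int = 30 * 60) -> Optional[int]:
    """Return when the null route may be removed, or None if it has to stay.

    transitions: (time in seconds, "up" or "down") from the track object, in
    time order. Every "up" (re)starts the hold-down; every "down" cancels it.
    """
    deadline: Optional[int] = None
    for t, state in transitions:
        if deadline is not None and t >= deadline:
            return deadline  # the link stayed up for the whole hold-down
        deadline = t + hold_down if state == "up" else None
    return deadline  # pending deadline, or None if the last transition was "down"


print(route_removal_time([(0, "down"), (60, "up")]))                # 1860
print(route_removal_time([(0, "up"), (600, "down"), (700, "up")]))  # 2500
```

How this maps onto EEM (for example, an EEM countdown timer re-armed from the Down->Up applet) depends on what the IOS release supports; treat the model as a description of the desired behavior, not a tested configuration.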

Chr13 15 August 2014 23:34

Until Joe C shares his complete config with IP SLA, I can share two useful EEM applets that reload the cellular module.

! 1) Manual reload, using "event manager run reload.3g.module"
! You can use this if you still have access to the router.

event manager applet reload.3g.module
 event none
 action 1.0 cli command "enable"
 action 1.1 cli command "configure terminal"
 action 1.2 cli command "service internal"
 action 1.3 cli command "end"
 action 2.0 cli command "test cellular 0 modem-power-cycle"
 action 3.1 cli command "configure terminal"
 action 3.2 cli command "no service internal"
 action 3.3 cli command "end"
 action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."

! 2) Automatic reload (based on a syslog message; usually the HWIC logs an error when it is faulty).
! You can adapt this to a tracked object and execute it.

event manager applet auto.reload.3g.module
 event syslog pattern "CISCO800-2-MODEM_REMOVAL_DETECTED: Cellular0 modem is now REMOVED"
 action 1.0 cli command "enable"
 action 1.1 cli command "configure terminal"
 action 1.2 cli command "service internal"
 action 1.3 cli command "end"
 action 2.0 cli command "test cellular 0 modem-power-cycle"
 action 3.1 cli command "configure terminal"
 action 3.2 cli command "no service internal"
 action 3.3 cli command "end"
 action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."
!

All the best,
CR

Replies

Ivan Pepelnjak 16 August 2014 10:47

Thank you!

Anonymous 28 January 2015 12:07

Can I ask which IOS version you are running that lets you configure "trigger" in the event manager applet, please?

In the command string that starts with event... my IOS 12.4.15 does not have this option, so I can use either "event snmp ..." or "event syslog ...", but neither gives me the option to configure "trigger occurs".

many thanks.

Replies

Ivan Pepelnjak 28 January 2015 16:59

Those options probably work only in IOS 12.4T or even 15.x.

Anonymous 30 January 2015 09:47

Thanks for the reply. I'll see if I can upgrade to either of those and see if it works.
I'll let you know how I get on.

regards.
Dave S.

Anonymous 30 January 2015 10:39

No, there's definitely something else missing: I can use 12.4.24.T8 or even 15.1.4, and I get exactly the same results.

For example:
Router(config)#event manager applet Link_Down_Reload
Router(config-applet)#event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10 ?

average-factor Period used for rate based calculations
entry-type Entry comparison type
exit-comb Exit combination operator
exit-event Raise an exit event upon exit
exit-op Exit operator
exit-time Time before event monitoring is reenabled
exit-type Exit comparison type
exit-val Exit comparison value
maxrun Maximum runtime of applet

As you can see, there is no option for the next part of the command to set the "trigger" values.

There is another example in the post that uses syslog; it also sets a trigger value that I do not get an option for.

event manager applet Link_Down_Reload
event syslog pattern "%LINK-3-UPDOWN: Interface ATM0, changed state to down" ?

maxrun Maximum runtime of applet
occurs Number of occurrences before raising event
period Occurrence period
priority Screen messages that have specified priority
severity-critical Critical conditions, immediate attention needed
severity-debugging Debugging messages
severity-fatal System is unusable
severity-major Major conditions
severity-minor Minor conditions
severity-normal Normal event, signifying returning to normal state
severity-notification Basic notification, informational messages
severity-warning Warning conditions

Again, there is no option to set the trigger values.

Any ideas?

many thanks.

Dave S.

Ivan Pepelnjak 30 January 2015 13:43

Trigger is a separate EEM command available in 12.4(20)T or later.
