Router reload after 15 minutes of failed pings
Jeroen sent me an interesting challenge: he would like to reload the router when the 3G WAN interface gets stuck (I thought my Nokia phone was the only one exhibiting this problem, but obviously I was wrong). The reload-on-failed-ping EEM applet I’ve published would be a perfect solution, but it uses track delay, and the maximum delay timeout is three minutes, while Jeroen would like to wait 15 minutes before reloading the router.
I had two off-the-cuff ideas: execute reload in X command when SLA fails and reload cancel when SLA recovers, or use a second EEM applet with event timer watchdog that is triggered (and stopped) by the SLA-tracking applets. Both options are pretty messy so I was not really happy with either one ... and then Jeroen managed to find a third, totally unexpected solution.
He decided to use the SNMP value event detector to detect SLA failure (each SLA measurement has its own MIB variables) and combined it with a trigger saying “execute this applet if the OID value is below the threshold X times in X sampling intervals.” Here’s his SLA definition (he gets extra bonus points for starting SLA measurements 30 minutes after power up) ...
ip sla 10
icmp-echo 10.255.251.64 source-interface Loopback0
request-data-size 16384
frequency 10
ip sla schedule 10 life forever start-time after 00:30:00
... and the EEM applet (the last number in the OID string has to match ip sla entry number and the polling frequency should match the ip sla frequency):
event manager applet vodafone_down_RELOAD
event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10
trigger occurs 179 period 1790
action 01.0 syslog msg "No ping response last 30 min."
action 02.0 syslog msg "Reloading now to see if things get better..."
action 03.0 reload
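The applet above waits roughly 30 minutes (179 matches at 10-second polls). For the 15-minute wait mentioned at the start of the post, a sketch following the same pattern might look like the one below. The applet name is made up, and the 89/890 values are an extrapolation from Jeroen's numbers rather than something he tested; the last OID digit still assumes ip sla 10.

```
! Sketch only: 15 min = 900 s, so 89 matches within 890 s at 10-second polls
event manager applet wan_down_RELOAD_15min
 event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10
 trigger occurs 89 period 890
 action 01.0 syslog msg "No ping response last 15 min."
 action 02.0 reload
```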
I got this same request a while ago: reload the router if 3G has been down for a few minutes.
(This was based on the customer's experience with other 3G solutions, so it seems common that 3G users have to reload their equipment...)
But we ended up using EEM/Tcl to do shut/no shut on the cellular interface before reloading the router (with different timers). If shut/no shut fixed the problem, the SLA recovered and the router didn't have to reload. (We also preserve the logging buffers, the recovery is quicker, etc.)
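A minimal applet-based sketch of the shut/no shut step could look like the following. It is not the commenter's EEM/Tcl script: the track object number, the two-minute delay, the applet name, and the Cellular0 interface name are all assumptions, and the reload stage with its longer timer is left out.

```
! Hypothetical numbering: ip sla 10 is the probe from the post, track 10 is new
track 10 ip sla 10 reachability
 delay down 120
!
event manager applet bounce_cellular
 event track 10 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "configure terminal"
 action 3.0 cli command "interface Cellular0"
 action 4.0 cli command "shutdown"
 action 5.0 cli command "no shutdown"
 action 6.0 cli command "end"
 action 7.0 syslog msg "Cellular0 bounced after SLA 10 failure"
```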
There's also another issue regarding 3G.
Most 3G equipment can fall back to GPRS/EDGE if the 3G signal is too weak or unavailable, and this can happen automatically.
However, from what I've heard*, the 3G equipment will not try to go back to 3G even if the 3G signal is available, if there is any data flowing. It will wait until there's no data transfer going on before going from GPRS/EDGE back to 3G.
(* I've not verified this myself, but I heard this from someone who's more familiar with 3G equipment than I am.)
I have done similar EEM scripts in my role, but I don't reload the router; I only reload the 3G-HWIC, and I do it after missing 8 consecutive IP SLA pings at 1-minute intervals with the default ping timeout of 5 s.
I can share my config if you wish, let me know.
Cheers,
Joe.
Thank you! Ivan
snmp-server enable traps ipsla
please share
I have an IP SLA that pings a host.
If the syslog message "%TRACKING-5-STATE: 222 ip sla 333 reachability Up->Down" occurs 2 times in 3 minutes, an EEM applet installs a null route.
What I would like to know is: how can I make sure this null route is removed only after 30 minutes have passed since the last syslog message "%TRACKING-5-STATE: 222 ip sla 333 reachability Down->Up"?
The point is that I need a reliable backup link with a mechanism to verify it (the 30-minute safe period).
track 222 ip sla 223 reachability
ip sla 223
icmp-echo x.x.x.x source-ip y.y.y.y
threshold 500
frequency 5
ip sla schedule 223 life forever start-time now
ip sla reaction-configuration 223 react timeout threshold-type xOfy 2 5 action-type trapOnly
!
event manager applet IPSEC_TUNNEL_2_FAIL
event syslog pattern "%TRACKING-5-STATE: 222 ip sla 223 reachability Up->Down"
trigger occurs 2 period 180
action 1.0 cli command "enable"
action 2.0 cli command "config t"
action 3.0 cli command "ip route 192.168.255.5 255.255.255.255 Null0 name NULL_WHEN_IPSLA223_FAIL"
action 3.1 cli command "exit"
action 4.0 syslog msg "IPSEC_VPN_TUNNEL2 TIMEOUT - MOVING TO IPSEC_TUNNEL1"
I was thinking of using a watchdog timer, but I understand it counts down from the time of a trigger. That's great, but if the SLA is flapping and I get two "Down->Up" messages, I think it would start that EEM applet multiple times, no? If so, continuous flapping would get me into trouble...
Thank you
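One possible way around the flapping concern is to reuse the SNMP polling trick from the post in the opposite direction: count polls where the probe reports no timeout, and remove the route only after about 30 minutes of them. The sketch below is untested. It assumes the OID returns 2 when no timeout has occurred (the post's applet treats values below 2 as failure) and that the event fires on every matching poll. The applet name and the 359/1795 values are made up to fit the 5-second SLA frequency.

```
! Sketch only: last OID digit matches ip sla 223, poll interval matches its frequency
event manager applet IPSEC_TUNNEL_2_STABLE
 event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.223 get-type exact entry-op eq entry-val "2" poll-interval 5
 trigger occurs 359 period 1795
 action 1.0 cli command "enable"
 action 2.0 cli command "config t"
 action 3.0 cli command "no ip route 192.168.255.5 255.255.255.255 Null0"
 action 3.1 cli command "end"
 action 4.0 syslog msg "IPSLA 223 STABLE FOR 30 MIN - NULL ROUTE REMOVED"
```

Because the count is taken over a sliding period, a single failed poll should postpone the removal instead of starting a second timer, which is the behavior the question asks for. The applet would probably keep firing while the link stays healthy, so the "no ip route" command may be attempted even when the route is already gone.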
! 1) Manual reload, using "event manager run reload.3g.module"
! You can use this if you still have access to the router.
event manager applet reload.3g.module
event none
action 1.0 cli command "enable"
action 1.1 cli command "configure terminal"
action 1.2 cli command "service internal"
action 1.3 cli command "end"
action 2.0 cli command "test cellular 0 modem-power-cycle"
action 3.1 cli command "configure terminal"
action 3.2 cli command "no service internal"
action 3.3 cli command "end"
action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."
! 2) Automatic reload (based on a syslog message; the HWIC usually throws an error when it is faulty).
! You can adapt this to a tracked object and execute it.
event manager applet auto.reload.3g.module
event syslog pattern "CISCO800-2-MODEM_REMOVAL_DETECTED: Cellular0 modem is now REMOVED"
action 1.0 cli command "enable"
action 1.1 cli command "configure terminal"
action 1.2 cli command "service internal"
action 1.3 cli command "end"
action 2.0 cli command "test cellular 0 modem-power-cycle"
action 3.1 cli command "configure terminal"
action 3.2 cli command "no service internal"
action 3.3 cli command "end"
action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: unknown Cisco bug."
!
All the best,
CR
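As a rough illustration of the "adapt this to a tracked object" remark, the same action list could be hung off a track event. The track number 20, the reference to ip sla 10 from the post, and the applet name are assumptions; the actions are copied from the applets above.

```
! Sketch only: track 20 and its SLA binding are hypothetical
track 20 ip sla 10 reachability
!
event manager applet track.reload.3g.module
 event track 20 state down
 action 1.0 cli command "enable"
 action 1.1 cli command "configure terminal"
 action 1.2 cli command "service internal"
 action 1.3 cli command "end"
 action 2.0 cli command "test cellular 0 modem-power-cycle"
 action 3.1 cli command "configure terminal"
 action 3.2 cli command "no service internal"
 action 3.3 cli command "end"
 action 4.0 syslog msg "Cellular0 module has been rebooted. Reason: tracked object 20 went down."
```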
In the command string that starts with event, my IOS (12.4.15) does not have this command. I can try to use either "event snmp..." or "event syslog...", but neither gives me the option to configure "trigger occurs".
many thanks.
I'll let you know how I get on.
regards.
Dave S.
For example:
Router(config)#event manager applet Link_Down_Reload
Router(config-applet)#event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10 ?
average-factor Period used for rate based calculations
entry-type Entry comparison type
exit-comb Exit combination operator
exit-event Raise an exit event upon exit
exit-op Exit operator
exit-time Time before event monitoring is reenabled
exit-type Exit comparison type
exit-val Exit comparison value
maxrun Maximum runtime of applet
As you can see, there is no option for the next part of the command to set the "trigger" values.
There is another example in the post that uses syslog; it also sets a trigger value that I do not get an option for.
event manager applet Link_Down_Reload
event syslog pattern "%LINK-3-UPDOWN: Interface ATM0, changed state to down" ?
maxrun Maximum runtime of applet
occurs Number of occurrences before raising event
period Occurrence period
priority Screen messages that have specified priority
severity-critical Critical conditions, immediate attention needed
severity-debugging Debugging messages
severity-fatal System is unusable
severity-major Major conditions
severity-minor Minor conditions
severity-normal Normal event, signifying returning to normal state
severity-notification Basic notification, informational messages
severity-warning Warning conditions
Again, there is no option to set the trigger values.
Any ideas?
many thanks.
Dave S.
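Judging by the second help output above, this IOS release does list occurs and period as keywords on the event syslog line itself, so for the syslog case a sketch like the one below may work without a separate trigger command. The 2/180 values are borrowed from the earlier applet in this thread and the actions are placeholders; whether these keywords count occurrences the same way on 12.4(15) is an assumption. The event snmp help output shows no equivalent keywords, so this does not cover the SNMP applet.

```
! Sketch only: occurrence counting moved onto the event line
event manager applet Link_Down_Reload
 event syslog pattern "%LINK-3-UPDOWN: Interface ATM0, changed state to down" occurs 2 period 180
 action 1.0 syslog msg "ATM0 went down twice within 3 minutes, reloading"
 action 2.0 reload
```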