Jeroen sent me an interesting challenge: he would like to reload the router when the 3G WAN interface gets stuck (I thought my Nokia phone was the only one exhibiting this problem, but obviously I was wrong). The reload-on-failed-ping EEM applet I’ve published would be a perfect solution, but it relies on track delay, and the maximum delay is three minutes, while Jeroen would like to wait 15 minutes before reloading the router.
I had two off-the-cuff ideas: execute the reload in X command when the SLA fails and reload cancel when it recovers, or use a second EEM applet with an event timer watchdog that is triggered (and stopped) by the SLA-tracking applets. Both options are pretty messy, so I was not really happy with either one ... and then Jeroen managed to find a third, totally unexpected solution.
He decided to use the SNMP value event detector to detect SLA failure (each SLA measurement has its own MIB variables) and combined it with a trigger saying “execute this applet if the OID value is below the threshold X times in X sampling intervals.” Here’s his SLA definition (he gets extra bonus points for starting SLA measurements 30 minutes after power up) ...
ip sla 10
 icmp-echo 10.255.251.64 source-interface Loopback0
ip sla schedule 10 life forever start-time after 00:30:00
... and the EEM applet (the last number in the OID string has to match the ip sla entry number, and the polling frequency should match the ip sla frequency):
event manager applet vodafone_down_RELOAD
 event snmp oid 1.3.6.1.4.1.9.9.42.1.2.9.1.6.10 get-type exact entry-op lt entry-val "2" poll-interval 10
 trigger occurs 179 period 1790
 action 01.0 syslog msg "No ping response last 30 min."
 action 02.0 syslog msg "Reloading now to see if things get better..."
 action 03.0 reload
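The numbers in the trigger line are simple arithmetic: polling every 10 seconds gives 180 samples in 30 minutes, and the applet asks for 179 failures within 1790 seconds. Here's a minimal Python sketch of that arithmetic, in case you want a different wait time (for example the 15 minutes mentioned at the beginning). The helper names trigger_params and trigger_line are mine, and the "one fewer than the number of polls" rule is only inferred from the values in the applet, not from documented EEM behavior, so try the result on a lab router before trusting it.

```python
def trigger_params(wait_seconds, poll_interval=10):
    """Return (occurs, period) for the EEM trigger line.

    Assumption: follows the pattern seen in the applet above, where a
    30-minute wait with a 10-second poll became 'occurs 179 period 1790'.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    polls = wait_seconds // poll_interval
    if polls < 2:
        raise ValueError("wait_seconds must cover at least two polls")
    occurs = polls - 1                 # one fewer than the polls in the window
    period = occurs * poll_interval    # window length in seconds
    return occurs, period


def trigger_line(wait_seconds, poll_interval=10):
    """Format the trigger command for the given wait time."""
    occurs, period = trigger_params(wait_seconds, poll_interval)
    return f"trigger occurs {occurs} period {period}"


if __name__ == "__main__":
    print(trigger_line(30 * 60))  # values used in the applet
    print(trigger_line(15 * 60))  # a 15-minute wait
```

With the same 10-second poll interval, a 15-minute wait works out to trigger occurs 89 period 890.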