Generate SNMP trap on high CPU load
How could I configure EEM to send an SNMP trap when the CPU load (measured over a 30-second interval) is higher than 30%? My first solution was to enable resource policy traps with the snmp-server enable traps resource-policy command, but this feature was introduced in 12.4(15)T and I'm not sure everyone is willing to run the latest-and-greatest IOS code. Furthermore, it looks like these traps are sent only for resource policies defined through the ERM MIB; I was not able to generate a trap from a manually configured resource policy. Obviously, it was time for another EEM applet.
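Here is a minimal sketch of such an applet. It assumes your IOS release supports the EEM SNMP event detector and the snmp-trap action. The applet polls the 5-second CPU load from CISCO-PROCESS-MIB (cpmCPUTotal5sec) every 30 seconds and sends a trap when the value is above 30. The OID index (the trailing .1), the trap receiver (10.0.0.20), the community (TrapCommunity) and the applet name are assumptions. Walk cpmCPUTotalTable on your box to find the correct index before relying on it.

```
! Send EEM-generated traps to a hypothetical NMS
snmp-server enable traps event-manager
snmp-server host 10.0.0.20 version 2c TrapCommunity
!
event manager applet HighCPUTrap
 ! cpmCPUTotal5sec with index 1 (assumed), polled every 30 seconds
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op gt entry-val 30 poll-interval 30
 action 1.0 syslog msg "CPU load above 30 percent"
 action 2.0 snmp-trap strdata "CPU load above 30 percent"
```

This version doesn't depend on ERM at all, so it might also work on releases older than 12.4(15)T. Note that it samples the 5-second average every 30 seconds rather than computing a true 30-second average.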
Use UDP flood to increase router's CPU load
If you want to test ERM policies in a controlled environment, it's almost mandatory to have tools that allow you to overload the router. One way to overload a router is to flood it with UDP packets. If you flood one of the router's own IP addresses, you're guaranteed to raise the CPU load to 100%, with the majority of the process CPU time consumed by the IP Input process (the interrupt CPU load will also be significant).
This phenomenon illustrates very clearly why it's so important to have inbound access lists protecting the router's own IP addresses on all edge interfaces.
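Here is a minimal sketch of such a filter. It assumes the edge interface is FastEthernet0/0 with the hypothetical address 192.0.2.1, and that the router itself shouldn't accept any UDP traffic on that interface. If it has to (for example, RIP or NTP), permit those sources before the deny line. Transit traffic is still permitted. A production filter would usually also restrict TCP and ICMP traffic to the router and cover all its addresses, including loopbacks.

```
ip access-list extended ProtectRouter
 ! Drop UDP traffic addressed to the router itself
 deny udp any host 192.0.2.1
 ! Let transit traffic through
 permit ip any any
!
interface FastEthernet0/0
 ip address 192.0.2.1 255.255.255.0
 ip access-group ProtectRouter in
```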
Detect routers operating in process-switching mode
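The following ERM configuration monitors the CPU utilization of the IP Input process, which handles process-switched IP packets (as well as packets addressed to the router itself):
resource policy
 policy HighProcCPU type iosprocess
  system
   cpu process
    critical rising 40 falling 25
    major rising 20 falling 10
   !
  !
 !
 user group IPInput type iosprocess
  instance "IP Input"
  policy HighProcCPU
And here are some more ERM usage guidelines: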
- This time, we're monitoring a group of processes, so the policy definition is no longer global but has a type (iosprocess is the only type defined at the moment).
- As in the previous ERM example, we're monitoring CPU utilization of the main CPU (system), but this time we're interested in the process utilization.
- The policy is applied to a user group of resources of the type iosprocess (translated into English: a group of IOS processes).
- The only process in this group is the IP Input process (the "magic keyword" that adds a process to the group is instance).
The quotes in the instance configuration command are required because the process name contains a space, and the command otherwise accepts only a single word as the process name.
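ERM only reports the threshold crossing. To see which interfaces are process-switching the traffic, you could capture show interfaces stats when the HighProcCPU policy fires. That command lists, per interface, the packets switched by the processor versus the route cache. The following is a sketch under several assumptions: your EEM version includes the resource event detector, and a TFTP server exists at 10.0.0.10. The applet name and file name are hypothetical.

```
event manager applet ReportProcSwitching
 event resource policy "HighProcCPU"
 ! On some releases the EEM CLI session might start in user EXEC mode
 action 1.0 cli command "enable"
 action 2.0 cli command "show interfaces stats | redirect tftp://10.0.0.10/ipinput$_resource_time_sent.txt"
 action 3.0 syslog msg "IP Input CPU load $_resource_current_value, switching stats saved"
```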
Use EEM to respond to ERM events
Even a simple EEM applet can solve some immediate problems. For example, if you want to store a snapshot of the processes on a TFTP server every time the global CPU load crosses a policy threshold, you could use the following applet:
event manager applet ReportHighCPU
event resource policy "HighGlobalCPU"
action 1.0 cli command "show process cpu sorted 5sec | redirect tftp://10.0.0.10/highCPU$_resource_time_sent.txt"
To differentiate the snapshots, I've appended the $_resource_time_sent variable (which EEM sets before the applet is started) to the file name. This ensures that the snapshot files have unique names, at least until the router reloads.
As an alternative, you could send the show process output in an e-mail:
event manager environment _ifDown_rcpt [email protected]
!
event manager applet ReportHighCPU
event resource policy "HighGlobalCPU"
action 1.0 cli command "show process cpu sorted 5sec"
action 1.1 info type routername
action 2.0 mail server "mail-gw" to "$_ifDown_rcpt" from "[email protected]" subject "CPU @ $_resource_current_value" body "$_cli_result"
This article is part of the You've asked for it series.
Detect CPU spikes with Embedded Resource Manager
The ERM syntax is a bit baroque (and not well documented), so let's work through an example. This is the configuration you need to detect high overall CPU utilization on the main CPU in the box:
resource policy
 policy HighGlobalCPU global
  system
   cpu total
    critical rising 95 falling 70 interval 10
    major rising 75 falling 50 interval 10
 !
 user global HighGlobalCPU
And here are the usage/configuration guidelines:
- The whole ERM subsystem is configured under the resource policy section.
- You always have to configure a policy and a user to which the policy applies. In our example, the user is global (as we're measuring the global CPU load).
- The policy we're defining must have the global keyword to indicate we're measuring overall utilization (otherwise you can't attach it to the global user).
- We're measuring the load on the main CPU, so we're configuring the system subsection of the policy (on distributed platforms you could specify a slot name to measure utilization on a specific linecard).
- The cpu section selects CPU load measurements. You can measure interrupt load, process load or total CPU load.
- Within each resource section in the policy (in our example, total CPU load on the main system) you can define minor, major and critical thresholds (a syslog message is generated whenever a threshold is crossed).
- After the policy is defined, it's applied to the global user.
With the CPU load measurement policy defined, the router generates a syslog message (SYS-4-CPURESRISING) every time the overall CPU load exceeds one of the specified rising thresholds. When the utilization falls below the corresponding falling threshold, the router generates the SYS-4-CPURESFALLING syslog message.
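These messages only help if somebody sees them. The following minimal sketch forwards them to a syslog server at the hypothetical address 10.0.0.10. The CPURESRISING and CPURESFALLING messages have severity 4 (warnings), so logging trap warnings, or any less restrictive level, lets them through.

```
! Send syslog messages with severity 0-4 (emergencies through warnings)
logging trap warnings
logging host 10.0.0.10
```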