ERM « ipSpace.net blog

Monday, June 2, 2008 07:29 +0200

Generate SNMP trap on high CPU load

Gernot Nusshall has asked an interesting question:

How could I configure the EEM to send an SNMP trap when the cpu load (interval=30sec) is higher than 30%?

My first solution was to enable resource policy traps with the snmp-server enable traps resource-policy command. However, this feature was introduced in 12.4(15)T, and I am not sure everyone is willing to run the latest-and-greatest IOS code. Furthermore, the traps appear to be sent only for resource policies defined through the ERM MIB; I was not able to generate a trap from a manually configured resource policy. Obviously it was time for another EEM applet.
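Here is a minimal sketch of such an applet. It assumes your IOS release supports the EEM SNMP event detector and the snmp-trap action. It also assumes that the cpmCPUTotal5sec object (1.3.6.1.4.1.9.9.109.1.1.1.1.3) from CISCO-PROCESS-MIB uses index 1 for the main CPU; verify the index with an SNMP walk on your platform. The applet name CPUTrap and the re-arm value of 25% are hypothetical, and the trap receiver must still be configured separately with snmp-server host. The applet samples the 5-second CPU average every 30 seconds, which approximates a 30-second average but is not the same thing.

```
snmp-server enable traps event-manager
!
event manager applet CPUTrap
 ! Hypothetical applet: poll the 5-second CPU average every 30 seconds,
 ! trigger above 30%, re-arm only after the load drops below 25%
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op gt entry-val 30 exit-op lt exit-val 25 poll-interval 30
 ! Send an EEM SNMP trap carrying a short text description
 action 1.0 snmp-trap strdata "CPU load above 30 percent"
```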

Use UDP flood to increase router's CPU load

If you want to test ERM policies in a controlled environment, you almost certainly need a tool that can overload the router. One way to overload a router is to flood it with UDP packets. If you flood one of the router's own IP addresses, you're guaranteed to raise the CPU load to 100%. Most of the process CPU time is consumed by the IP Input process, and the interrupt CPU load is also significant.

This phenomenon illustrates very clearly why it's so important to have inbound access lists protecting the router's own IP addresses on all edge interfaces.
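Here is a minimal sketch of such an access list. It assumes that 192.0.2.1 is the router's address on a hypothetical edge interface FastEthernet0/0. It also assumes that SSH from a hypothetical management subnet 198.51.100.0/24 is the only traffic the router itself must accept on that interface. Any routing protocol running over the interface needs its own permit entry before the deny, and every router address reachable from the edge should be listed.

```
ip access-list extended ProtectRouter
 ! Hypothetical: allow SSH to the router only from the management subnet
 permit tcp 198.51.100.0 0.0.0.255 host 192.0.2.1 eq 22
 ! Drop everything else addressed to the router itself (including UDP floods)
 deny   ip any host 192.0.2.1
 ! Transit traffic is not affected
 permit ip any any
!
interface FastEthernet0/0
 ip address 192.0.2.1 255.255.255.0
 ip access-group ProtectRouter in
```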

Detect routers operating in process-switching mode

Sometimes the CPU utilization on a router rises unexpectedly because incoming packets are process switched. A very common scenario is a GRE tail-end router that has to reassemble IP fragments. These fragments are usually caused by an incorrect MTU size on the GRE head-end or by the IPsec+GRE combination. Another scenario is a router under a Denial-of-Service attack. To detect these conditions, you can define an Embedded Resource Manager (ERM) policy that raises an alert when the CPU utilization of the IP Input process exceeds predefined limits.

resource policy
  policy HighProcCPU type iosprocess
   system
    cpu process
     critical rising 40 falling 25
     major rising 20 falling 10
    !
   !
  !

  user group IPInput type iosprocess
   instance "IP Input"
   policy HighProcCPU

And here are some more ERM usage guidelines:

This time, we're monitoring a group of processes, so the policy definition is no longer global but has a type (iosprocess is the only type defined at the moment).
As in the previous ERM example, we're monitoring CPU utilization of the main CPU (system), but this time we're interested in the process utilization.
The policy is applied to a user group of resources of the type iosprocess (translated into English: a group of IOS processes).
The only process in this group is the IP Input process, which is added to the group with the instance command.

The quotes in the instance configuration command are required. The command accepts only a single word as the process name, and the quotes turn the two-word name IP Input into a single argument.
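To collect evidence when the IP Input thresholds are crossed, you could reuse the EEM resource event detector described in the "Use EEM to respond to ERM events" post. Here is a minimal sketch. The applet name is hypothetical, and the TFTP server 10.0.0.10 is borrowed from that post. The output of show ip traffic includes fragment reassembly counters, which help distinguish the GRE fragmentation case from a flood.

```
event manager applet ReportIPInput
 ! Hypothetical applet: triggered every time a HighProcCPU threshold is crossed
 event resource policy "HighProcCPU"
 ! Save IP traffic statistics (including reassembly counters) to a uniquely named file
 action 1.0 cli command "show ip traffic | redirect tftp://10.0.0.10/ipInput$_resource_time_sent.txt"
```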


Monday, March 17, 2008 07:15 +0100

Use EEM to respond to ERM events

In a previous post, I described how you can detect high CPU load with the Embedded Resource Manager (ERM). If you want to respond to these events, you could use the syslog event detector within EEM. It's more reliable, however, to use the new resource event detector available in EEM version 2.2 (introduced in IOS release 12.4(2)T). The resource detector is best used in a Tcl policy. If you use it in an EEM applet, the same applet is triggered every time any resource policy threshold (minor, major or critical, rising or falling) is crossed. Within the applet it's almost impossible to detect which threshold was crossed.

However, even an EEM applet can solve some immediate problems. For example, if you want to store a snapshot of the processes on a TFTP server every time the global CPU load crosses a policy threshold, you could use the following applet:

event manager applet ReportHighCPU
 event resource policy "HighGlobalCPU"
 action 1.0 cli command "show process cpu sorted 5sec | redirect tftp://10.0.0.10/highCPU$_resource_time_sent.txt"

To differentiate the snapshots, I've appended the _resource_time_sent variable to the file name. EEM sets this variable before the applet starts, so the snapshot files have unique names, at least until the router reloads.

As an alternative, you could send the show process output in an e-mail:

event manager environment _ifDown_rcpt [email protected]
!
event manager applet ReportHighCPU
 event resource policy "HighGlobalCPU"
 action 1.0 cli command "show process cpu sorted 5sec"
 action 1.1 info type routername
 action 2.0 mail server "mail-gw" →
    to "$_ifDown_rcpt" from "[email protected]" →
    subject "CPU @ $_resource_current_value" →
    body "$_cli_result"

This article is part of the You've asked for it series.


Wednesday, March 12, 2008 07:47 +0100

Detect CPU spikes with Embedded Resource Manager

David Winter wanted to detect high-CPU spikes and act on them. The first part (detecting high CPU utilization) could be done with SNMP, but since IOS release 12.3(14)T the right tool for the job is the Embedded Resource Manager (ERM).

The ERM syntax is a bit baroque (and not well documented), so let's work through the example: this is the configuration you need to detect high overall CPU utilization on the main CPU in the box:

resource policy
 policy HighGlobalCPU global
  system
   cpu total
    critical rising 95 falling 70 interval 10
    major rising 75 falling 50 interval 10
 !
 user global HighGlobalCPU

And here are the usage/configuration guidelines:

The whole ERM subsystem is configured under the resource policy section;
You always have to configure a policy and a user to which the policy applies. In our example, the user is global (as we're measuring the global CPU load);
The policy we're defining must have the global keyword to indicate we're measuring overall utilization (otherwise you can't attach it to the global user);
We're measuring the load on the main CPU, so we're configuring the system subsection of the policy. On distributed platforms you could specify a slot name to measure utilization on a specific linecard;
The cpu section selects CPU load measurements. You could measure interrupt load, process load or total CPU load.
Within each resource section in the policy (in our example, total CPU load on the main system) you can define minor, major and critical thresholds (syslog messages are generated when each threshold is crossed).
After the policy is defined, it's applied to the global user.
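The guidelines above mention minor thresholds and interrupt load without showing them. Here is a sketch of an extended HighGlobalCPU policy that also tracks interrupt load and adds minor thresholds. It assumes cpu interrupt is accepted as a sibling of cpu total under the system subsection and uses the same threshold syntax as the original example. The added threshold values are arbitrary illustrations, not recommendations. Keeping the policy name means the EEM applets that reference "HighGlobalCPU" keep working.

```
resource policy
 policy HighGlobalCPU global
  system
   ! Total CPU load, now with a minor threshold as well
   cpu total
    critical rising 95 falling 70 interval 10
    major rising 75 falling 50 interval 10
    minor rising 50 falling 30 interval 10
   ! Interrupt-level load tracked separately (illustrative values)
   cpu interrupt
    major rising 50 falling 30 interval 10
    minor rising 30 falling 15 interval 10
 !
 user global HighGlobalCPU
```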

With the CPU load measurement policy defined, the router will generate syslog messages (SYS-4-CPURESRISING) every time the overall CPU load exceeds the specified rising thresholds. When the utilization falls below the falling threshold, the SYS-4-CPURESFALLING syslog message is generated.
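If your IOS release lacks the EEM 2.2 resource detector, you can react to these syslog messages directly with the EEM syslog detector. Here is a minimal sketch that forwards the rising message as an SNMP trap. It assumes the snmp-trap action is available on your release, and the applet name is hypothetical. The _syslog_msg variable holds the text of the matched message.

```
snmp-server enable traps event-manager
!
event manager applet CPURisingTrap
 ! Match the ERM rising-threshold syslog message
 event syslog pattern "SYS-4-CPURESRISING"
 ! Forward the original syslog text in an EEM SNMP trap
 action 1.0 snmp-trap strdata "$_syslog_msg"
```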

This article is part of the You've asked for it series.
