Detect CPU spikes with Embedded Resource Manager
David Winter wanted to detect high-CPU spikes and act on them. The first part (high CPU utilization) could be done with SNMP, but since IOS release 12.3(14)T, the right tool for the job is the Embedded Resource Manager (ERM).
The ERM syntax is a bit baroque (and not well documented), so let's work through the example: this is the configuration you need to detect high overall CPU utilization on the main CPU in the box:
resource policy
 policy HighGlobalCPU global
  system
   cpu total
    critical rising 95 falling 70 interval 10
    major rising 75 falling 50 interval 10
 !
 user global HighGlobalCPU
And here are the usage/configuration guidelines:
- The whole ERM subsystem is configured under the resource policy section;
- You always have to configure a policy and a user to which the policy applies. In our example, the user is global (as we're measuring the global CPU load);
- The policy we're defining must have the global keyword to indicate we're measuring overall utilization (otherwise you can't attach it to the global user);
- We're measuring the load on the main CPU, so we're configuring the system subsection of the policy (on distributed platforms you could specify slot name to measure utilization on a specific linecard);
- The cpu section selects CPU load measurements. You could measure interrupt load, process load or total CPU load;
- Within each resource section in the policy (in our example, total CPU load on the main system) you can define minor, major and critical thresholds (syslog messages are generated when each threshold is crossed);
- After the policy is defined, it's applied to the global user.
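If you maintain several routers, the guidelines above can be turned into a small template. The following Python sketch is illustrative only: the function name render_cpu_policy and its arguments are hypothetical, it uses only the keywords that appear in the example configuration, and the "falling must be lower than rising" check is a sanity check added by the sketch, not a documented IOS rule.

```python
# Severity keywords mentioned in the article, in the order they are emitted.
SEVERITIES = ("critical", "major", "minor")


def render_cpu_policy(name, thresholds):
    """Render a global ERM policy for total CPU load on the main CPU.

    thresholds maps a severity keyword to a (rising, falling, interval) tuple.
    The function and its signature are hypothetical helpers for this sketch.
    """
    unknown = set(thresholds) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")

    lines = [
        "resource policy",
        f" policy {name} global",   # 'global' is needed to attach it to the global user
        "  system",                 # main CPU
        "   cpu total",             # overall CPU load
    ]
    for severity in SEVERITIES:
        if severity not in thresholds:
            continue
        rising, falling, interval = thresholds[severity]
        # Sanity check of this sketch (assumption): percentages, falling below rising.
        if not 0 <= falling < rising <= 100:
            raise ValueError(f"{severity}: need 0 <= falling < rising <= 100")
        lines.append(
            f"    {severity} rising {rising} falling {falling} interval {interval}"
        )
    lines += [" !", f" user global {name}"]  # apply the policy to the global user
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_cpu_policy(
        "HighGlobalCPU",
        {"critical": (95, 70, 10), "major": (75, 50, 10)},
    ))
```

Run as a script, it prints the same policy as the example above; the indentation only mirrors the nesting of the configuration sections.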
With the CPU load measurement policy defined, the router will generate syslog messages (SYS-4-CPURESRISING) every time the overall CPU load exceeds the specified rising thresholds. When the utilization falls below the falling threshold, the SYS-4-CPURESFALLING syslog message is generated.
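To act on these messages outside the router (for example on a syslog server), you first have to recognize them. The following Python sketch shows one way to do that. The function name parse_erm_cpu_event is hypothetical, and the layout of the rising message is an assumption modeled on a single sample line quoted in the comments below; the text of the falling message is not shown anywhere in this article, so only its mnemonic is matched.

```python
import re
from typing import Optional

# Mnemonics named in the article; matching them does not depend on the message text.
ERM_EVENT = re.compile(r"%SYS-4-(CPURESRISING|CPURESFALLING):")

# Body of the rising message. Assumption: layout taken from one sample line.
RISING_BODY = re.compile(
    r"System is seeing (?P<user>\w+) cpu util (?P<util>\d+)\s*% "
    r"at (?P<kind>\w+) level more than the configured "
    r"(?P<severity>\w+) limit (?P<limit>\d+)\s*%"
)


def parse_erm_cpu_event(line: str) -> Optional[dict]:
    """Return a dict describing an ERM CPU syslog line, or None if it is not one."""
    event = ERM_EVENT.search(line)
    if event is None:
        return None

    result = {"event": event.group(1)}
    if event.group(1) == "CPURESRISING":
        body = RISING_BODY.search(line, event.end())
        if body is not None:
            # Details are added only when the line matches the assumed layout.
            result.update(
                user=body.group("user"),
                kind=body.group("kind"),
                severity=body.group("severity"),
                util=int(body.group("util")),
                limit=int(body.group("limit")),
            )
    return result


if __name__ == "__main__":
    sample = (
        "003009: Nov 18 22:11:00.588: %SYS-4-CPURESRISING: System is seeing "
        "global cpu util 87% at total level more than the configured major limit 35 %"
    )
    print(parse_erm_cpu_event(sample))
```

Lines that carry neither mnemonic (including the SYS-1 messages generated by process cpu threshold) return None, so the function can be used as a filter in front of whatever action you want to trigger.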
This article is part of the "You've asked for it" series.
Please help with the ERM configuration below:
!
resource policy
 policy C881W-CPU global
  system
   cpu total
    critical rising 50 interval 30 falling 20 interval 10
    major rising 35 interval 15 falling 15 interval 20
   !
  !
 !
 user global C881W-CPU
!
!
!
Router: Cisco 881, IOS ver. 15.0.1M7. This policy doesn't generate any syslog message when the CPU load rises to 55-60%.
SNMP value for the last 5 minutes:
$ snmpwalk -v2c -c String 1.1.1.1 1.3.6.1.4.1.9.2.1.58.0
SNMPv2-SMI::enterprises.9.2.1.58.0 = INTEGER: 59
Thanks.
process cpu threshold type total rising 75 interval 30 falling 40 interval 10
and when the CPU load rose to 82%, these messages were added to syslog:
270601: Nov 18 14:10:34.068: %SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 82%/76%, Top 3 processes(Pid/Util): 75/4%, 151/0%, 98/0%
274323: Nov 18 14:21:04.060: %SYS-1-CPUFALLINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr) 8%/2%.
Process cpu threshold works fine and generates syslog messages for both rising and falling CPU values. The resource policy doesn't generate any messages.
With the best regards, Alexey
I will probably change the IOS version.
Alexey
After I changed the IOS on my 881 router to version 12.4.20T4, the resource policy generates rising syslog messages:
003009: Nov 18 22:11:00.588: %SYS-4-CPURESRISING: System is seeing global cpu util 87% at total level more than the configured major limit 35 %
004169: Nov 18 22:13:05.596: %SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 93%/65%, Top 3 processes(Pid/Util): 81/22%, 63/4%, 217/0%
004232: Nov 18 22:13:10.616: %SYS-4-CPURESRISING: System is seeing global cpu util 91% at total level more than the configured critical limit 50 %
004745: Nov 18 22:14:15.606: %SYS-1-CPUFALLINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr) 0%/64%.
011972: Nov 18 22:30:10.620: %SYS-4-CPURESRISING: System is seeing global cpu util 41% at total level more than the configured major limit 35 %
but it doesn't generate falling syslog messages. For SMB and branch routers, use the "process cpu threshold" configuration.
Ivan, thanks for your post and questions.
Alexey
:)