This is QoS; Who Cares about Real-Time Response?

It all started with a innocuous question: can you detect voice traffic with EEM? Looks simple enough: create a QoS class-map that matches voice calls and read the cbQosClassMapStats table in the CISCO-CLASS-BASED-QOS-MIB. The first obstacle was finding the correct indexes, but a Tcl script quickly solved that; I was ready to create the EEM applet. The applet failed to work correctly and after lots of debugging I figured out the counters in the cbQosClassMapStats table change only every 10 seconds.

I couldn’t believe my eyes and simply had to test other MIB variables as well. As expected, the IF-MIB (standard interface MIB) counters increase in real-time, but obviously someone had the bright idea that we need to detect changes in traffic profile only every now and then. Although I've received numerous suggestions from my readers, none of them works on a Cisco 1800 or a Cisco 7200. Oh, well, Cisco developers from the days when I started working with routers would have known better…

To test the MIB variable behavior I wrote a simple Tcl script to test the MIB variables (you’ll find it at the bottom of the blog post).

It reads the specified MIB variable at fixed intervals and prints the values, so you can monitor the changes in the MIB variable in real-time. I started low-bandwidth UDP flood across the router and monitored the output bytes interface counter. As expected the counter changed in real time and accurately tracked the amount of traffic sent through the router.

GW#pm ifOutOctets.3 public 10 1000          
polling ifOutOctets.3 for 10 seconds (10 iterations)

 0.000 ifOutOctets.3=42528679        
 1.000 ifOutOctets.3=42537767        
 2.000 ifOutOctets.3=42546713        
 3.000 ifOutOctets.3=42555719        
 4.000 ifOutOctets.3=42564665        
 5.000 ifOutOctets.3=42573611        
 6.000 ifOutOctets.3=42582699        
 7.000 ifOutOctets.3=42591645        
 8.000 ifOutOctets.3=42600591        
 9.000 ifOutOctets.3=42609537        

Then I created a simple class-map and policy map

GW#show policy-map interface
 FastEthernet1/0 

  Service-policy output: LAN

    Class-map: Voice (match-all)
      20438 packets, 2902196 bytes
      30 second offered rate 70000 bps
      Match: access-group name Voice

    Class-map: class-default (match-any)
      41 packets, 3967 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: any 

… and monitored the pre-policy byte counter (cbQosCMPrePolicyByte64) for the Voice class. The value changed only once every ten seconds:

GW#pm cbQosCMPrePolicyByte64.50.10767521 public 10 1000
polling cbQosCMPrePolicyByte64.50.10767521 for 10 seconds (10 iterations)

 0.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 1.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 2.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 3.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 4.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 5.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 6.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 7.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 8.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 9.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8

What can I say… apart from expressing my deepest disappointment :(

Tcl Script Used in this Blog Post

The script continuously polls a MIB variable and prints the value of the specified variable at fixed intervals (default: 500 milliseconds) for the specified amount of time (default: 15 seconds). It can be used to verify whether a MIB variable is changed in real time or periodically.

Installation

  • Download the source file into flash:pm.tcl.
  • Configure alias exec pm tclsh flash:pm.tcl

Usage guidelines

Usage: pm variable [ community [ time [ interval ] ] ]

Command line parameters:

  • Variable: MIB variable to read. Symbolic names recognized by Cisco IOS can be used (for example, ifOutOctets). Names are case-sensitive.
  • Community: SNMP community used in SNMP GET requests (default: public)
  • Time: the measurement time in seconds (default: 15 seconds).
  • Interval: interval between MIB polls (default: 500 milliseconds).

Source code

#
# author:   Ivan Pepelnjak
# title:    Poll a MIB variable
# name:     pollMib.tcl
# desc:     The script continuously polls a MIB variable and prints
#           the values at fixed intervals for the specified amount of time.
#           It can be used to verify whether a MIB variable is changed in
#           real time or periodically.
#
# ios config:
#
#           * download the file into flash:pm.tcl
#           * configure alias exec pm tclsh flash:pm.tcl
#
#           invoke with poll oid [community [time(sec) [interval(msec)]]]
#
#           unless specified otherwise, the "public" community is used
#

proc snmpGet { oid result } {
  global snmpCommunity
  upvar $result r
  if { [info exists r] } { unset r }

  set getResult [ snmp_getone $snmpCommunity $oid ]
  if { [ regexp {snmp error.*text='(.*)'} $getResult ignore errtxt ] } { 
     error "snmpGet - $errtxt"; return 0 }
  if { [ regexp {oid='(.*)'.*val='(.*)'} $getResult ignore oid result ] } {
    if { ! [ string equal $result "NO_SUCH_INSTANCE_EXCEPTION" ] } {
      set r(OID) $oid ;
      set r(VALUE) $result ; 
      return 1;
    }
  }
  return 0;
}

set oid [lindex $argv 0]
set snmpCommunity [lindex $argv 1]
set maxtime [lindex $argv 2]
set delay   [lindex $argv 3]

if { [string equal $snmpCommunity ""] } { set snmpCommunity "public" }
if { [string equal $oid ""] } { 
  puts {Usage: pollMib.tcl oid [ community [ time(sec) [ interval(msec) ]]]}
  return
}
if { [string equal $maxtime ""] } { set maxtime 15 }
if { [string equal $delay ""] } { set delay 500 }

set maxcnt [ expr { $maxtime * 1000 / $delay } ]
set cnt 0

puts "\npolling $oid for $maxtime seconds ($maxcnt iterations)\n"
while { $cnt < $maxcnt } {
  snmpGet $oid getvar
  puts [format "%6.3f %-30s" [ expr { $delay * $cnt / 1000.0 } ] "$getvar(OID)=$getvar(VALUE)"]
  incr cnt
  if { $cnt != $maxcnt } { after $delay }
}

14 comments:

  1. On distributed platforms its worse. On Cat6500/7600 even packet counters change every 10 seconds on most interfaces.
  2. I would understand the delay on a distributed platform. After all, the counters are on a different CPU and have to be transferred every now and then. Of course one could implement a smarter on-demand scheme :)

    What really annoys me is that they didn't even think about using real-time counters on single CPU non-distributed non-accelerated platforms. The actual counters are obviously implemented in software and reside in the same memory as the MIB structures ... and still someone had the dilbertesque idea of copying the values between the two structures every 10 seconds.
  3. could that help by any chance?

    R1(config)#event manager applet plop
    R1(config-applet)#event snmp oid xxx poll-interval ?
    <0-4294967295> Enter seconds[.milliseconds] value
  4. @Xavier: Unfortunately, the problem lies in the actual values of the MIB variables, not the way EEM polls them. The values change only every 10 seconds ... and you have to be very careful with the 'event snmp poll-interval' that you don't get false positives.
  5. Can the voice stream be detected by inspecting SIP transactions? When ip inspect sip and ip nat are both enabled the inspection engines is adding UDP translation entries for the RTP/UDP traffic.
  6. There are numerous ways to detect VoIP/SIP traffic. The original solution (follow the links in the article) uses access list logging and EEM applet triggering on the ACL ... but it would be so much cleaner to use the proper SNMP variables.
  7. Hi ivan

    i am new to cisco qos. I have a silly doubt about cbwfq. as i have read that we can only use 75% of the bandwidth for the custom queues as the rest 25% is reserved for the keepalives and routing protocol updates.

    now if i want to set 50% of bandwidth for a particular queue should it 50% of the total bandwidth or 50% of the availble 75% of the bandwidth.

    i couldnt find answer for this anywhere. i would be glad if you can help me out .

    Regards

    sebastan
  8. Sebastan,

    You're correct, if you not put the command Max reserved-bandwith 100 percent. You have 75%. that is considered 100% and then the 50% of that.

    Regards
    Andy
  9. Hello Ivan,
    first of all: thank you for sharing your so creative and precise work. It is a pleasure and I will always reference to you as the original source

    I tried to rebuild a scenario and to check the values because I am also concerned about the correctness of the measurement tools. You write that the counter for a given cbQosCMPrePolicyByte64-counter of an active class i only updated every 10 seconds.
    I found something different (better) which is perhaps due to the ios version I used for testing.
    setting: 2811,
    show policy-map int s0/0/0 | sec SSL
    Class-map: SSL (match-all)
    38614111 packets, 19113935500 bytes
    5 minute offered rate 21000 bps, drop rate 0 bps
    Match: protocol secure-http
    Queueing
    queue limit 64 packets
    (queue depth/total drops/no-buffer drops) 2/11305151/0
    (pkts output/bytes output) 27308959/12934044900
    bandwidth 20% (200 kbps)

    tcl-hack:
    R1#more flash:showcounter3.tcl
    set cnt 0
    set maxcnt 20
    set delay 100
    puts "Gleich beginnt die Counter-Sammlung.\n"
    after 1000
    while {$cnt < $maxcnt } {
    set zaehlarray($cnt) [exec "show policy-map interface s0/0/0 | i 191"]
    set zaehlarray2($cnt) [snmp_getone INSQOSR cbQosCMPrePolicyByte64.50.13986465]
    incr cnt
    if { $cnt != $maxcnt } { after $delay }
    }
    puts "Counter-Sammlung abgeschlossen. Gleich beginnt die Ausgabe."
    after 1000
    set cnt 0
    while {$cnt < $maxcnt } {
    puts $zaehlarray($cnt)
    puts $zaehlarray2($cnt)
    incr cnt
    }

    Result:
    R1#tclsh flash:showcounter3.tcl
    Gleich beginnt die Counter-Sammlung.

    Counter-Sammlung abgeschlossen. Gleich beginnt die Ausgabe.
    38639819 packets, 19115220900 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b1fa4'/>}
    38639826 packets, 19115221250 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2102'/>}
    38639834 packets, 19115221650 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2292'/>}
    38639844 packets, 19115222150 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2486'/>}
    38639853 packets, 19115222600 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2648'/>}
    38639862 packets, 19115223050 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b280a'/>}
    38639868 packets, 19115223350 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2936'/>}
    38639874 packets, 19115223650 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2a62'/>}
    38639881 packets, 19115224000 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2bc0'/>}
    38639890 packets, 19115224450 bytes

    When I compare the hex-values with the values from the show policy-map-command, they are exact the same.
    What do you think? ios: c2800nm-adventerprisek9-mz.124-24.T1.bin
    regards, William
  10. Here's my wild guess: the "show policy-map" command might also update the SNMP counters.

    Can you redo the tests without the "show policy-map"?
  11. You are absolutely right! Using the same script, just taking out the show-policy-map line and the puts-line for the not used array:

    several runs over 20 seconds:
    I got 4 values: aa-bb-cc-dd where bb and cc where between 8.8 and 9 seconds long

    I was wondering over the involved processes:

    so the one with ONLY snmp-get showed from a second vty during the tcl-run:

    R1#show proc cpu sorted 5sec
    CPU utilization for five seconds: 21%/7%; one minute: 11%; five minutes: 10%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    304 1340 114 11754 11.90% 1.71% 0.36% 514 Tcl Serv - tty51
    179 5436 245958062 0 1.11% 1.20% 1.20% 0 HQF Shaper Backg
    2 724 197138 3 0.07% 0.04% 0.05% 0 Load Meter
    65 188 3942710 0 0.07% 0.02% 0.00% 0 Netclock Backgro
    56 1460 251635 5 0.07% 0.00% 0.00% 0 HC Counter Timer
    180 908 9854235 0 0.07% 0.06% 0.07% 0 RBSCP Background
    42 1108 985723 1 0.07% 0.03% 0.02% 0 Per-Second Jobs
    7 0 2 0 0.00% 0.00% 0.00% 0 Timers
    6 12 16 750 0.00% 0.00% 0.00% 0 Pool Manager

    (second part in second post)
  12. the one with snmp AND show-command (after a while waiting):

    R1#show proc cpu sorted 5sec
    CPU utilization for five seconds: 40%/8%; one minute: 14%; five minutes: 11%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    304 3592 84 42761 31.26% 4.76% 1.02% 514 Tcl Serv - tty51
    179 5448 246005037 0 0.79% 1.15% 1.19% 0 HQF Shaper Backg
    1 208 2419 85 0.15% 0.01% 0.00% 0 Chunk Manager
    41 120 985852 0 0.07% 0.00% 0.00% 0 TTY Background
    65 188 3943482 0 0.07% 0.02% 0.00% 0 Netclock Backgro
    146 1632 1525441 1 0.07% 0.02% 0.00% 0 CEF: IPv4 proces
    180 920 9856122 0 0.07% 0.06% 0.07% 0 RBSCP Background
    38 10316 256052 40 0.07% 0.01% 0.00% 0 Net Background
    33 60 985875 0 0.07% 0.00% 0.00% 0 GraphIt
    190 144 1925419 0 0.07% 0.01% 0.00% 0 CCE DP URLF cach
    258 424 4929352 0 0.07% 0.02% 0.00% 0 Atheros LED Ctro
    11 0 1 0 0.00% 0.00% 0.00% 0 IPC Zone Manager
    12 96 962743 0 0.00% 0.00% 0.00% 0 IPC Periodic Tim

    Interesting is the double amount of total cpu-time. If the shown processes do really reveal anything about which process might be resonsible for the updating of the mib I cannot see.

    Well I totally agree with you: 9-10 seconds of a dead counter in an important monitoring-mib for qos seems not to be good.

    To get precise data what do you suggest? Perhaps all the netmon-vendors have to include a small script in future which executes a show-policy-map-interface-command first before they poll the counter . :-D

    I thought I could get the info out of the device but know I come back to using taps and do the analysis separate.

    regards, William
  13. Ivan/William:

    Just following up on this thread. It looks like William has a way to make the snmp update real-time. So my questions are: 1.) can we create a never-ending background task that runs 1 time per second that runs "show policy-map" so that our SNMP values are incrementing in real-time, in which case we could then create an EEM applet that accomplishes what I was originally trying to do?

    Thanks!
    -Jared
  14. I've got a problem with my 2811 router.

    I cannot use any of examples that checks snmp value. When typing "event snmp oid" commands I'm getting "Invalid input detected" pointing at snmp part.

    Everything is ok with my 1841 device. I've checked all 12.4T and 15.1 versions with no luck.

    I've compared "sh event manager version", and I see that in 2811 device there's no "snmp" and "snmp-notification" detectors.

    Why? is this platform 'broken'? 'special'? is there something that I should enable on device to get snmp detectors working?

    I'd be very glad for any help.

    Thanks!
Add comment
Sidebar