This is QoS; who cares about real-time response?

It all started with a innocuous question: can you detect voice traffic with EEM? Looks simple enough: create a QoS class-map that matches voice calls and read the cbQosClassMapStats table in the CISCO-CLASS-BASED-QOS-MIB. The first obstacle was finding the correct indexes, but a Tcl script quickly solved that; I was ready to create the EEM applet. The applet failed to work correctly and after lots of debugging I figured out the counters in the cbQosClassMapStats table change only every 10 seconds.

I couldn’t believe my eyes and simply had to test other MIB variables as well. As expected, the IF-MIB (standard interface MIB) counters increase in real-time, but obviously someone had the bright idea that we need to detect changes in traffic profile only every now and then. Although I've received suggestions from my readers, none of them works on an 1800 or a 7200. Oh, well, Cisco developers from the days when I started working with routers would know better.

To test the MIB variable behavior I wrote a simple Tcl script to test the MIB variables. It reads the specified MIB variable at fixed intervals and prints the values, so you can monitor the changes in the MIB variable in real-time. I started low-bandwidth UDP flood across the router and monitored the output bytes interface counter. As expected the counter changed in real time and accurately tracked the amount of traffic sent through the router.

GW#pm ifOutOctets.3 public 10 1000          
polling ifOutOctets.3 for 10 seconds (10 iterations)

 0.000 ifOutOctets.3=42528679        
 1.000 ifOutOctets.3=42537767        
 2.000 ifOutOctets.3=42546713        
 3.000 ifOutOctets.3=42555719        
 4.000 ifOutOctets.3=42564665        
 5.000 ifOutOctets.3=42573611        
 6.000 ifOutOctets.3=42582699        
 7.000 ifOutOctets.3=42591645        
 8.000 ifOutOctets.3=42600591        
 9.000 ifOutOctets.3=42609537        

Then I created a simple class-map and policy map

GW#show policy-map interface
 FastEthernet1/0 

  Service-policy output: LAN

    Class-map: Voice (match-all)
      20438 packets, 2902196 bytes
      30 second offered rate 70000 bps
      Match: access-group name Voice

    Class-map: class-default (match-any)
      41 packets, 3967 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: any 

… and monitored the pre-policy byte counter (cbQosCMPrePolicyByte64) for the Voice class. The value changed only once every ten seconds:

GW#pm cbQosCMPrePolicyByte64.50.10767521 public 10 1000
polling cbQosCMPrePolicyByte64.50.10767521 for 10 seconds (10 iterations)

 0.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 1.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 2.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 3.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 4.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 5.000 cbQosCMPrePolicyByte64.50.10767521=0x002865366
 6.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 7.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 8.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8
 9.000 cbQosCMPrePolicyByte64.50.10767521=0x00287acf8

What can I say ... apart from expressing my deepest disappointment :(

14 comments:

  1. On distributed platforms its worse. On Cat6500/7600 even packet counters change every 10 seconds on most interfaces.

    ReplyDelete
  2. I would understand the delay on a distributed platform. After all, the counters are on a different CPU and have to be transferred every now and then. Of course one could implement a smarter on-demand scheme :)

    What really annoys me is that they didn't even think about using real-time counters on single CPU non-distributed non-accelerated platforms. The actual counters are obviously implemented in software and reside in the same memory as the MIB structures ... and still someone had the dilbertesque idea of copying the values between the two structures every 10 seconds.

    ReplyDelete
  3. could that help by any chance?

    R1(config)#event manager applet plop
    R1(config-applet)#event snmp oid xxx poll-interval ?
    <0-4294967295> Enter seconds[.milliseconds] value

    ReplyDelete
  4. @Xavier: Unfortunately, the problem lies in the actual values of the MIB variables, not the way EEM polls them. The values change only every 10 seconds ... and you have to be very careful with the 'event snmp poll-interval' that you don't get false positives.

    ReplyDelete
  5. Can the voice stream be detected by inspecting SIP transactions? When ip inspect sip and ip nat are both enabled the inspection engines is adding UDP translation entries for the RTP/UDP traffic.

    ReplyDelete
  6. There are numerous ways to detect VoIP/SIP traffic. The original solution (follow the links in the article) uses access list logging and EEM applet triggering on the ACL ... but it would be so much cleaner to use the proper SNMP variables.

    ReplyDelete
  7. Hi ivan

    i am new to cisco qos. I have a silly doubt about cbwfq. as i have read that we can only use 75% of the bandwidth for the custom queues as the rest 25% is reserved for the keepalives and routing protocol updates.

    now if i want to set 50% of bandwidth for a particular queue should it 50% of the total bandwidth or 50% of the availble 75% of the bandwidth.

    i couldnt find answer for this anywhere. i would be glad if you can help me out .

    Regards

    sebastan

    ReplyDelete
  8. Andrés Pedro Spano23 July, 2009 16:25

    Sebastan,

    You're correct, if you not put the command Max reserved-bandwith 100 percent. You have 75%. that is considered 100% and then the 50% of that.

    Regards
    Andy

    ReplyDelete
  9. Hello Ivan,
    first of all: thank you for sharing your so creative and precise work. It is a pleasure and I will always reference to you as the original source

    I tried to rebuild a scenario and to check the values because I am also concerned about the correctness of the measurement tools. You write that the counter for a given cbQosCMPrePolicyByte64-counter of an active class i only updated every 10 seconds.
    I found something different (better) which is perhaps due to the ios version I used for testing.
    setting: 2811,
    show policy-map int s0/0/0 | sec SSL
    Class-map: SSL (match-all)
    38614111 packets, 19113935500 bytes
    5 minute offered rate 21000 bps, drop rate 0 bps
    Match: protocol secure-http
    Queueing
    queue limit 64 packets
    (queue depth/total drops/no-buffer drops) 2/11305151/0
    (pkts output/bytes output) 27308959/12934044900
    bandwidth 20% (200 kbps)

    tcl-hack:
    R1#more flash:showcounter3.tcl
    set cnt 0
    set maxcnt 20
    set delay 100
    puts "Gleich beginnt die Counter-Sammlung.\n"
    after 1000
    while {$cnt < $maxcnt } {
    set zaehlarray($cnt) [exec "show policy-map interface s0/0/0 | i 191"]
    set zaehlarray2($cnt) [snmp_getone INSQOSR cbQosCMPrePolicyByte64.50.13986465]
    incr cnt
    if { $cnt != $maxcnt } { after $delay }
    }
    puts "Counter-Sammlung abgeschlossen. Gleich beginnt die Ausgabe."
    after 1000
    set cnt 0
    while {$cnt < $maxcnt } {
    puts $zaehlarray($cnt)
    puts $zaehlarray2($cnt)
    incr cnt
    }

    Result:
    R1#tclsh flash:showcounter3.tcl
    Gleich beginnt die Counter-Sammlung.

    Counter-Sammlung abgeschlossen. Gleich beginnt die Ausgabe.
    38639819 packets, 19115220900 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b1fa4'/>}
    38639826 packets, 19115221250 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2102'/>}
    38639834 packets, 19115221650 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2292'/>}
    38639844 packets, 19115222150 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2486'/>}
    38639853 packets, 19115222600 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2648'/>}
    38639862 packets, 19115223050 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b280a'/>}
    38639868 packets, 19115223350 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2936'/>}
    38639874 packets, 19115223650 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2a62'/>}
    38639881 packets, 19115224000 bytes

    {<obj oid='cbQosCMPrePolicyByte64.50.13986465' val='0x4735b2bc0'/>}
    38639890 packets, 19115224450 bytes

    When I compare the hex-values with the values from the show policy-map-command, they are exact the same.
    What do you think? ios: c2800nm-adventerprisek9-mz.124-24.T1.bin
    regards, William

    ReplyDelete
  10. Ivan Pepelnjak23 August, 2009 13:45

    Here's my wild guess: the "show policy-map" command might also update the SNMP counters.

    Can you redo the tests without the "show policy-map"?

    ReplyDelete
  11. You are absolutely right! Using the same script, just taking out the show-policy-map line and the puts-line for the not used array:

    several runs over 20 seconds:
    I got 4 values: aa-bb-cc-dd where bb and cc where between 8.8 and 9 seconds long

    I was wondering over the involved processes:

    so the one with ONLY snmp-get showed from a second vty during the tcl-run:

    R1#show proc cpu sorted 5sec
    CPU utilization for five seconds: 21%/7%; one minute: 11%; five minutes: 10%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    304 1340 114 11754 11.90% 1.71% 0.36% 514 Tcl Serv - tty51
    179 5436 245958062 0 1.11% 1.20% 1.20% 0 HQF Shaper Backg
    2 724 197138 3 0.07% 0.04% 0.05% 0 Load Meter
    65 188 3942710 0 0.07% 0.02% 0.00% 0 Netclock Backgro
    56 1460 251635 5 0.07% 0.00% 0.00% 0 HC Counter Timer
    180 908 9854235 0 0.07% 0.06% 0.07% 0 RBSCP Background
    42 1108 985723 1 0.07% 0.03% 0.02% 0 Per-Second Jobs
    7 0 2 0 0.00% 0.00% 0.00% 0 Timers
    6 12 16 750 0.00% 0.00% 0.00% 0 Pool Manager

    (second part in second post)

    ReplyDelete
  12. the one with snmp AND show-command (after a while waiting):

    R1#show proc cpu sorted 5sec
    CPU utilization for five seconds: 40%/8%; one minute: 14%; five minutes: 11%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    304 3592 84 42761 31.26% 4.76% 1.02% 514 Tcl Serv - tty51
    179 5448 246005037 0 0.79% 1.15% 1.19% 0 HQF Shaper Backg
    1 208 2419 85 0.15% 0.01% 0.00% 0 Chunk Manager
    41 120 985852 0 0.07% 0.00% 0.00% 0 TTY Background
    65 188 3943482 0 0.07% 0.02% 0.00% 0 Netclock Backgro
    146 1632 1525441 1 0.07% 0.02% 0.00% 0 CEF: IPv4 proces
    180 920 9856122 0 0.07% 0.06% 0.07% 0 RBSCP Background
    38 10316 256052 40 0.07% 0.01% 0.00% 0 Net Background
    33 60 985875 0 0.07% 0.00% 0.00% 0 GraphIt
    190 144 1925419 0 0.07% 0.01% 0.00% 0 CCE DP URLF cach
    258 424 4929352 0 0.07% 0.02% 0.00% 0 Atheros LED Ctro
    11 0 1 0 0.00% 0.00% 0.00% 0 IPC Zone Manager
    12 96 962743 0 0.00% 0.00% 0.00% 0 IPC Periodic Tim

    Interesting is the double amount of total cpu-time. If the shown processes do really reveal anything about which process might be resonsible for the updating of the mib I cannot see.

    Well I totally agree with you: 9-10 seconds of a dead counter in an important monitoring-mib for qos seems not to be good.

    To get precise data what do you suggest? Perhaps all the netmon-vendors have to include a small script in future which executes a show-policy-map-interface-command first before they poll the counter . :-D

    I thought I could get the info out of the device but know I come back to using taps and do the analysis separate.

    regards, William

    ReplyDelete
  13. Jared Valentine20 January, 2010 23:53

    Ivan/William:

    Just following up on this thread. It looks like William has a way to make the snmp update real-time. So my questions are: 1.) can we create a never-ending background task that runs 1 time per second that runs "show policy-map" so that our SNMP values are incrementing in real-time, in which case we could then create an EEM applet that accomplishes what I was originally trying to do?

    Thanks!
    -Jared

    ReplyDelete
  14. I've got a problem with my 2811 router.

    I cannot use any of examples that checks snmp value. When typing "event snmp oid" commands I'm getting "Invalid input detected" pointing at snmp part.

    Everything is ok with my 1841 device. I've checked all 12.4T and 15.1 versions with no luck.

    I've compared "sh event manager version", and I see that in 2811 device there's no "snmp" and "snmp-notification" detectors.

    Why? is this platform 'broken'? 'special'? is there something that I should enable on device to get snmp detectors working?

    I'd be very glad for any help.

    Thanks!

    ReplyDelete

You don't have to log in to post a comment, but please do provide your real name/URL. Anonymous comments might get deleted.

Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.