CEF load sharing details

I had to investigate the details of CEF load sharing for one of my upcoming articles and found (yet again) that they are barely covered in the official documentation. So, this is how it works (in case you ever need to know):

  • For every CEF entry (IP route) where there are multiple paths to the destination, the router creates a hash table with up to 16 rows, populating the entries with pointers to individual paths. The hash table can be inspected with the show ip cef prefix internal command.
  • The load balancing ratio is approximated by the number of entries in the hash table belonging to each path. If you have unequal-cost load balancing (EIGRP based on composite metrics and MPLS TE tunnels based on requested bandwidth), individual paths will be associated with different numbers of rows.
  • If you configure per-destination load balancing, the source and destination IP addresses in the incoming IP packet are hashed into a 4-bit value that selects the outgoing path in the CEF hash table.
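
The per-destination lookup can be sketched in Python. This is only an illustration under stated assumptions: the hash function CEF really uses is not described here, so four_bit_hash below is a made-up stand-in (XOR of the two addresses folded into four bits), and select_path is a hypothetical helper name. The sketch only covers a full 16-row table; note that the router numbers the rows 1 to 16, while the list below is indexed 0 to 15.

```python
import ipaddress


def four_bit_hash(src: str, dst: str) -> int:
    """Stand-in hash (NOT Cisco's): XOR both addresses, fold into 4 bits."""
    value = int(ipaddress.IPv4Address(src)) ^ int(ipaddress.IPv4Address(dst))
    folded = 0
    while value:
        folded ^= value & 0xF  # mix in the lowest nibble
        value >>= 4            # move on to the next nibble
    return folded              # always in the range 0..15


def select_path(table, src: str, dst: str):
    """Pick the outgoing path for a source/destination pair."""
    if len(table) != 16:
        raise ValueError("this sketch only covers the 16-row case")
    return table[four_bit_hash(src, dst)]


# Two equal-cost paths laid out like "Load distribution: 0 1 0 1 ..."
paths = ["Serial0/0/0.100", "Serial0/0/0.200"]
table = [paths[i % 2] for i in range(16)]

# The same address pair always lands in the same row, hence on the same path.
print(select_path(table, "10.0.0.1", "192.168.0.1"))
```

Whatever the real hash is, the important property is the same: it is deterministic per source/destination pair, so all packets of a flow follow one path and stay in order.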

If this sounds confusing, here are two examples to make it easier: if you have two equal-cost paths to the same destination, each path will have eight entries in the hash table.

a1#show ip route 192.168.0.0
Routing entry for 192.168.0.0 255.255.255.0
Known via "ospf 1", distance 110, metric 51, type intra area
Last update from 172.16.0.21 on Serial0/0/0.100, 00:00:05 ago
Routing Descriptor Blocks:
* 172.16.0.21, from 172.16.0.22, 00:00:05 ago, via Serial0/0/0.100
Route metric is 51, traffic share count is 1
172.16.0.21, from 172.16.0.22, 00:00:05 ago, via Serial0/0/0.200
Route metric is 51, traffic share count is 1
a1#show ip cef 192.168.0.0 internal
192.168.0.0/24, version 33, epoch 0, per-destination sharing
0 packets, 0 bytes
via 172.16.0.21, Serial0/0/0.100, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.100
valid adjacency
via 172.16.0.21, Serial0/0/0.200, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.200
valid adjacency

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0/0.100 point2point 0
2 Y Serial0/0/0.200 point2point 0
3 Y Serial0/0/0.100 point2point 0
4 Y Serial0/0/0.200 point2point 0
5 Y Serial0/0/0.100 point2point 0
6 Y Serial0/0/0.200 point2point 0
7 Y Serial0/0/0.100 point2point 0
8 Y Serial0/0/0.200 point2point 0
9 Y Serial0/0/0.100 point2point 0
10 Y Serial0/0/0.200 point2point 0
11 Y Serial0/0/0.100 point2point 0
12 Y Serial0/0/0.200 point2point 0
13 Y Serial0/0/0.100 point2point 0
14 Y Serial0/0/0.200 point2point 0
15 Y Serial0/0/0.100 point2point 0
16 Y Serial0/0/0.200 point2point 0

However, if you have three equal-cost paths to the destination, each path will have only five entries and the hash table will have 15 rows instead of 16.

a1#show ip route 192.168.0.0
Routing entry for 192.168.0.0 255.255.255.0
Known via "ospf 1", distance 110, metric 51, type intra area
Last update from 10.0.0.6 on FastEthernet0/0, 00:00:02 ago
Routing Descriptor Blocks:
* 172.16.0.21, from 172.16.0.22, 00:00:02 ago, via Serial0/0/0.100
Route metric is 51, traffic share count is 1
172.16.0.21, from 172.16.0.22, 00:00:02 ago, via Serial0/0/0.200
Route metric is 51, traffic share count is 1
10.0.0.6, from 172.16.0.22, 00:00:02 ago, via FastEthernet0/0
Route metric is 51, traffic share count is 1
a1#show ip cef 192.168.0.0 internal
192.168.0.0/24, version 44, epoch 0, per-destination sharing
0 packets, 0 bytes
via 172.16.0.21, Serial0/0/0.100, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.100
valid adjacency
via 172.16.0.21, Serial0/0/0.200, 0 dependencies
traffic share 1
next hop 172.16.0.21, Serial0/0/0.200
valid adjacency
via 10.0.0.6, FastEthernet0/0, 0 dependencies
traffic share 1
next hop 10.0.0.6, FastEthernet0/0
valid adjacency

0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0/0.100 point2point 0
2 Y Serial0/0/0.200 point2point 0
3 Y FastEthernet0/0 10.0.0.6 0
4 Y Serial0/0/0.100 point2point 0
5 Y Serial0/0/0.200 point2point 0
6 Y FastEthernet0/0 10.0.0.6 0
7 Y Serial0/0/0.100 point2point 0
8 Y Serial0/0/0.200 point2point 0
9 Y FastEthernet0/0 10.0.0.6 0
10 Y Serial0/0/0.100 point2point 0
11 Y Serial0/0/0.200 point2point 0
12 Y FastEthernet0/0 10.0.0.6 0
13 Y Serial0/0/0.100 point2point 0
14 Y Serial0/0/0.200 point2point 0
15 Y FastEthernet0/0 10.0.0.6 0
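
The row counts in both printouts are consistent with a simple rule: use the largest multiple of the total traffic share that fits into 16 rows, and interleave the paths. The Python sketch below works under that assumption; build_hash_table is a hypothetical name, and the interleaving order for unequal shares is a guess that a real router may well lay out differently.

```python
def build_hash_table(shares, max_rows=16):
    """Return a list of path indexes, one per hash-table row.

    shares   -- traffic share count of each path (1 for equal-cost paths)
    max_rows -- upper bound on the table size (16 in the printouts above)
    """
    total = sum(shares)
    if total == 0 or total > max_rows:
        raise ValueError("total traffic share must be between 1 and max_rows")

    # One interleaved cycle: visit the paths round-robin until every
    # path has used up its share (assumed ordering, see lead-in).
    remaining = list(shares)
    one_cycle = []
    while any(remaining):
        for index, left in enumerate(remaining):
            if left:
                one_cycle.append(index)
                remaining[index] -= 1

    # Repeat the cycle as many whole times as fit into max_rows.
    return one_cycle * (max_rows // total)


two_paths = build_hash_table([1, 1])       # 16 rows: 0 1 0 1 ...
three_paths = build_hash_table([1, 1, 1])  # 15 rows: 0 1 2 0 1 2 ...
print(len(two_paths), len(three_paths))
```

With three paths, a 16th row would have to go to one of the paths and skew the ratio, which would explain why the table stops at 15 rows.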

12 comments:

  1. I have a 7204 with IOS 12.2 and 'show ip cef A.B.C.D internal' is not available. Do you know any equivalent for it? I'm interested in the 'Load distribution' info, or something equivalent to it.

  2. Upgrade to 12.2SRC ;)

  3. Hello, Ivan!

    I just wanted to share with you what I've found today, which is strange behavior of IOS in my opinion.

    Today I was analyzing a traffic flow for one of our customers, when I had to check the information in CEF table regarding 0.0.0.0/32 prefix. I was curious to get that information from CEF table because there is multipath BGP load balance:

    tbirouter#show ip bgp 0.0.0.0/0
    BGP routing table entry for 0.0.0.0/0, version 1774830
    Paths: (2 available, best #1, table Default-IP-Routing-Table)
    Multipath: eBGP
    Not advertised to any peer
    1234
    Origin IGP, localpref 100, valid, external, multipath, best
    5678
    Origin IGP, localpref 100, valid, external, multipath

    I decided to check what the CEF table says and tried these commands:

    tbirouter#show ip cef 0.0.0.0 det
    0.0.0.0/32, version 1, epoch 0, receive
    tbirouter#show ip cef 0.0.0.0 int
    0.0.0.0/32, version 1, epoch 0, receive
    tbirouter#show ip cef 0.0.0.0/32 int
    ^
    % Invalid input detected at '^' marker.

    And here comes the most interesting part of the story: a typo in my command got me exactly what I needed.

    show ip cef 0.0.0.032 internal <------------------ 0.0.0.032
    0.0.0.0/0, version 1303100, epoch 0, per-destination sharing
    0 packets, 0 bytes
    via 2.2.2.20 0 dependencies, recursive
    traffic share 1
    next hop 2.2.2.2, GigabitEthernet0/0.3467 via 2.2.2.2/32
    valid adjacency
    via 1.1.1.1, 0 dependencies, recursive
    traffic share 1
    next hop 1.1.1.10, GigabitEthernet0/0.3197 via 1.1.1.1/32
    valid adjacency

    0 packets, 0 bytes switched through the prefix
    tmstats: external 0 packets, 0 bytes
    internal 0 packets, 0 bytes
    Load distribution: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (refcount 1)

    Hash OK Interface Address Packets
    1 Y GigabitEthernet0/0.3467 2.2.2.2 0
    2 Y GigabitEthernet0/0.3197 1.1.1.1 0
    3 Y GigabitEthernet0/0.3467 2.2.2.2 0
    4 Y GigabitEthernet0/0.3197 1.1.1.1 0
    5 Y GigabitEthernet0/0.3467 2.2.2.2 0
    6 Y GigabitEthernet0/0.3197 1.1.1.1 0
    7 Y GigabitEthernet0/0.3467 2.2.2.2 0
    8 Y GigabitEthernet0/0.3197 1.1.1.1 0
    9 Y GigabitEthernet0/0.3467 2.2.2.2 0
    10 Y GigabitEthernet0/0.3197 1.1.1.1 0
    11 Y GigabitEthernet0/0.3467 2.2.2.2 0
    12 Y GigabitEthernet0/0.3197 1.1.1.1 0
    13 Y GigabitEthernet0/0.3467 2.2.2.2 0
    14 Y GigabitEthernet0/0.3197 1.1.1.1 0
    15 Y GigabitEthernet0/0.3467 2.2.2.2 0
    16 Y GigabitEthernet0/0.3197 1.1.1.1 0
    refcount 718612, covered prefixes:

    Have you noticed this?

    Kind regards,
    Dani Petrov

  4. Well, your problem is (very probably) the discrepancy between what the router prints out (0.0.0.0/32) and what it wants to get (0.0.0.0/0). When you've typed in 0.0.0.032, the router understood that to be 0.0.0.0/0 (due to the subnet mask on 0.0.0.0).

    I agree, it's confusing :-E

  5. Any idea how load balancing is done in an MPLS network at 1) ingress and 2) transit?

  6. Greetings,

    I am studying MPLS these days and there is a very basic thing which confuses me. I tried to find the details on Google but didn't get any good link. Can someone please explain to me what the difference is between a ROUTING TABLE and a FIB?

    I would really be very obliged if someone can help :-[

  7. Google found me this link (from my blog):

    http://blog.ioshints.info/2010/09/ribs-and-fibs.html

    The same topic is also described in my MPLS/VPN Architectures book.

  8. Thanks a lot Ivan. That link is really great. So that means the FIB contains the actual next-hop IP and the exit interface to reach any network, whereas the RIB just has the next-hop IP (which can be some IP that is not directly connected to the router). The FIB is faster than the RIB because it doesn't do the recursive lookup. Please correct me if I am wrong anywhere!

  9. Very informative. I wonder why true load balancing (non-fixed hash position in a path table) isn't implemented for both L2, e.g. EtherChannels, and CEF. The way it's implemented is more deterministic, but can be perceived as a bottleneck for high-throughput flows.

    Though, thinking about in-order delivery, I suppose fixing a flow to a specific path is a good thing. But what are the real risks, and how likely are they, if per-packet load balancing is used?

  10. Some non-TCP applications cannot handle out-of-order packets. Typical examples: VoIP and FCoE (or SNA if you're really old :D )

  11. Can you give some idea of how long a bucket assigned to an interface remains with that interface? If 6 buckets are assigned to one link and that link goes down, will those buckets be immediately reassigned to another parallel link, or will the second link not be able to use them?

    say buckets 0 2 4 6 8 10 12 14 are for interface 1
    and buckets 1 3 5 7 9 11 13 15 are for interface 2

    If interface 2 goes down, will its buckets be freed and assigned to interface 1, or will those buckets not be utilized? If the buckets are freed, after how much time will they be allocated to interface 2, or what is the bucket refresh time?

    Can you lead me to a source/RFC for the same?

    Replies
    1. How about starting a router and testing it? There's no RFC on CEF, it's Cisco's proprietary forwarding implementation.


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.