Turn a switch into a hub … the Microsoft way

If you’ve ever tried to get advanced Cisco certifications, you’ve probably encountered questions dealing with the mismatch between the end device ARP timeouts and the L2 switch CAM (MAC address cache) timeouts. If you’re still wondering what the underlying problem is (it took me a while to figure it out), read the Unicast Flooding in Switched Campus Networks document from Cisco.

In all scenarios, traffic sent to an unknown unicast MAC address causes layer-2 flooding, which can significantly reduce switch performance. Microsoft took this problem to a completely new level with its Network Load Balancing implementation: Windows servers send ARP replies containing MAC address X from MAC address Y. The switch learns only the source MAC address of the frames it receives, so it never learns X, and all the traffic toward the servers is flooded. Dragan has encountered this problem in a large customer network and described his “experience” in Fragments.
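If you want to see the mechanism in action, here is a minimal sketch of a MAC-learning switch in Python. It is a toy model, not how any real switch or NLB is implemented: the `LearningSwitch` class, the port numbers and all the MAC addresses are made up for illustration. It only shows that a switch that learns from frame source addresses never gets a CAM entry for a MAC address that appears solely inside ARP payloads.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"
CLUSTER_MAC = "02:bf:00:00:00:01"  # "MAC X": advertised in the ARP reply (illustrative value)
NIC_MAC = "02:01:00:00:00:01"      # "MAC Y": source of the server's frames (illustrative value)
CLIENT_MAC = "00:11:22:33:44:55"   # illustrative client address


@dataclass
class LearningSwitch:
    """Toy L2 switch: learns source MACs, floods unknown destinations."""
    ports: tuple
    cam: dict = field(default_factory=dict)  # MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        # Learning looks only at the Ethernet source address,
        # never at the contents of the ARP payload.
        self.cam[src_mac] = in_port
        out_port = self.cam.get(dst_mac)
        if out_port is None:
            # Broadcast or unknown unicast: flood to every other port.
            return [p for p in self.ports if p != in_port]
        return [out_port]


switch = LearningSwitch(ports=(1, 2, 3, 4))

# Client on port 2 broadcasts an ARP request for the cluster IP address.
arp_request_ports = switch.forward(2, CLIENT_MAC, BROADCAST)

# Server on port 1 replies. The frame is sourced from MAC Y,
# while the ARP payload (not modeled here) advertises MAC X.
arp_reply_ports = switch.forward(1, NIC_MAC, CLIENT_MAC)

# Client now sends data to MAC X, which the switch has never seen as a source.
data_ports = switch.forward(2, CLIENT_MAC, CLUSTER_MAC)

print("ARP request sent to ports:", arp_request_ports)
print("ARP reply sent to ports:  ", arp_reply_ports)
print("Data frame sent to ports: ", data_ports)
```

The ARP reply is delivered to a single port because the client's MAC address was learned from its request, but every data frame toward the cluster address goes out all other ports, no matter how much traffic the server sends.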

3 comments:

  1. Ivan,

    I ran into this recently with folks complaining about poor voice quality. As it turns out, the CPUs on our 6500s were getting pegged. We tracked it down to a couple "clustered" hosts running Microsoft NLB. We eventually moved them onto a DNS-based load balancing solution off our GSLB which met their needs.

    If you read the protocol design on Microsoft Technet, it is truly written by application developers. However, if you absolutely MUST run NLB in your network, definitely go with the multicast option with IGMP snooping to handle any flooding issues. Details here:

    http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml

    -Steve

  2. Carl Von Hassel, 21 August, 2009 21:17

    We have also run into issues with the "Microsoft" solution. Not only do the switches suffer from high CPU utilization, but every host on the switch takes a CPU hit. We also had to dumb down our IPS because it sees this behavior as an attack.

  3. As Steve Shaw indicates, it's worth reading the Microsoft NLB documentation just so you can shake your head in wonder at the awesome logic of it:

    http://technet.microsoft.com/en-us/library/cc782694%28WS.10%29.aspx

    In unicast mode, the same MAC address is used for ALL cluster members (who are now of course unable to communicate with one another). The alternative is to use multicast mode - which sounds just peachy, but on a single LAN segment there are no membership requests to snoop so the data floods out every port just like in the unicast model.

    Last time I came across this a few years back, my solution was the same as Dragan's - isolation by segmentation. *sigh*


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.