Do we need LACP and UDLD?

The Nexus-focused Packet Pushers were discussing a great question during the Cisco Nexus Deep Dive part 2 podcast: do we need LACP on top of UDLD?

Short answer: absolutely.

LACP and UDLD serve two different functions:

  • UDLD detects physical link errors and byzantine failures (example: unidirectional fiber link);
  • LACP manages the link aggregation groups (LAG, aka port channel) and detects LAG configuration and wiring errors.

Example: LACP can detect a miswired port channel connected to multiple physical switches (that’s why we need MLAG). UDLD can’t do that.
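A minimal sketch of the miswiring check described above, assuming each member link has already learned the system ID of the LACP partner at its far end. The port names and system IDs are made up for illustration, and the function names are hypothetical, not part of any switch API.

```python
from collections import defaultdict


def group_by_partner(links):
    """Map each partner system ID to the local ports that see it.

    `links` is a dict of local port name -> partner system ID, as
    learned from the LACPDUs received on that port (illustrative data).
    """
    groups = defaultdict(list)
    for port, partner_id in links.items():
        groups[partner_id].append(port)
    return dict(groups)


def is_miswired(links):
    """A bundle is miswired if its members report more than one partner."""
    return len(group_by_partner(links)) > 1


# Two ports cabled to the same switch: safe to aggregate.
good = {"Eth1/1": "00:11:22:33:44:55", "Eth1/2": "00:11:22:33:44:55"}
# Two ports cabled to different switches: must not be aggregated.
bad = {"Eth1/1": "00:11:22:33:44:55", "Eth1/2": "66:77:88:99:aa:bb"}

print(is_miswired(good), is_miswired(bad))
```

A static port channel has no partner information to compare, which is why it cannot catch this kind of wiring error.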

LACP is thus mandatory in a robust network using link aggregation (and after years of yammering, it finally works in vSphere 5.1).

Unfortunately, you can’t tune the LACP timers or timeout values. The 802.1AX standard defines two timer values: short timer (1 second) or long timer (default, 30 seconds) with corresponding timeouts being 3 seconds or 90 seconds.
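The timer values quoted above work out to a timeout of three periodic intervals. The following sketch only restates that arithmetic; the names are illustrative and the multiplier is inferred from the numbers in the paragraph above, not taken from any configuration knob.

```python
# Periodic LACPDU transmission intervals quoted above, in seconds.
LACP_INTERVAL_SECONDS = {"short": 1, "long": 30}

# Inferred from the quoted values: 3 s = 3 x 1 s, 90 s = 3 x 30 s.
TIMEOUT_MULTIPLIER = 3


def lacp_timeout(mode):
    """Return the time after which a silent partner is declared gone."""
    return LACP_INTERVAL_SECONDS[mode] * TIMEOUT_MULTIPLIER


for mode in ("short", "long"):
    print(mode, lacp_timeout(mode))
```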

You can tune UDLD timeouts, with the valid values (in NX-OS) being between 7 and 90 seconds.

Summary:

  • If you need very fast failure detection, use LACP short timers.
  • If you need to detect failures within 10-20 seconds, use UDLD.
  • Use UDLD (if needed) in combination with LACP on port channels.
  • Never ever run port channels without LACP (unless you’re forced to interact with a lobotomized device).
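The summary can be read as a small decision procedure. The sketch below is one possible encoding of it, not a definitive rule: the 3-second and 90-second thresholds are the LACP timeouts mentioned earlier, the 10-second threshold is the lower end of the 10-20 second UDLD window from the summary, and the function name and return strings are hypothetical.

```python
def recommend(required_detection_s, short_timers_supported):
    """Suggest a failure-detection setup for a port channel.

    required_detection_s: how quickly a failure must be detected.
    short_timers_supported: True if both ends of the link support
    LACP short timers.
    Returns a label, or None if nothing in the article is fast enough.
    """
    if required_detection_s >= 90:
        # LACP long timers (90 s timeout) already meet the target.
        return "lacp-long"
    if short_timers_supported and required_detection_s >= 3:
        # LACP short timers give a 3 s timeout.
        return "lacp-short"
    if required_detection_s >= 10:
        # Assumed: UDLD can be tuned into the 10-20 s range.
        return "lacp-long+udld"
    return None


print(recommend(5, True))
print(recommend(15, False))
print(recommend(120, False))
```

Every branch keeps LACP on the port channel; UDLD only appears as an addition to it, in line with the last two points of the summary.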

17 comments:

  1. Why would one even question that? Try connecting a port-channel to a stack of switches without LACP, then see a stack member failing. You'll learn fast.
    Replies
    1. What are the symptoms? Etherchannel failure/flap?
    2. Depending on the failure, the stack may split. If LACP is being used, the peer switch detects the stack split, or just the lack of LACP frames, and forms two separate port-channels, after which STP kicks in and blocks one link.
      If it's a static port-channel, it does not detect the stack split and keeps on forwarding on one port-channel towards two logical devices, and loops can form.
  2. I've struggled to understand why we even need UDLD. 10GigE and GigE have detection mechanisms built in. I've tested with and without UDLD on fiber links and the results were the same: the links go down when the link is unidirectional.

    I believe there is a requirement to have the links in auto-negotiation mode, but that is 99% of the implementations out there. On the 1% that don't do auto-negotiation this is typically to a carrier who wouldn't run UDLD to the CE anyways.

    Perhaps it was needed on FastE back in the day??

    I've been searching as to why this is needed. Many other vendors don't even have a feature like this and don't intend to implement it because of the reason I stated above, it's built into the GigE standards.
    Replies
    1. I'm hoping Ivan will comment here. I wasn't aware of the above if true.
    2. Hi Anonymous, you are referring to link fault signaling (LFS). See this page for more information about UDLD and LFS: http://en.wikipedia.org/wiki/Unidirectional_Link_Detection
    3. I would also like to know why we need UDLD when LACP can do the job faster?
  3. Anony, you might be confusing copper and fiber?

    This doesn't affect Copper.
    goog hit #1:
    The various fiber optic Ethernet standards (10 Mbps 10BASE-F, 100 Mbps 100BASE-FX and 1000 Mbps 1000BASE-X) use different wavelengths of optical signaling which made it impossible to come up with an Auto-Negotiation signaling system that would work across all three.

    As such UDLD is/was needed.
  4. I thought the question was: If you use LACP, do you need UDLD on LACP ports?

    Will LACP be enough to detect unidirectional links?

    IMHO, LACP should be enough, and there is no need for UDLD on LACP ports.
    Replies
    1. LACP will detect unidirectional links. Will it detect them fast enough? That's where you enter the gray land of It Depends.

      If you're OK with 90+ second time-to-recovery or can turn on LACP short timers (= they're supported by both ends of the link), you're all set ... unless running LACP with short timers overloads the 8088 CPU your vendor put in a 500-port switch (just joking, but do check the CPU utilization).
    2. LACP is one way that Juniper recommends to do UDLD simulation.

      http://kb.juniper.net/InfoCenter/index?page=content&id=KB13314
  5. What about a TRILL (FabricPath) setup? You can make all your aggregated links just point-to-point links, then you do not need LACP at all. Use this in combination with link fault signaling (LFS) and you do not need UDLD/LACP??????
    Replies
    1. If you do not have bundles, you should not need UDLD with FP. There is FP ISIS to take care of any soft failures, and it's a bidirectional protocol, just like all L3 protocols are.
  6. UDLD is a proprietary solution; EFM does the same thing, but with additional functionality. IMHO
  7. Hi Ivan
    This is a great article, however what if we put the question another way - do we need UDLD on top of LACP?
    Replies
    1. As I said in the article - if you can't run LACP with short timers (for whatever reason), UDLD gives you a faster alternative.
    2. It potentially gives you a faster response using Cisco proprietary technologies. I always find it fascinating how individuals with almost exclusive Cisco backgrounds think this way.

      IMHO, if the vendor can't support an easy 1 second timer (and let's assume they have clock randomization so the timers don't all sync lock, which adds more overhead) across 100s of ports because the CPU they are using (for numerous functions) is the cheapest one they could find, find another vendor.