Migrating a Data Center Fabric to VXLAN

Darko Petrovic made an excellent remark on one of my LinkedIn posts:

The majority of the networks running now in the Enterprise are on traditional VLANs, and the migration paths are limited. Really limited. How will a business transition from traditional to whatever is next?

The only sane choice I found so far in the data center environment (and I know it has been embraced by many organizations facing that conundrum) is to build a parallel fabric (preferably when the organization is doing a server refresh) and connect the new fabric to the old one with a layer-3 link (in the ideal world) or an MLAG link bundle.
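
To make the layer-3 option concrete, here's a minimal Cisco-style sketch: a routed interconnect between a border leaf in the new fabric and the old core, with an eBGP session exchanging routes. Every interface name, address, and AS number below is made up for illustration; your platform and design will differ.

```
! New-fabric border leaf: routed link toward the old fabric
! (interface, addresses, and AS numbers are hypothetical)
interface Ethernet1/49
  description L3 interconnect to legacy core
  no switchport
  ip address 192.0.2.1/31

router bgp 65100
  ! eBGP session with the old fabric's core switch
  neighbor 192.0.2.0
    remote-as 65000
    address-family ipv4 unicast
```

The routed handoff keeps the two failure domains separate; an MLAG link bundle stretches the layer-2 domain (and its failure modes) across both fabrics, which is why the layer-3 link is the ideal-world option.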

Trying to migrate an existing fabric to a new technology is usually pointless; old hardware either doesn’t support the technology you’re interested in or has so many limitations¹ that it’s simply not worth the effort².

Not everyone agrees that it makes sense to build a new fabric; you might decide that upgrading the existing data center fabric by layers makes more sense. For example, you could rip out the spine switches (or whatever they were called when you bought them), replace them with new hardware, enable VXLAN on that new hardware, and then gradually replace leaf switches with VXLAN-enabled ones, effectively slowly expanding the VXLAN boundaries. That could work (in theory), but you’d be facing many migration events and a particularly nasty caveat: link speeds.
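
In practice, “expanding the VXLAN boundary” on a newly installed switch mostly boils down to mapping the existing VLANs into VXLAN segments. A minimal NX-OS-style sketch (the VLAN and VNI numbers are made up, and the BGP EVPN control-plane configuration is omitted):

```
feature nv overlay
feature vn-segment-vlan-based

! Map a legacy VLAN into a VXLAN segment (numbers are hypothetical)
vlan 100
  vn-segment 10100

! VXLAN tunnel endpoint; host reachability is distributed with BGP EVPN
interface nve1
  no shutdown
  source-interface loopback0
  host-reachability protocol bgp
  member vni 10100
    ingress-replication protocol bgp
```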

If your existing switches are a few years old, they might have 40GE uplinks (or, even worse, 10GE uplinks). Will you buy low-end spine switches to cope with that, or will you purchase modern switches and waste most of their performance to deal with the low-speed links? Also, most businesses’ IT needs grow slower than Moore’s law or Ethernet speeds, so most fabrics tend to shrink over time. Buying enough spine switches to connect the existing leaf switches is a waste when you know you won’t need nearly as many leaf switches in the future.

The same reasoning applies to a hardware migration strategy that starts with the leaf switches: how will you connect the high-speed new leaf switches to the ancient spines?

Finally, Darko mentioned customers who wanted to use shiny new stuff without investing anything into the hardware or migration services (the whole thread is well worth reading). The only (sane) way to deal with them is to explain that there is no free lunch and walk away if they fail to grasp that message. Sometimes you can’t win.


  1. For example, routing in and out of VXLAN tunnels (RIOT) ↩︎

  2. The same rule applies to “Gee, we have this old server. I’m pretty sure I could run $whatever on it.” I made that mistake more than once and wasted way too much time trying to make new software work on unreliable old hardware. ↩︎

2 comments:

  1. I have done some migrations on existing hardware, moving from one technology to another, often related to ACI: either enterprises who bought Nexus 9K switches where transitioning to ACI or SDN was mentioned as a benefit during the sales cycle, or enterprises who got ACI from the start but are not using any of its distinguishing features and wanted to get rid of the added ‘complexity’.

    The transition strategy always used a ‘seed’ network: new spine switches and at least a single pair of leaf switches (of each interface ‘type’), L2-connected to the existing environment. The idea is to free up the first pair of existing leaf switches by moving the workloads towards the new ‘seed’ leaf switches. Next, the original leaf switches can be converted or reconfigured and added to the new ‘seed’ network. This process is repeated until all workloads are moved. At a certain point, the L3 functionality must also be converted to the new network, mostly done at the end or after moving 50% of the workloads.
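
    In case you’re wondering what the L2 connection toward the existing environment could look like: it’s usually nothing fancier than a trunk carrying the VLANs being migrated. A hypothetical Cisco-style sketch (interface and VLAN numbers are made up):

    ```
    ! Seed leaf: classic trunk toward the legacy environment,
    ! carrying only the VLANs currently being migrated
    interface Ethernet1/1
      description L2 handoff to legacy fabric
      switchport mode trunk
      switchport trunk allowed vlan 100,200
    ```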

    For sure this can be a lengthy process (depending on fabric size and how many new leaf ‘seed’ switches can be purchased), but by using virtualization or application high availability to your advantage, it can be greatly de-risked.

    Unless the pain points of a certain technology are enormous, some strategic planning is usually done along the way: a certain type of leaf switch is nearing EOL, or the server team requires higher-bandwidth interfaces. Such initiatives make it easier to defend this approach to the business.

    From my experience, transitioning technologies on existing hardware is rather rare; new technologies are usually adopted during a hardware lifecycle refresh, where all hardware is replaced by newer models supporting the latest and greatest features. The migration approach is similar, but you have a full new fabric available, which eases the transition.

  2. The benefit of VXLAN is mostly scalability, so if your enterprise network is not scaling... just don't. The migration path from VLANs is to just keep using VLANs. The (vendor-driven) networking industry has a huge blind spot about this.

    Replies
    1. Thank you for mentioning this in your blog, but I must explain the reasoning behind my question a bit more.

      When Broadcom decided to buy VMware, many enterprises/MSPs didn't expect a massive cost surge, particularly for NSX-T (in some cases, depending on the licensing/features, 100%). My latest cases are from companies that purchased brand-new Nexuses (and QFXs) at the beginning of 2023 just to serve as an underlay for NSX-T. They are now in a bit of trouble with the high costs of NSX-T and want to avoid proceeding with it. The alternative: move away from NSX-T and return the gateways to the switches (Cisco, Juniper, Arista, etc.).
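
      To illustrate what ‘returning the gateways to the switches’ typically means on a VXLAN/EVPN fabric: the first-hop gateway becomes a distributed anycast gateway on the leaf switches. A minimal NX-OS-style sketch (the MAC address, VLAN, and IP addressing are made up):

      ```
      ! Same virtual MAC configured on every leaf switch
      fabric forwarding anycast-gateway-mac 0200.0000.00aa

      ! Distributed first-hop gateway for a migrated segment
      ! (VLAN and IP addressing are hypothetical)
      interface Vlan100
        no shutdown
        fabric forwarding mode anycast-gateway
        ip address 10.1.100.1/24
      ```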

      Enter 'Darko mentioned customers who wanted to use shiny new stuff without investing anything into the hardware or migration services.' They have already spent hundreds of thousands (or millions) of dollars on the new gear. They are so profoundly committed to NSX-T that there are no easy choices regarding migration scenarios: either be a hostage or bite the bullet.

      I agree that technology migration needs to be carefully planned/executed; that's what good CTOs do, but there are cases where you are blindsided, and your hands are tied.

      Cheers

    2. Hey, you should have mentioned the real problem (= ridiculous Broadcom pricing and getting off NSX-T) at the very beginning, and we'd have a completely different conversation.

      Will write another blog post on the topic, asking for suggestions.

    3. I'm afraid I have to disagree; that reason is not the core problem. Since you have way more experience under your belt than I do, let me ask you this: in the 'old days', when there was some epic technology breakthrough, what were the migration options? There wasn't much of a breakthrough in the DC before VXLAN; MPLS has been here since around the mid-90s. One could argue that VXLAN/EVPN is a ground-breaking technology with no viable migration path or workaround.

      Cheers

    4. > In the 'old days' when there was some epic technology breakthrough, what were migration options?

      Build parallel infrastructure (physical or virtual). How do you think we got from IPX to IP, from IPv4 to IPv6, or from ACLs to MPLS/VPN (people were running MPLS over GRE tunnels until they migrated the core network).
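
      For the historical record, the MPLS-over-GRE trick looked roughly like this Cisco IOS-style sketch: build a GRE tunnel across the non-MPLS core and enable MPLS on the tunnel interface (tunnel endpoints and addressing below are made up).

      ```
      ! IOS-style sketch: run MPLS across a core that doesn't support it
      ! (all addresses are hypothetical)
      interface Tunnel0
       ip address 10.255.0.1 255.255.255.252
       mpls ip
       tunnel source Loopback0
       tunnel destination 192.0.2.2
      ```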
