Nexus 6000 and 40GE – why do I care?

Cisco launched two new data center switches on Monday: Nexus 6001, a 1RU ToR switch with the exact same port configuration as any other ToR switch on the market (48 x 10GE, 4 x 40GE usable as 16 x 10GE) and Nexus 6004, a monster spine switch with 96 40GE ports (it has the same bandwidth as Arista’s 7508 in a 4RU form factor and three times as many 40GE ports as Dell Force10 Z9000).

Apart from slightly higher port density, Nexus 6001 looks almost like Nexus 5548 (which has 48 10GE ports) or Nexus 3064X. So where’s the beef?

For starters, Nexus 6001 supports FCoE and numerous other goodies that Nexus 3064X doesn’t (FabricPath, Adapter- and VM-FEX). If you plan to build an end-to-end FCoE network, you just might want to have 40GE links in the core, but even if you’re past the Fibre Channel stage of data center network evolution and use iSCSI or NFS, it won’t hurt to have full DCB support throughout your network.

Both switches in the Nexus 6000 family have an interesting forwarding table concept: 256K entries are shared between MAC, ARP, ND and (*,G) entries. By default, the MAC address table has 128K entries, with the other 128K being available for L3 entries, but you can move the boundary if you have to. Oh, BTW, IPv6 entries use twice as much host table space as IPv4 entries (not surprisingly), for a maximum of ~50K dual-stack entries.

However, 40GE support is the most important part of the new launch for me. While 40GE uses the same four multimode fiber pairs as four 10GE links, and has slightly higher bandwidth density (because a single 40GE connector uses less space than four 10GE connectors), the real difference lies in the way you can use the four pairs. You have to combine four 10GE links into a port channel (aka Link Aggregation Group) or use ECMP routing to get the same bandwidth as a single 40GE link ... but even then you don’t get the same bandwidth due to port channel/ECMP load sharing limitations (exception: Brocade VCS Fabric). Having four 40GE uplinks is definitely much better than having 16 10GE uplinks.
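To make the load sharing limitation concrete, here is a minimal Python sketch of per-flow hashing across a four-member port channel. The CRC32 hash, the field selection and the sample flows are assumptions made purely for illustration; real switches use their own (usually configurable) hash functions.

```python
import zlib
from collections import Counter

LINK_GBPS = 10        # speed of one port channel member
MEMBERS = 4           # 4 x 10GE bundle

def member_for_flow(flow, members=MEMBERS):
    """Pick a member link from a hash of the flow's 5-tuple.

    CRC32 is only a stand-in here, not any vendor's real algorithm.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % members

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port)
flows = [("10.0.0.%d" % (i % 8 + 1), "10.0.1.1", 6, 40000 + i, 3260)
         for i in range(12)]

# Count how many flows land on each member; an even split is not guaranteed.
placement = Counter(member_for_flow(f) for f in flows)
print(dict(placement))

# All packets of one flow hash to the same member, so a single flow
# is capped at the member speed no matter how idle the other links are.
max_single_flow_gbps = LINK_GBPS
print("single-flow ceiling on the bundle:", max_single_flow_gbps, "Gbps")
```

A single 40GE link has no such per-flow ceiling below 40 Gbps, which is the difference the paragraph above is pointing at.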

Next, if you plan to build large data center fabrics, you must consider the Nexus 6004. You can use it together with almost any 10/40GE ToR switch to build leaf-and-spine Clos fabrics with over 4500 ports without resorting to 10GE breakout cables or multi-stage Clos fabrics ... and since Nexus 6004 supports FabricPath, you could use this fabric in L2 or L3 mode if you use Nexus 6001 as the ToR switch (not that I would ever recommend having large L2 fabrics).

Here’s the math: 1:3 end-to-end oversubscription ratio, four spine switches with 96 40GE ports each, 96 leaf switches with a 40GE uplink to each spine switch. Each leaf switch has 48 10GE ports; 48 x 96 = 4608.
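The same arithmetic as a short Python sketch. The function and parameter names are invented for this example; the port counts are the ones quoted above, and the model assumes exactly one 40GE uplink from every leaf to every spine.

```python
def leaf_spine_capacity(spines, spine_ports, leaf_ports,
                        leaf_port_gbps=10, uplink_gbps=40):
    """Size a two-tier leaf-and-spine fabric (one uplink per leaf per spine)."""
    leaves = spine_ports                 # every spine port terminates one leaf
    uplinks_per_leaf = spines            # one uplink to each spine
    downlink_gbps = leaf_ports * leaf_port_gbps
    uplink_total_gbps = uplinks_per_leaf * uplink_gbps
    return {
        "leaves": leaves,
        "edge_ports": leaves * leaf_ports,
        "oversubscription": downlink_gbps / uplink_total_gbps,
    }

fabric = leaf_spine_capacity(spines=4, spine_ports=96, leaf_ports=48)
print(fabric)
# {'leaves': 96, 'edge_ports': 4608, 'oversubscription': 3.0}
```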

Finally, Cisco announced a new FEX: 2248PQ-10G with 48 10GE ports and four 40GE uplinks. You can use two Nexus 6004 switches and the new fabric extenders to build a ~2300-port fabric with just two managed nodes (as long as you don’t care that the total switching capacity of the fabric is “only” around 15 Tbps because all switching is done in the spine switches).

Here’s the math: With 1:3 oversubscription ratio, each FEX has two 40GE uplinks to a pair of 6004 switches, resulting in a maximum of 48 x 2248PQ-10G, each with 48 10GE ports. 48 x 48 = 2304. Might be that the actual number is lower due to FEX scalability limitations – it will be a while before NX-OS Verified Scalability Guide will be published for Nexus 6000.
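A hedged Python sketch of the FEX numbers. It assumes every FEX uses all four of its 40GE uplinks (two to each 6004, which is what a 1:3 ratio on 48 10GE ports requires), that every 6004 port is used for FEX uplinks, and that the "around 15 Tbps" figure counts both directions of every spine port. The function name and parameters are made up for illustration, and no FEX scalability limit is modeled.

```python
def fex_fabric(parents, parent_ports, uplinks_per_fex, fex_ports,
               parent_port_gbps=40):
    """Port count and switching capacity of a parent-switch + FEX design."""
    total_parent_ports = parents * parent_ports
    fex_count = total_parent_ports // uplinks_per_fex
    edge_ports = fex_count * fex_ports
    # FEXes do no local switching, so capacity is that of the parent switches.
    # Assumption: count both directions of every 40GE port.
    switching_tbps = total_parent_ports * parent_port_gbps * 2 / 1000
    return fex_count, edge_ports, switching_tbps

fex_count, edge_ports, switching_tbps = fex_fabric(
    parents=2, parent_ports=96, uplinks_per_fex=4, fex_ports=48)
print(fex_count, edge_ports, switching_tbps)
# 48 2304 15.36
```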

More details

Clos fabric architectures are described in (no surprise there) the Clos Fabrics Explained webinar; you might also watch the Data Center 3.0 for networking engineers webinar to learn more about other data center technologies. Finally, you might want to register for the May Data Center Fabrics update session (which will definitely include the Nexus 6000 switches).

19 comments:

  1. Do you have any idea of the pricing of Nexus 6000 switches?

    Replies
    1. Sure. You go and talk with your local Cisco AM.

    2. I thought you were supposed to care?

  2. Ivan, awesome post on the 6000. I love it when Cisco comes out with new equipment and I get to read high-level bloggers' feedback. I understand that 40GbE can be configured as a break-out (4 x 10Gb), but I was also under the impression that 40GbE uplinks can be configured as a single 40Gb uplink - a muxed lane at layer 1, so you don't have to worry about a port channel.

    Is this not the case on any equipment or just the 6000?

    Replies
    1. Of course you can use the 40GE port as a single 40GE uplink ... but you might need more. For 48 10GE server-facing ports you'd usually take four 40GE uplinks (for 3:1 oversubscription).

  3. "Having four 40GE uplinks is definitely much better than having 16 10GE uplinks." --- Shouldn't the statement be, it always depends :) Just in case it is a large environment and a 4-way spine isn't big enough, then 10GE may work out to be better.

    Replies
    1. Nexus 6000 with 4-way spine gets you to ~4500 10GE ports. That should do for most data centers ;)

  4. "bandwidth density" - thank you for this phrase!

    Also, s/strands/pairs/g

  5. Excellent post.

    What are your thoughts on what an acceptable over-subscription ratio is? OK, I know that's a loaded question and am half expecting a 'how long is a piece of string' answer. Is it as 'simple' as, more east\west=lower ratio required, more north\south=higher ratio acceptable?

    Replies
    1. Do you prefer "how many angels can dance on the head of a pin" or "it depends" ;) It's a good question that I've been struggling with quite often - time to write a blog post.

    2. You mean this one?

      http://blog.ioshints.info/2013/02/the-saga-of-oversubscriptions.html

      ;-)

      By the way, I think I got my points the wrong way around above:

      Is it as 'simple' as, more east\west=lower ratio required, more north\south=higher ratio acceptable?

      That should be:

      Is it as 'simple' as, more east\west=higher ratio acceptable, more north\south=lower ratio required?

      Let me drink this coffee and determine if that IS what I actually mean!

  6. Considering Cisco is probably the one networking vendor that I can no longer keep pace with, I found this a very interesting insight, as opposed to my usual 'more shiny new boxes from you know who'. However, even given this great perspective, it still looks like an oversight or at least a missed opportunity, coupled with the rhetoric that these boxes have been in development for the past five years!

  7. Does the 6004 fix the brain damage that was L3 in the Nexus 5k series switches? Basically, can we do hitless software upgrades with L3 enabled, and are all the other L3 restrictions removed?

  8. "Here’s the math: With 1:3 oversubscription ratio, each FEX has two 40GE uplinks to a pair of 6004 switches, resulting in a maximum of 48 x 2248PQ-10G, each with 48 10GE ports. "

    Hello Ivan, and thanks for your very interesting post. Cisco tells me that the number of FEXes will still be limited on the 6004. As a reminder, on the Nexus 5k it is limited to 24 FEXes. On the 6004, they talk about 32, but it isn't certain. Are you sure about the maximum number of FEXes (48) which can be connected to the Nexus 6004?

    Replies
    1. Let me quote the very next sentence: "Might be that the actual number is lower due to FEX scalability limitations – it will be a while before NX-OS Verified Scalability Guide will be published for Nexus 6000." ;))

      Actually, the number of ports given in a Cisco Live presentation indicates 24 or 32 might be the right answer.

    2. Sorry, I read too fast :-)

      Thanks for this quick feedback.

  9. Hi Ivan,

    May I ask a question about a single 40G link vs a 4 X 10G LAG?

    Let's suppose we have two datacenters (DC) in a campus, connected via two independent optic fibers. Each DC has its own Nexus 6001. Suppose that we want to connect them via those two fibers and don't want to spend more than a single 40G port on each Nexus. (Of course we want to make this connection highly available - otherwise, why bother running two redundant fibres between the sites.)

    The easy way to accomplish this task is, obviously, the following: configure the 40G port as 4 X 10G ports, combine them in a LAG, and use fibre 1 for two 10G links of the LAG and fibre 2 for the two remaining 10G links.

    Now the interesting point: performance-wise, a single 40G link is said to be better than 4 X 10G mode. Does it have any kind of internal redundancy? Can we split it via the breakout cable and route it over two different fibres to a matching 40G port on a remote 6001? If yes, will this link continue working if one of the two fibres gets broken?

    Replies
    1. 40GE does not have internal redundancy. If you lose one 10GE component, the whole link is down.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.