Cisco Nexus 3548: A Victory for Custom ASICs?

Autumn must be a perfect time for data center product launches: last week Brocade launched its core VDX switch and yesterday Arista and Cisco launched their new low-latency switches (yeah, the simultaneous launch must have been pure coincidence).

I had the opportunity to listen to Cisco’s and Arista’s product briefings, continuously experiencing a weird feeling of déjà vu. The two switches look like twin brothers … but there are some significant differences between the two:

  • Cisco’s Nexus 3548 is narrowly focused on high-performance trading, Arista’s traditional market. Arista’s 7150 is a more generic top-of-rack switch with features targeting private clouds (example: VXLAN termination);
  • Arista’s switches use merchant silicon; the Nexus 3548 runs on a new generation of Cisco ASICs (read Colin McNamara’s blog post for more details);
  • Both switches have comparable L2 table sizes: 64K MAC addresses and 64K adjacent hosts (ARP/ND table), but Arista’s switch has significantly bigger IP forwarding tables (84K IP routes versus 16K IP routes in the Nexus 3548);
  • 7150S-64 has 64 10GE ports, Nexus 3548 has 48 10GE ports;
  • Surprisingly, the typical power draw of the Nexus 3548 is almost identical to that of Arista’s 7150S-52 (the 52-port model);
  • And finally (my favorite): only one of the two supports IPv6.

Focusing on additional software and hardware features, it’s obvious Cisco has been reading Arista’s high-performance trading playbook: both switches can combine four 10GE ports into a single 40GE port (not a LAG), and both have microburst management, APIs, precision timing with PTP, timestamps in SPAN/mirrored packets, and hardware NAT with ridiculously low latencies.

The latency game

The true difference between the two switches is the packet forwarding latency.

Arista has traditionally been the market leader in this space, and its new switch raised (actually, lowered) that bar significantly to ~380 nanoseconds … but only for a few moments – the Nexus 3548 has 250-nanosecond cut-through latency, which can be further reduced to 190 nanoseconds in warp mode (yes, you do need an additional software license to enable the warp drive). The trick to reduced latency is reduced MAC table size: 8K addresses in warp mode.

The Nexus 3548 also has mind-boggling hub-like warp SPAN performance: mirroring packets from an input port to a set of output ports takes 50 nanoseconds (or ~60 bytes @ 10 Gbps). Obviously, this trick only works with cut-through switching (which can’t be done from 10GE to 40GE ports or vice versa) and idle output ports.
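
A quick back-of-the-envelope check of that number – plain Python arithmetic, nothing vendor-specific:

    # How much data flies past on a 10 Gbps link while the switch
    # spends 50 ns mirroring a packet?
    LINK_SPEED_BPS = 10e9          # 10 Gbps
    SPAN_LATENCY_S = 50e-9         # 50 ns warp SPAN latency

    bytes_on_wire = LINK_SPEED_BPS * SPAN_LATENCY_S / 8
    print(bytes_on_wire)           # 62.5 -- hence "~60 bytes @ 10 Gbps"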

Do we care?

A bit of perspective: the speed of light is finite – it takes a signal roughly 3 nanoseconds to travel one meter in a vacuum. Signal propagation in fiber or copper is a bit slower, at approximately 5 nanoseconds per meter. A pair of 10-meter cables thus introduces ~100 ns of latency (50 ns on each leg) … and then there’s the latency introduced by the SFP+ transceivers.
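
The same arithmetic, spelled out (a trivial Python sketch using the ~5 ns/m propagation figure from above):

    PROP_DELAY_NS_PER_M = 5        # signal propagation in fiber/copper
    CABLE_LENGTH_M = 10

    per_leg_ns = PROP_DELAY_NS_PER_M * CABLE_LENGTH_M   # 50 ns per cable
    total_ns = 2 * per_leg_ns                           # 100 ns for both legs
    print(per_leg_ns, total_ns)                         # 50 100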

I am positive there are people out there who think they need this kind of performance and are willing to pay for it. I am equally positive that almost all of us (particularly those who still have to work with data residing on disk drives) stopped caring a long while ago, once forwarding latencies dropped to a few microseconds.

23 comments:

  1. Hi Ivan,

    Both switches also support NAT...
    My feeling is that this is more a customized ASIC based on a merchant ASIC than a custom homemade ASIC.

    Fabian
  2. These latency games are all about automated trading – it’s probably the most important application for these boxes.
  3. Hi Ivan,

    Last time I checked, the latency crown was held by Gnodal; they advertise sub-150 ns cut-through switching latency.
    http://www.gnodal.com/Products/GS-Series/

    Not really sure whether 50-100 ns makes that much difference, but HFT people are really shaving off microseconds here and there. The actual production network (server-to-server) is usually InfiniBand with RDMA and other stuff.
    Some reading on low-latency infra, if you're interested (it's PPT, unfortunately) - http://www.informatix-sol.com/docs/LowLatency101.pdf
    A little dated, perhaps, but mostly still relevant.
  4. Hi Ivan.

    You say in your blog post: "The trick to reduced latency is reduced MAC table size: 8K addresses in warp mode."

    What does reducing the MAC table really do on the silicon? I mean, the buffer for indexing the MAC addresses will be smaller, but where does the extra space go?

    Short summary: can you do a deep dive into this and enlighten me, please? :)

    Many thanks !

    Nic
    Replies
    1. Can't deep-dive, that's all the information I got.

      MAC tables are usually organized as either TCAMs or hash tables. In both cases, accessing a larger table might take an extra (hardware) step, resulting in higher latency. Just guessing.
    2. Actually, imagine a packet showing up on all interfaces at the same time. Since you can only look up one packet at a time in a MAC table, what 'warp' mode does is split the MAC table into 8 tables with all the entries replicated, so you can look up 8 packets in parallel. For a 48-port switch, each replicated table then serves 6 input ports, so in the worst case a given packet has to wait for 5 lookups to get its turn (see the sketch after these replies).
    3. Also, hash tables have collisions (the table effectively consists of buckets holding chained lists of entries); the lookup algorithm then sequentially walks the list bound to a specific bucket.
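
      To make the (admittedly speculative) replies above concrete, here is a toy Python model of a replicated, chained-bucket MAC table. It is purely illustrative – Cisco has not published the ASIC internals, and all names and sizes below are made up:

        NUM_REPLICAS = 8                # replicated copies of the MAC table
        NUM_PORTS = 48
        PORTS_PER_REPLICA = NUM_PORTS // NUM_REPLICAS   # 6 ports share a copy
        NUM_BUCKETS = 1024              # arbitrary toy size

        def bucket_of(mac: int) -> int:
            return mac % NUM_BUCKETS    # stand-in for the hardware hash

        # each replica is a list of buckets; each bucket is a chained list
        replicas = [[[] for _ in range(NUM_BUCKETS)] for _ in range(NUM_REPLICAS)]

        def learn(mac: int, port: int) -> None:
            for table in replicas:      # a learned MAC is written into every copy
                table[bucket_of(mac)].append((mac, port))

        def lookup(ingress_port: int, mac: int):
            table = replicas[ingress_port // PORTS_PER_REPLICA]  # "own" copy
            for entry_mac, port in table[bucket_of(mac)]:        # walk the chain
                if entry_mac == mac:
                    return port
            return None                 # unknown unicast -> flood

        learn(0x0000_5E00_5301, port=7)
        print(lookup(ingress_port=13, mac=0x0000_5E00_5301))     # -> 7
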
  5. Which one of the two supports IPv6?
    Replies
    1. Is it too hard to check the vendor datasheets? I provided links to both datasheets in the article.
    2. Neither of the datasheets mentions IPv6...
    3. http://www.aristanetworks.com/en/products/7150-series/7150-datasheet – see the key features tab.
    4. IPv6 should be no problem for the 71xx.
  6. The Arista 7150S switch family supports IPv6:

    http://www.bradreese.com/blog/9-20-2012.htm

    Sincerely,

    Brad Reese
    Replies
    1. Supported in a future software release
    2. Yes, you're correct:

      Arista 7150S Data Sheet (page 3)

      Layer 3 Features

      21K IPv6 Routes*
      2K IPv6 Multicast Routes*

      (Page 4)

      * Supported in a future software release

      http://www.aristanetworks.com/media/system/pdf/Datasheets/7150S_Datasheet.pdf

      Sincerely,

      Brad Reese
  7. Latency is key to HFT, but the winner for HFT will remain the combination of low latency AND consistently low jitter.

    Also, Arista’s older 7124SX and the newer 7150S use the (Intel) Fulcrum ASIC – the key point is consistent latency and jitter across all packet sizes, whether doing layer 2 or layer 3 forwarding or using any other feature. I would like to see the Cisco switch tested in that manner.
  8. BTW, 7150S-64 has "48x1/10GbE and 4 x 10/40GbE", not 64x 10GE. You might want to modify that.

    I was wondering how they managed to cram 64 SFP cages into a 1U chassis :-)
    Replies
    1. It has 48 10GE ports and four 40GE ports, each of which you can split into four 10GE ports with a breakout cable. Their data sheet claims a total of 64 ports.
    2. Thanks for pointing that out. They could certainly be clearer about that on the product's page...
  9. Nexus 3548: warp mode further reduces latency to 190 ns for small-to-midsize layer 2 and layer 3 deployments. That always triggers a two-sided argument in my head: is L3 slower than L2 on any account? Even with hardware switching, you need to read more of the header to make the forwarding decision, and you have to rewrite headers and recalculate checksums. On the other hand, the difference might amount to a couple of meters of fiber? :)
    Replies
    1. L3 could be slower in theory; I'm not sure it's any slower in practice (and you'd have to be using cut-through switching anyway to notice the difference).
    2. Gee, I've been saying that since the '90s, when some Cisco (MPLS) course claimed that routing is slower ... while at the same time showing how CEF does L3 switching. VMware did not like/support vMotion across L3 boundaries because of latency, too. Myths? :)
    3. In those days MPLS was marginally faster, as it took a single linear table lookup versus multiple N-ary tree/trie lookups for IP (see the sketch below for the principle).

      As for vMotion over L3, see http://blog.ipspace.net/2014/09/vmotion-enhancements-in-vsphere.html
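
      To illustrate the lookup-cost argument, here is a minimal sketch – a naive linear longest-prefix match in Python. Real ASICs and CEF use tries or TCAMs, so only the principle carries over, and all addresses and port names are made up:

        mac_table = {"00:50:56:01:02:03": "port1"}  # L2/MPLS: one exact-match probe

        def ip_to_int(ip: str) -> int:
            a, b, c, d = (int(x) for x in ip.split("."))
            return (a << 24) | (b << 16) | (c << 8) | d

        routes = {                                  # (prefix, length) -> next hop
            (ip_to_int("10.0.0.0"), 8): "port2",
            (ip_to_int("10.1.0.0"), 16): "port3",
        }

        def lpm(dst: str):
            d = ip_to_int(dst)
            for plen in range(32, -1, -1):          # longest prefix first
                mask = ~((1 << (32 - plen)) - 1) & 0xFFFFFFFF
                hop = routes.get((d & mask, plen))
                if hop:
                    return hop                      # may take many probes
            return None

        print(mac_table["00:50:56:01:02:03"])       # one probe -> port1
        print(lpm("10.1.2.3"))                      # 17 probes -> port3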