How Line-rate Is Line-rate?

During yesterday’s Data Center Fabrics Update presentation, one of the attendees sent me this question while I was talking about the Arista 7300 series switches:

Is the 7300 really non-blocking at all packet sizes? With only 2 x Trident-2 per line card it can't support non-blocking for small packets based on Trident-2 architecture.

It was an obvious example of vendor bickering, so I ignored the question during the presentation, but it still intrigued me, so I decided to do some more research.

What exactly is non-blocking?

I struggled to find a rigorous definition of non-blocking architecture that would apply to networking devices (the original definition came from circuit switching and is obviously irrelevant in our context). Weird, considering every vendor claims to have non-blocking architecture ;)

A 15-year-old presentation was the best I could come up with (additional links are obviously most welcome – please write a comment). It defines non-blocking as:

A switch is non-blocking if all output-contention free switching patterns are non-blocking.

In other words, packets sent from port A to port B are never hindered by traffic sent from port C to port D.
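One way to make "output-contention free" concrete is to check that no output port is asked to carry more than its line rate. The sketch below is only my reading of the definition above: the flow tuple layout and the function name `is_output_contention_free` are hypothetical, and load is expressed as a fraction of port line rate.

```python
from collections import defaultdict

def is_output_contention_free(flows):
    """flows: iterable of (in_port, out_port, load), load as a fraction of line rate.

    Returns True when no output port is offered more than 100% of its line rate.
    """
    per_output = defaultdict(float)
    for _in_port, out_port, load in flows:
        per_output[out_port] += load
    return all(total <= 1.0 for total in per_output.values())

# A->B and C->D use different outputs, so the pattern is contention free.
print(is_output_contention_free([("A", "B", 1.0), ("C", "D", 1.0)]))
# A->B and C->B oversubscribe output B, so no switch could deliver it all.
print(is_output_contention_free([("A", "B", 0.6), ("C", "B", 0.6)]))
```

A non-blocking switch, in this reading, is one that delivers every pattern for which the check passes.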

Trouble in Trident land?

The comment made by the attendee could indicate that the Trident II chipset (BCM56850 series) cannot perform linerate forwarding at small packet sizes.

Architectures with a single lookup pipeline would become blocking when the load exceeds their packet-per-second rating; the proof is left as an exercise for the reader.
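A toy model hints at why. It assumes (my assumption, not anything documented about a real chipset) that a single lookup pipeline with a fixed packets-per-second rating is shared proportionally among all flows; the `Flow` class, `forwarded_pps` function and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    name: str
    offered_pps: float  # packets per second the source tries to send

def forwarded_pps(flows, pipeline_pps):
    """Share one lookup pipeline among flows, proportionally to offered load."""
    total = sum(f.offered_pps for f in flows)
    if total <= pipeline_pps:
        # The pipeline keeps up: every flow gets what it asked for.
        return {f.name: f.offered_pps for f in flows}
    # The pipeline is saturated: every flow is scaled down by the same factor.
    scale = pipeline_pps / total
    return {f.name: f.offered_pps * scale for f in flows}

PIPELINE_PPS = 20e6  # hypothetical rating

alone = forwarded_pps([Flow("A->B", 14e6)], PIPELINE_PPS)
shared = forwarded_pps([Flow("A->B", 14e6), Flow("C->D", 14e6)], PIPELINE_PPS)
print(alone["A->B"], shared["A->B"])
```

In this model the A->B flow is forwarded in full when it is alone, but loses throughput once C->D starts sending, even though the two flows share no input or output port. That is exactly the interference the non-blocking definition rules out.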

I couldn’t find any packet forwarding performance figures on Broadcom’s web site (if your Google-fu is better than mine, please share the link); the most technical document I found was marketing blabber claiming linerate performance.

As it’s impossible to get anything out of Broadcom without signing an NDA with your blood, the gentleman making the comment probably violated the NDA his employer signed with Broadcom. Broadcom’s behavior is also an excellent breeding ground for vendor FUD. Great job, everyone! One has to love this industry.

Update 2014-05-29 07:00Z: The small packet forwarding limitations of Trident 2 architecture were publicly documented @ Cisco Live US 2014, see BRKARC-2222 session for more details; the attendee pointing them out was thus not disclosing non-public information as I incorrectly assumed.

Fortunately, Arista believes in sharing their performance figures (although I doubt anyone actually measured forwarding performance of thousands of 10GE ports since Juniper did their QFabric test) and hardware architecture.

The specifications for their 7300 series switches are pretty clear: a switch with four linecards (512 10GE ports) has 10Tbps switching capacity and can forward 7.5 billion packets per second. The minimum packet size at which they can do linerate forwarding is thus ~160 bytes on the wire (roughly 150 bytes of L2 payload after subtracting per-frame overhead such as the FCS and inter-frame gap).
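The arithmetic behind that estimate can be sketched in a few lines. The sketch takes the two datasheet figures at face value, exactly as the paragraph above does; whether the capacity figure counts each direction separately is not something it tries to settle. The overhead constants are the standard Ethernet values, and the function name is hypothetical.

```python
# Standard Ethernet per-frame overhead, in bytes.
PREAMBLE_AND_SFD = 8
INTER_FRAME_GAP = 12
FCS = 4

def min_line_rate_sizes(capacity_bps, max_pps):
    """Return (on_wire, frame_with_fcs, frame_without_fcs) in bytes.

    on_wire is the smallest per-packet footprint at which max_pps packets
    per second are enough to fill capacity_bps.
    """
    on_wire = capacity_bps / max_pps / 8
    frame_with_fcs = on_wire - PREAMBLE_AND_SFD - INTER_FRAME_GAP
    frame_without_fcs = frame_with_fcs - FCS
    return on_wire, frame_with_fcs, frame_without_fcs

on_wire, with_fcs, without_fcs = min_line_rate_sizes(10e12, 7.5e9)
print(f"on the wire:          {on_wire:.0f} bytes")      # 167
print(f"Ethernet frame:       {with_fcs:.0f} bytes")     # 147
print(f"frame without FCS:    {without_fcs:.0f} bytes")  # 143
```

The results land in the same ballpark as the rounded figures quoted above.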

The attendee making the comment was thus technically correct: Arista 7300-series switches cannot perform linerate forwarding of 40-byte TCP SYN packets.

Is this relevant?

You might have an environment in which thousands of servers have nothing better to do than saturate 10GE uplinks sending 64-byte VoIP packets or test each other’s readiness by sending continuous streams of TCP SYN/RST packets.

The only environments I’m aware of that come close to that are test labs. If you have a real-life use case where something generates hundreds of 10GE streams full of packets with average packet size smaller than 150 bytes, I would love to hear from you.

I recently spoke with someone who told me their caching servers (a typical example of an environment with small packet sizes) cannot saturate 10GE uplinks due to bottlenecks in the Linux TCP stack.

Summary

It’s nice to know the actual limitations of each platform you’re considering. If you’re dealing with an unusual workload, make sure to check PPS as well as bandwidth figures (and anything else you might find relevant – for example multicast forwarding performance or ARP table sizes).
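A quick way to do that PPS check is to compute how many packets per second your workload would need at a given average frame size and compare it with the rated figure. This is a minimal sketch; the function names, the 48-port switch and its 500 Mpps rating are all hypothetical values chosen for illustration.

```python
ETHERNET_OVERHEAD_BYTES = 20  # preamble + SFD (8) and inter-frame gap (12)

def required_pps(ports, port_bps, frame_bytes):
    """Packets per second needed to fill `ports` links with frames of this size."""
    bits_per_frame = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return ports * port_bps / bits_per_frame

def fits_rating(ports, port_bps, frame_bytes, rated_pps):
    """True if the rated PPS figure covers line rate at this frame size."""
    return required_pps(ports, port_bps, frame_bytes) <= rated_pps

# One 10GE port full of minimum-size (64-byte) frames: about 14.88 Mpps.
print(round(required_pps(1, 10e9, 64)))

# Hypothetical 48-port 10GE switch rated at 500 Mpps.
RATED_PPS = 500e6
for size in (64, 550):
    print(size, fits_rating(48, 10e9, size, RATED_PPS))
```

With these made-up numbers the switch falls short at 64-byte frames but has plenty of headroom at a 550-byte average, which is the kind of distinction a bandwidth figure alone will not show.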

Would this “discovery” stop me from recommending Arista 7300-series switches in average data center environments? Of course not.

Warning: rant ahead

… which brings me to the ranty part of this blog post. I understand the bitterness of a hot-shot vendor SE who just lost a big deal to a competitor, but please do yourself a favor and stop parroting the “facts” thrown at you by your competitive analysis team, particularly when those facts are totally irrelevant to real-life use cases. Such behavior is (choose one):

A) Childish

B) Counterproductive

C) Irrelevant

D) Irritating

E) All-of-the-above

Also, please stop pretending you’re a concerned citizen and disclose your affiliation and job position. I’m all for discussing technical details as long as we all understand individual perspectives and potential biases. On the other hand, it’s pretty easy to spot a vendor rep bashing a competitor from a mile away, and you’ll gain nothing by harassing everyone around you with your version of the truth.

Want to hear more?

I just published the raw recordings of yesterday’s session. If you bought the Data Center Fabrics webinar in the past, or have a webinar subscription, you can already watch them (and it’s pretty easy to buy a recording or the subscription).

You’ll also get immediate access to the recording if you register for the second part of the update session (I had to split the update in two parts due to the large number of new hardware platforms and software features).

6 comments:

  1. "You might have an environment in which thousands of servers have nothing better to do than saturate 10GE uplinks sending 64-byte VoIP packets or test each other’s readiness"

    Actually.... I'm more concerned about how the switches and routers stand up to DDoS attacks consisting of large numbers of intentionally small UDP and TCP packets (High volume "UDP Storm" and "SYN flood"); and make sure that the LAN itself isn't what falls over (at least until bandwidth usage reaches the bandwidth capacity of the links), especially 10GE switches shared with SAN infrastructure.

    I don't believe I have any complaint against the Arista 7300 specifically; I don't have access to any to test.

  2. From "Principles and Practices of Interconnection Networks" by: William James Dally and Brian Patrick Towles pg. 112 (please read all the way through):

    "..We will call a packet-switched network that meets these criteria non-interfering.
    Such a network is able to handle arbitrary packet traffic with a guaranteed bound
    on packet delay. The traffic neither exceeds the bandwidth capacity of any network
    channel, nor does it result in coupled resource allocation between flows.

    For almost all applications today, when people say they want a non-blocking
    network, what they really require is a non-interfering network, which can usually
    be realized with considerably less expense. For the sake of history, however, and for
    those cases in which true non-blocking is needed to support circuit switching, we
    give a brief survey of non-blocking networks in the remainder of this chapter."

  3. Hi Ivan,

    Here's the official Arista response from Douglas Gourlay - Vice President Systems Engineering:

    "The 7300 Series was designed to provide wirespeed bandwidth for the most common workloads in the data center, the cloud, and on the Internet - where according to Cisco Systems research the average packet size is between 500-bytes and 600-bytes nowadays, confirmed by a recent post from Greg Ferro here. While there is always some amount of 64-byte frames based on ACKs and SYNs and such during session setup they do not appear as 100% of the traffic in any real world operating environment.

    "If a customer does need 64-byte frame forwarding at wirespeed, on all interfaces, perfectly meshed, 100% of the time we do offer our 7500 Series which also has larger buffer pools to handle the periods of incast based congestion that are highly likely in that type of contrived test workload.

    "As Ivan and Brad have both identified - the scenario where 64-byte wirespeed frame forwarding on all interfaces concurrently comes up is in test labs and benchmarking suites. Arista felt that because we already offer a switching family (Arista 7500) that can support this lab benchmark we needed to optimize on supporting our customers requirements for lower power, increased efficiency, and increased port density rather than chasing a benchmark that is useless in the real world."

    Sincerely,

    Brad Reese

  4. Hi Ivan,

    Update 6/18/2014 - Cisco Nexus 9000 Series NX-OS Release Notes:

    "In OSM, the NFE cannot run at line rate for packet sizes of less than 200 bytes."

    Sincerely,

    Brad Reese

    Replies
    1. At least it's documented and configurable ;) Thanks for pointing it out!



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.