Category: Fabric
Controller-Based Packet Forwarding in OpenFlow Networks
One of the attendees of the ProgrammableFlow webinar sent me an interesting observation:
“Though there is separate control plane and separate data plane, it appears that there is crossover from one to the other. Consider the scenario when flow tables are not programmed and so the packets will be punted by the ingress switch to PFC. The PFC will then forward these packets to the egress switch so that the initial packets are not dropped. So in some sense: we are seeing packet traversing the boundaries of typical data-plane and control-plane and vice-versa.”
He’s absolutely right, and if the above description reminds you of fast and process switching you’re spot on. There really is nothing new under the sun.
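The punt-then-program behavior described above can be sketched in a few lines of Python. This is only a toy model, not NEC's implementation: the `Switch`, `Controller`, and `forward` names, the MAC address, and the port number are all made up for illustration. It shows the first packet taking the slow path through the controller and the second one staying in the data plane.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    # Toy flow table: destination MAC -> forwarding action (hypothetical format).
    flow_table: dict = field(default_factory=dict)

    def lookup(self, dst_mac):
        return self.flow_table.get(dst_mac)


@dataclass
class Controller:
    # Assumed controller knowledge: destination MAC -> (egress switch, port).
    hosts: dict
    punted: int = 0

    def packet_in(self, ingress, dst_mac):
        """Handle a punted packet: program the ingress switch, forward the packet."""
        self.punted += 1
        egress_switch, egress_port = self.hosts[dst_mac]
        # Install a flow entry so later packets never reach the controller.
        ingress.flow_table[dst_mac] = f"toward {egress_switch}"
        # Deliver this first packet through the controller so it is not dropped.
        return f"controller -> {egress_switch}:{egress_port}"


def forward(switch, controller, dst_mac):
    action = switch.lookup(dst_mac)
    if action is None:
        # Flow-table miss: slow path (the "process switching" of this model).
        return controller.packet_in(switch, dst_mac)
    # Flow-table hit: fast path, data plane only.
    return f"{switch.name} {action}"


ingress = Switch("leaf1")
pfc = Controller(hosts={"00:00:00:00:00:02": ("leaf2", 7)})

first = forward(ingress, pfc, "00:00:00:00:00:02")
second = forward(ingress, pfc, "00:00:00:00:00:02")
print(first)
print(second)
```

Only the first packet increments the controller's `punted` counter; every later packet to the same destination hits the installed flow entry.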
NEC ProgrammableFlow Scalability Features
Once you get rid of spanning tree and associated kludges (not too hard in OpenFlow-based networks), BUM flooding becomes your biggest enemy. NEC’s engineers implemented some interesting features in the ProgrammableFlow switches and controllers: rate-limiting of unknown unicast frames, flooding control, and ARP snooping (if only they’d go for ARP proxy).
Quality of Service in ProgrammableFlow Networks
OpenFlow is not exactly known for its quality-of-service features (hint: there are none), but as I described in the ProgrammableFlow Technical Deep Dive webinar, NEC implemented numerous OpenFlow extensions in their edge switches and the ProgrammableFlow controller to give you a robust set of QoS features.
Example: Multi-Stage Clos Fabrics
Smaller Clos fabrics are built with two layers of switches: leaf and spine switches. The oversubscription ratio you want to achieve dictates the number of uplinks on the leaf switch, which in turn dictates the maximum number of spine switches and thus the fabric size.
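The chain of dependencies (oversubscription ratio, then uplinks, then spine count, then fabric size) is easy to see in a quick back-of-the-envelope calculation. The sketch below assumes that uplinks run at the same speed as server-facing ports, that every leaf has exactly one uplink to every spine, and that all spine ports face leaf switches; the function name and the 64-port spine are hypothetical, not taken from any particular product.

```python
def size_two_tier_fabric(server_ports, oversubscription, spine_ports):
    """Rough sizing of a leaf-and-spine fabric under the assumptions above."""
    # Uplinks needed per leaf to hit the target ratio (ceiling division).
    uplinks = -(-server_ports // oversubscription)
    # One uplink per spine, so the uplink count caps the number of spines.
    max_spines = uplinks
    # Every leaf consumes one port on each spine, so spine port count caps the leaves.
    max_leaves = spine_ports
    return {
        "uplinks_per_leaf": uplinks,
        "max_spines": max_spines,
        "max_leaves": max_leaves,
        "server_facing_ports": max_leaves * server_ports,
    }


# Hypothetical example: 48-port leaves, 3:1 oversubscription, 64-port spines.
fabric = size_two_tier_fabric(server_ports=48, oversubscription=3, spine_ports=64)
print(fabric)
```

With these made-up numbers you get 16 uplinks per leaf, at most 16 spines, at most 64 leaves, and 3072 server-facing ports. Going beyond that is where the additional Clos stages come in.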
You have to use a multi-stage Clos architecture if you want to build bigger fabrics; Brad Hedlund described a sample fabric with over 24,000 server-facing ports in the Clos Fabrics Explained webinar.
The Saga of Oversubscriptions
Matt Thompson provided a really good answer to the “what’s acceptable oversubscription ratio in a ToR switch” question when he wrote “I’m expecting a ‘how long is a piece of string’ answer” (note: do watch the BBC video answering that one).
There’s the 3:1 rule-of-thumb recipe, with a more realistic answer being “it depends”. Now let’s see if we can go beyond that without a deep dive into scholastic waters.
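As a starting point, the ratio itself is simple arithmetic: server-facing bandwidth divided by uplink bandwidth. The snippet below is a minimal sketch with a made-up helper name; it says nothing about whether a given ratio suits your traffic, which is the "it depends" part.

```python
from fractions import Fraction


def oversubscription_ratio(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Return the ToR oversubscription ratio as a reduced 'N:M' string."""
    ratio = Fraction(server_ports * server_gbps, uplink_ports * uplink_gbps)
    return f"{ratio.numerator}:{ratio.denominator}"


# The common ToR port mix: 48 x 10GE toward servers, 4 x 40GE uplinks.
common_tor = oversubscription_ratio(48, 10, 4, 40)
# Same switch with only two of the four 40GE uplinks connected.
two_uplinks = oversubscription_ratio(48, 10, 2, 40)
print(common_tor, two_uplinks)
```

The typical 48 x 10GE plus 4 x 40GE port configuration lands exactly on the 3:1 rule of thumb, which is probably not a coincidence.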
Intra-Spine Links in Leaf-and-Spine Fabrics
I had an interesting conversation with Doug Hanks (@douglashanksjr) about the need for intra-spine links in leaf-and-spine fabric designs. You clearly don’t need links between spine switches when every leaf node (switch or router/firewall/load balancer) is connected to all spine switches ... but what happens when one of the leaf-to-spine links fails? Will other leaf switches know that they have to avoid the spine switch with the failed link?
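To make the failure scenario concrete, here is a toy model of a two-spine fabric with one failed leaf-to-spine link. It is a sketch under stated assumptions, not a description of how any real control plane behaves: links are plain `(leaf, spine)` pairs, the helper names are invented, and the `intra_spine` flag assumes a spine can always detour through any other spine that still reaches the destination leaf.

```python
def usable_spines(links, src, dst, spines):
    """Spines src may use if it learns about the remote link failure."""
    return [s for s in spines if (src, s) in links and (dst, s) in links]


def blackholed_spines(links, src, dst, spines, intra_spine=False):
    """Spines that drop traffic if src keeps hashing across all its uplinks."""
    dead = [s for s in spines if (src, s) in links and (dst, s) not in links]
    if intra_spine:
        # Assumption: a spine without a link to dst can hand traffic over
        # to any spine that still has one.
        if any((dst, s) in links for s in spines):
            return []
    return dead


spines = ["S1", "S2"]
leaves = ["L1", "L2"]
# Full mesh between leaves and spines...
links = {(leaf, spine) for leaf in leaves for spine in spines}
# ...minus the failed L2-S1 link.
links.discard(("L2", "S1"))

informed = usable_spines(links, "L1", "L2", spines)
unaware = blackholed_spines(links, "L1", "L2", spines)
with_detour = blackholed_spines(links, "L1", "L2", spines, intra_spine=True)
print(informed, unaware, with_detour)
```

If L1 learns about the failure it simply stops using S1 for traffic toward L2; if it does not, everything hashed toward S1 is lost unless S1 has some other path to L2. Which of the two cases applies depends on the control plane running in the fabric.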
Nexus 6000 and 40GE – why do I care?
Cisco launched two new data center switches on Monday: Nexus 6001, a 1RU ToR switch with the exact same port configuration as any other ToR switch on the market (48 x 10GE, 4 x 40GE usable as 16 x 10GE) and Nexus 6004, a monster spine switch with 96 40GE ports (it has the same bandwidth as Arista’s 7508 in a 4RU form factor and three times as many 40GE ports as Dell Force10 Z9000).
Apart from slightly higher port density, Nexus 6001 looks almost like Nexus 5548 (which has 48 10GE ports) or Nexus 3064X. So where’s the beef?
Link Aggregation with Stackable Data Center Top-of-Rack Switches
Tomas Kubica made an interesting comment on my Stackable Data Center Switches blog post: “Suppose all your servers have 4x 10G port and you bundle them to LACP NIC team [...] With this stacking link is not going to be used for your inter-server traffic if all servers have active connections to all nodes of your ToR stack.” While he’s technically correct, the idea of having four 10GE ports on each server just to cater to the whims of stackable switches is somewhat hard to sell.
Who the **** needs 16 uplinks? Welcome to 10GE world!
Will made an interesting comment on my Stackable Data Center Switches article: “Who the heck has 16 uplinks?” Most of us do in the brave new 10GE world.
Large Leaf-and-Spine Fabrics with Dell Force10 Switches Using 10GE Uplinks
The second scenario Brad Hedlund described in the Clos Fabrics Explained webinar is a large leaf-and-spine fabric using 10GE uplinks and QSFP+ breakout cables between leaf and spine switches (thus increasing the number of spine switches to 16).