Every data center network has a mixture of bridging (layer-2 or MAC-based forwarding, aka switching) and routing (layer-3 or IP-based forwarding); the exact mix, the size of the L2 domains, and the position of the L2/L3 boundary depend heavily on the workload ... and I would really like to understand what works for you in your data center, so please leave as much feedback as you can in the comments.
Considering the L2/L3 boundary in a data center network, you can find two extremes:
Data centers that run a single scale-out application, be it big data calculations (Hadoop clusters, for example) or web portals. IP addressing and subnets usually don’t matter (unless you use weird tricks that rely on L2), and the L3 forwarding usually starts at the top-of-rack switch.
Cloud data centers that use MAC-over-IP encapsulations to implement virtual network segments are also in this category.
Fully virtualized data centers need large L2 domains to support VM mobility; VLANs extend across aggregation/core switches (or spine switches, if you use Clos fabric terminology).
Data centers with large L2 domains usually use L3 forwarding only in the core switches; these switches have VLAN interfaces (SVI or RVI, depending on which vendor you’re talking with) with IP addresses configured on them. Most vendors offer technologies that combine multi-chassis link aggregation (MLAG) with active-active forwarding: all members of an MLAG cluster share an IP address and perform L3 forwarding between connected VLANs.
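As a sketch, an SVI on one member of such an MLAG pair might look like this (hypothetical Cisco IOS-style syntax; the VLAN number, addresses, and the HSRP-based shared gateway are illustrative assumptions, not taken from any particular deployment):

```
! Illustrative L3 VLAN interface (SVI) on core switch A.
! Its MLAG peer would carry a symmetrical configuration and
! share the virtual first-hop gateway address 10.0.10.1.
interface Vlan10
 description Web-tier servers
 ip address 10.0.10.2 255.255.255.0
 standby 10 ip 10.0.10.1
 standby 10 priority 110
```

With active-active MLAG forwarding, both peers would route traffic addressed to the shared gateway, instead of one peer sitting idle in standby.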
L2 forwarding in ToR switches and L3 forwarding in the data center network core clearly results in suboptimal traffic flow. If you have servers (or VMs) in different VLANs connected to the same ToR switch, the traffic between them has to be hairpinned through the closest L3 switch. However, the question I’m struggling with is: “Does it really matter?”
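To make the hairpinning cost concrete, here’s a toy hop-count comparison for two servers in different VLANs attached to the *same* ToR switch. The topology and names are illustrative assumptions, not a description of any real fabric:

```python
# Toy model: count links traversed by inter-VLAN traffic between two
# servers on the same ToR switch, depending on where L3 forwarding sits.
# The paths below are assumed for illustration only.

def hops_l3_at_core():
    # L3 only in the core: traffic is hairpinned through the core switch.
    path = ["server-A", "ToR-1", "core-1", "ToR-1", "server-B"]
    return len(path) - 1  # number of links traversed

def hops_l3_at_tor():
    # L3 at the ToR switch: traffic is routed locally.
    path = ["server-A", "ToR-1", "server-B"]
    return len(path) - 1

print(hops_l3_at_core())  # 4
print(hops_l3_at_tor())   # 2
```

The hairpinned path doubles the link count for this traffic, and also consumes uplink bandwidth between the ToR and the core; whether that matters depends on how much of your traffic actually crosses VLANs within a rack.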
In most cases, you’d use VLANs in a virtualized environment to implement different security zones in an application stack, and deploy firewalls or load balancers between them. If you do all the L3 forwarding with load balancers or firewalls, you don’t need L3 switches, and thus the problem of suboptimal L3 forwarding becomes moot.
Feedback highly appreciated!
As I wrote in the introductory paragraph, I would really like to understand what you’re doing (particularly in environments using VMs with large VLANs). Please share as much as you can in the comments, including:
- Do you use L3 forwarding on the data center switches?
- Where do you deploy L3 switching (ToR/leaf, aggregation, core/spine)?
- Why do you use L3 forwarding together with (or instead of) load balancers and/or firewalls?
- Do you see suboptimal L3 forwarding as a serious problem?