One of the responses I got on my “What is Layer-2” post was:
Ivan, are you saying to use L3 switches everywhere with /31 on the switch ports and the servers/workstation?
While that solution would work (and I know a few people who are using it with reasonable success), it’s nothing more than creative use of existing routing paradigms; we need something better.
Update 2015-04-22 14:30Z - Added a link to Cumulus Linux Redistribute Neighbor feature.
Why are we talking about this?
In case you stumbled upon this blog post by accident, I’d strongly recommend you read a few other blog posts first to get the context:
- What is layer-2 and why do we need it?
- Why is IPv6 layer-2 security so complex?
- Compromised security zone = game over
Ready? Let’s go.
Where do we have a problem?
Obviously, we experience the problems described in the above blog posts only if we have hosts that should not trust each other (individual users, servers from different applications) in the same security domain (= VLAN).
If you’re operating a mobile or PPPoX network, or if your data center uses proper segmentation with each application being an independent tenant, you should stop reading. If you’re not so lucky, let’s move forward.
In PPPoX and mobile networks every user (CPE device or phone) appears on a virtual dial-up interface and gets a /64 IPv6 prefix or an IPv4 host route. In either case, the point-to-point layer-2 link terminates on the BRAS/GGSN.
What are we doing today?
Many environments (including campus, wireless and data center networks) use large layer-2 domains (VLANs) to support random IP address assignment or IP address mobility (from this point onwards IP means IPv4 and/or IPv6).
Switches in these networks perform layer-2 forwarding of packets sent within an IP subnet even though they might be capable of performing layer-3 forwarding.
The situation is ridiculous in the extreme in environments with anycast layer-3 gateways (example: Arista VARP, Cisco ACI) – even though every first-hop switch has full layer-3 functionality and is even configured to perform layer-3 switching between subnets, it still forwards packets within a subnet based on the layer-2 (MAC) address.
For further information on layer-3 forwarding in data centers and anycast layer-3 gateways, read these blog posts:
- Does optimal L3 forwarding matter in data centers?
- VRRP, anycast, fabrics and optimal forwarding
- Optimal L3 forwarding with VARP and active/active VRRP
- Arista EOS Virtual ARP (VARP) behind the scenes
Now imagine a slight change in IP forwarding behavior:
- Every first-hop switch tracks attached IP addresses;
- A switch creates a host route (/32 in IPv4 world, /128 in IPv6 world) for every directly-attached IP host;
- The host routes are redistributed into a routing protocol, allowing every other layer-3 switch in the network to perform layer-3 forwarding toward any host regardless of the host’s location in the network.
Does this sound like Mobile ARP from 20 years ago? Sure it does – see also RFC 1925 section 2.11.
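The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular switch implements it: the neighbor list, the port names, the `switch-a` and `subnet-gateway` next hops, and the `host_routes`, `advertise` and `lookup` helpers are all hypothetical, and a dictionary copy stands in for the routing protocol.

```python
import ipaddress


def host_routes(neighbors):
    """Build one host route (/32 or /128) per directly attached IP address.

    `neighbors` is a hypothetical list of (ip_string, local_port) pairs, such
    as a switch might learn from its ARP/ND tables.
    """
    routes = {}
    for ip, port in neighbors:
        addr = ipaddress.ip_address(ip)
        # max_prefixlen is 32 for IPv4 and 128 for IPv6.
        prefix = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        routes[prefix] = port
    return routes


def advertise(routes, next_hop):
    """Stand-in for redistribution: same prefixes, next hop = advertising switch."""
    return {prefix: next_hop for prefix in routes}


def lookup(table, destination):
    """Longest-prefix match; returns the next hop, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [p for p in table if p.version == addr.version and addr in p]
    if not matches:
        return None
    return table[max(matches, key=lambda p: p.prefixlen)]


# Switch A tracks two attached hosts (step 1) and creates host routes (step 2).
switch_a = host_routes([("10.0.0.5", "eth1"), ("2001:db8::5", "eth2")])

# Switch B starts with a covering subnet route and receives A's host routes (step 3).
switch_b = {ipaddress.ip_network("10.0.0.0/24"): "subnet-gateway"}
switch_b.update(advertise(switch_a, "switch-a"))

print(lookup(switch_b, "10.0.0.5"))  # host route wins over the /24
print(lookup(switch_b, "10.0.0.9"))  # no host route, falls back to the /24
```

Because the host route is always more specific than the subnet route, the remote switch forwards on layer 3 toward whichever switch the host is attached to, and falls back to the subnet route only for addresses nobody has advertised.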
Will it work?
It already does. Microsoft Hyper-V Network Virtualization, Juniper Contrail and Nuage VSP use this forwarding behavior in virtual networks. Cisco’s Dynamic Fabric Automation (DFA) uses the same forwarding behavior in a physical data center fabric, and Cisco ACI might be doing something similar. Not surprisingly, most of these solutions use BGP as the routing protocol.
Finally, if you're using Cumulus Linux, try out the experimental Redistribute Neighbor feature, which redistributes the ARP cache into a routing protocol.
Interested in the dirty details? You’ll find an in-depth explanation in the Overlay Virtual Networking webinar, which also includes step-by-step packet traces. An overview of the DFA control plane and packet forwarding behavior is included in the Data Center Fabrics webinar.