Are you ever asked to use a layer-2 Data Center Interconnect to implement distributed active-active firewalls, supposedly solving all the L3 issues and asymmetrical-traffic-flow-over-stateful-firewalls problems? Don’t be surprised; for example, a year ago I was stupid enough (or maybe just blinded by the L2 glitter) to draw the following diagram illustrating a sample use of VPLS services:
The solution looks ideal: both WAN routers would advertise the same IP prefix to the outside world, attract the customer traffic and pass the traffic through the nearest firewall. The inside routers would take care of proper traffic distribution (inbound traffic might need to traverse the DCI link) and the return traffic would follow the shortest path toward the WAN (or Internet) cloud. The active-active firewalls would exchange flow information, solving the asymmetrical flow problems.
Now ask yourself: what happens if the DCI link fails? Some of the inbound traffic will arrive to the wrong edge router and get dropped, and the firewalls will go into split-brain mode. You’ll clearly experience problems in both data centers.
Usually we use pairs of devices (routers and firewalls in this particular example) in redundant configurations to increase the overall system availability. I am no expert in high availability calculations, but one of the hidden assumptions in designs where devices have to exchange state information (active-active firewalls, for example) is that the non-redundant component (the link between the devices) has to be as reliable as the devices themselves.
In a stretched subnet design the weakest link of the whole system is the data center interconnect; in most cases, stretched subnets would decrease the overall availability of the system.
A reliable layer-3 solution is not much easier to design. A while ago I was involved in a redesign of a global network. The customer had very knowledgeable networking team (it was a total pleasure working with them) and we tried really hard to find a redundant data center design that would allow them to advertise a single L3 prefix from both data centers. We even reached the point where we had a working design that would survive all sorts of failures, but it got too complex for the customer (and they were absolutely right to reject it and fall back to simpler options).
Unless you believe in the miracles of TCP-based anycasting, it seems the best option you have to implement distributed data centers is still the time-proven design used by web content providers with good track record like Google: DNS-based load balancing between data centers in combination with data-center-specific+summary-as-a-backup prefix advertising into BGP.
You have to move the firewalls deeper into the data center if you want to avoid session loss following an Internet link failure.
The Data Center Interconnects webinar (register here) describes numerous design and implementation aspects of L2 and L3 data center interconnects. It covers load balancing and external routing for L3 interconnects and numerous L2 DCI interconnect technologies, including VPLS, A-VPLS, OTV, TRILL, BGP MPLS based MAC VPN and EtherIP between load balancers (F5).