I won’t spend any time on the “perfectly fine” part (Greg Ferro had a lot to say about that in the early Packet Pushers podcasts), but focus on the fundamental difference between the two: the use case.
Typical Metro Ethernet use case
Engineers who know what they’re doing connect individual sites to Metro Ethernet services with layer-3 devices (some others will eventually figure it out after a meltdown or two).
It doesn’t matter whether you call the site edge devices routers or switches; they perform several critical functions:
- They split the inside (your site) and the outside (service provider transport network) into two separate L3 subnets and two failure domains;
- They run routing protocols. Other devices attached to the same Metro Ethernet service can thus figure out whether a site is reachable or not;
- They can find alternate paths (if they exist) after a link or service failure.
In principle, the routers connecting your sites to a Metro Ethernet service treat that service as one of the potential transport networks, and can use the routing protocols or BFD/CFM to figure out when the Metro Ethernet service is gone even if the local link status doesn’t change.
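The detection idea can be sketched in a few lines. This is a deliberately simplified, BFD-like liveness check, not an implementation of BFD or CFM: the class name, the default timers and the explicit `now` timestamps are all assumptions made for illustration. The point is that the neighbor is declared down when hellos stop arriving, regardless of what the local link status says.

```python
from dataclasses import dataclass


@dataclass
class NeighborLiveness:
    """Hypothetical hello-based liveness tracker for one remote router."""

    interval: float = 0.3    # assumed hello interval in seconds
    multiplier: int = 3      # assumed number of missed hellos tolerated
    last_heard: float = 0.0  # timestamp of the last hello received

    def hello_received(self, now: float) -> None:
        # Record the arrival time of a hello from the neighbor.
        self.last_heard = now

    def is_up(self, now: float) -> bool:
        # The neighbor is considered alive only while hellos keep arriving
        # within interval * multiplier; local link state is never consulted.
        return (now - self.last_heard) < self.interval * self.multiplier


def usable_paths(neighbors: dict[str, NeighborLiveness], now: float) -> list[str]:
    # Return the names of transport networks whose neighbor is still alive.
    return [name for name, n in neighbors.items() if n.is_up(now)]
```

With one tracker per transport network (for example, a Metro Ethernet neighbor and a backup-link neighbor), `usable_paths` stops returning the Metro Ethernet entry once its hellos go missing, which is the trigger a routing protocol needs to move traffic to the backup path.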
Worst case, if the Metro Ethernet service falls apart and you’ve provisioned backup links, your sites can still communicate with each other. Even if the Metro Ethernet service experiences a severe meltdown, the hosts inside your sites will not be affected (the routers might be, due to the heavy CPU load induced by broadcasts received from the Metro Ethernet LAN).
Summary: it’s perfectly safe to use a layer-2 transport network as long as you terminate it with a layer-3 device.
Typical stretched data center subnet use case
Hosts are directly attached to stretched layer-2 subnets (VLANs) in a typical layer-2 data center interconnect design, as shown in the next diagram.
The servers (IP hosts) attached to stretched VLANs usually have no routing intelligence; all they know are two simple rules:
- If the destination IP address belongs to the same subnet, use ARP to find the MAC address of the other host, and send the IP packet to that MAC address. If the ARP request fails, the other host is unreachable.
- Otherwise, send the IP packet to the IP address of the default gateway.
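The two rules above amount to a single subnet-membership test. Here is a minimal sketch using Python's standard `ipaddress` module; the function name, the return labels and the sample addresses are assumptions made for illustration, and real IP stacks do this with a routing table lookup rather than a single `if`.

```python
import ipaddress


def host_next_hop(interface: str, destination: str, default_gateway: str) -> tuple[str, str]:
    """Return (method, address) describing where a simple host sends a packet.

    interface is the host's own address with prefix length, e.g. "10.0.0.5/24".
    """
    local = ipaddress.ip_interface(interface)
    dst = ipaddress.ip_address(destination)

    if dst in local.network:
        # Rule 1: same subnet, so ARP for the destination itself.
        # If nobody answers the ARP request, the host simply gives up.
        return ("arp-destination", str(dst))

    # Rule 2: anything else goes to the default gateway.
    return ("default-gateway", default_gateway)
```

Note what is missing: there is no liveness check and no second choice. If the destination is in the local subnet but sits on the other side of a partitioned VLAN, the host still takes the first branch and the ARP request goes unanswered.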
The lack of routing intelligence in typical servers is not a software/OS issue. Linux and z/OS support routing daemons, and so did Windows Server until it got lobotomized. However, it seems many engineers think a naked singularity would materialize and gobble up their whole data center if they configured OSPF on a server.
Typical IP hosts have no means of detecting VLAN failure or partitioning, and cannot find alternate paths. They rely on network devices to provide the connectivity, and with no layer-3 intelligence in the path, there’s only so much the networking devices can do.
The layer-2 data center interconnect thus becomes the most critical part of the whole data center infrastructure – if it breaks, everything else stops working (assuming the servers or VMs in the same subnet are on both ends of the failure). Is that a good idea? Not in my book.