To Centralize or not to Centralize, That’s the Question
One of the attendees of the Building Next-Generation Data Center online course solved the "build a small data center fabric" challenge with Virtual Chassis Fabric (VCF). I pointed out that I would prefer not to use VCF because it uses a centralized control plane and is thus a single failure domain.
Here are his arguments for using VCF:
As for the architecture, VCF is a simple design for small to medium DC. It is a centralized architecture but has the L2 and L3 simplicity to provide scalability for legacy applications while also using L3. There are redundant Routing Engines to assist if the master Routing Engine fails. Features like GRES (graceful Routing Engine switchover) and NSR/NSB (nonstop routing/bridging) also assist in quick RE failovers while also helping protocols with convergence times.
They are all valid arguments, but in practice I dislike centralized control/management plane architectures because they’re really hard to get right… and if you get a Byzantine control plane failure, you lose the whole fabric.
Also, there are occasional software upgrade challenges that you don’t get with independent boxes, and everyone who’s been in networking long enough has a scary horror story about a failed stackable switch upgrade.
An obvious alternative to VCF would be a traditional leaf-and-spine fabric with VXLAN using either an EVPN control plane or statically configured ingress BUM replication with dynamic MAC learning. More robust, less complex software, smaller blast radius… but harder to design and configure.
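To make the second option a bit more concrete, here's a minimal Python sketch of what a single leaf switch does with statically configured ingress replication and data-plane MAC learning. It's a toy model under stated assumptions, not any vendor's implementation: the `Vtep` class, its fields, and the addresses are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    """Toy model of a leaf switch acting as a VXLAN tunnel endpoint (hypothetical)."""

    ip: str
    # Statically configured list of remote VTEPs that get a copy of BUM traffic
    flood_list: list = field(default_factory=list)
    # MAC address -> remote VTEP IP, filled in by data-plane learning
    mac_table: dict = field(default_factory=dict)

    def forward(self, dst_mac: str) -> list:
        """Return the remote VTEP IPs that receive a copy of the frame."""
        if dst_mac in self.mac_table:
            # Known unicast: send a single copy to the VTEP behind which the MAC lives
            return [self.mac_table[dst_mac]]
        # Broadcast, unknown unicast, multicast: replicate to every configured peer
        return list(self.flood_list)

    def receive(self, src_mac: str, from_vtep: str) -> None:
        """Data-plane learning: remember which VTEP the source MAC sits behind."""
        self.mac_table[src_mac] = from_vtep


leaf1 = Vtep("10.0.0.1", flood_list=["10.0.0.2", "10.0.0.3"])
print(leaf1.forward("00:00:5e:00:53:01"))   # unknown MAC, flooded to all peers
leaf1.receive("00:00:5e:00:53:01", "10.0.0.3")
print(leaf1.forward("00:00:5e:00:53:01"))   # learned MAC, one copy only
```

The point of the sketch: every leaf holds its own flood list and its own MAC table, so a bug or misconfiguration on one box stays on that box. The price is that someone (or some automation) has to keep the flood list consistent on every leaf.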
As always, it’s a question of explicit versus hidden complexity, and you have to choose which one is better for you. I have no problem with that; it’s just that the customers going for hidden complexity aren’t always aware of the risks they’re taking.
Further Reading
To Learn More about These Topics
Check out ipSpace.net data center webinars, in particular
- Data Center Fabric Architectures
- Designing Leaf-and-Spine Fabrics
- EVPN Technical Deep Dive
- VXLAN Deep Dive
Need even more? How about the Building Next-Generation Data Center online course?
Speaking of hidden complexities - the "Leaky Abstractions" article posted earlier is a great read. Thanks!