HP has recently commissioned an IRF network test that came to absolutely astonishing conclusions: vMotion runs almost twice as fast across two links bundled in a port channel than across a single link (with the other one being blocked by STP). The test report contains one other gem, this one a result of the incredible creativity of HP marketing:
For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.
You know my opinions about stretched clusters ... and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.
Please note: this post is not a critique of IRF fabric technology or its implementation, just of a particularly "creative" use case.
Let’s assume someone is actually brave enough to deploy a network using the design shown in the following figure with switches in two data centers merged into an IRF fabric (according to my Twitter friends this design is occasionally promoted by some HP-certified instructors):
The IRF documentation for the A7500 switches (published in August 2011) contains the following facts about IRF partitions (split IRF fabric) and Multi-Active Detection (MAD) collisions (more commonly known as split brain problems):
The partitioned IRF fabrics operate with the same IP address and cause routing and forwarding problems on the network.
No surprise there, we always knew that split subnets cause interesting side effects, but it’s nice to see it acknowledged.
It's interesting to note, though, that a pure L2 solution might actually work ... but the split subnets would eventually raise their ugly heads in adjacent L3 devices.
During an IRF merge, the switches of the IRF fabric that fails the master election must reboot to re-join the IRF fabric that wins the election.
Hold on – I lose the inter-DC link(s), reestablish them, and then half of the switches reboot. Not a good idea.
Let’s assume the above design is “extended” with another bright idea: the two switches run BFD over an alternate path (could be the Internet) to detect split brain events. According to the manual:
An IRF link failure causes an IRF fabric to divide into two IRF fabrics and multi-active collision occurs. When the system detects the collision, it holds a role election between the two collided IRF fabrics. The IRF fabric whose master’s member ID is smaller prevails and operates normally. The state of the other IRF fabric transitions to the recovery state and temporarily cannot forward data packets.
Isn’t that great – not only have you lost the inter-DC link, you’ve lost one of the core switches as well.
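To make the quoted rule concrete, here is a minimal Python sketch of the collision resolution as the manual describes it: the fabric whose master has the smaller member ID keeps running, the other one goes into the recovery state. The Fabric class, the resolve_mad_collision function and the member IDs are hypothetical names and values I made up for illustration; this is not HP's implementation, just a model of the one sentence quoted above.

```python
from dataclasses import dataclass


@dataclass
class Fabric:
    """One half of a partitioned IRF fabric (hypothetical model)."""
    name: str
    master_member_id: int
    state: str = "active"  # "active" or "recovery"


def resolve_mad_collision(a: Fabric, b: Fabric) -> tuple[Fabric, Fabric]:
    """Apply the rule quoted from the manual: the fabric whose master has
    the smaller member ID prevails; the other enters the recovery state."""
    if a.master_member_id == b.master_member_id:
        # Assumption: member IDs are unique, so a tie cannot happen.
        raise ValueError("master member IDs must differ")
    winner, loser = (a, b) if a.master_member_id < b.master_member_id else (b, a)
    winner.state = "active"
    loser.state = "recovery"  # temporarily cannot forward data packets
    return winner, loser


# Illustrative values only: one switch per data center, member IDs 1 and 2.
dc1 = Fabric(name="DC1", master_member_id=1)
dc2 = Fabric(name="DC2", master_member_id=2)
winner, loser = resolve_mad_collision(dc1, dc2)
print(f"{winner.name} stays {winner.state}, {loser.name} goes into {loser.state}")
```

Note that nothing in the rule looks at which data center still has working uplinks or running servers; the outcome depends purely on the member IDs, which is exactly why the surviving side might not be the one you wanted.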
Summary: As always, just because you can doesn’t mean you should ... and remember to be wary when consultants and marketing people peddle ideas that seem too good to be true.
What are the alternatives?
As I’ve explained in the Data Center Interconnects webinar (available as a recording or as part of the yearly subscription or Data Center Trilogy), there are at least two sensible alternatives if you really want to implement layer-2 DCI and have multiple parallel layer-1 links (otherwise IRF wouldn’t work either):
Bundle multiple links in a port channel between two switches. If you’re not concerned about device redundancy (remember: you can merge no more than two high-end switches in an IRF fabric), use a port channel between the two DCI switches.
Use IRF (or any other MLAG solution) within the data center and establish a port channel between two IRF (or VSS or vPC) clusters. This design results in full redundancy without unexpected reloads or other interesting side effects (apart from the facts that Earth curvature didn't go away, Earth still orbits the Sun and not vice versa, and split subnets still don’t work).
... and don't forget!
Should you wish to discuss the data center fabrics in person, don’t forget that I’ll be @ EuroNOG in a few weeks (arriving on Wednesday to participate in the second day of PLNOG) and probably @ Net Field Day 2 in late October.