Roman Pomazanov documented his thoughts on the beauties of large layer-2 domains in a LinkedIn article and allowed me to repost it on ipSpace.net blog to ensure it doesn’t disappear
First of all: “L2 is a single failure domain”, a problem at one point can easily spread to the entire datacenter.
Brandon Hitzel published a detailed document describing various Internet WAN edge designs. Definitely worth reading and bookmarking.
One of my subscribers found an unusual BGP specimen in the wild:
- It was a small site with two core switches and a WAN edge router
- The site had VPN concentrators running in virtual machines
- The WAN edge router was running BGP across WAN IPsec tunnels
- The VPN concentrators were running BGP with core switches.
So far so good, and kudos to whoever realized BGP is the only sane protocol to run between virtual machines and network core. However, the routing in the network core was implemented with EBGP sessions between the three core devices, and my subscriber thought the correct way to do it would be to use IBGP and OSPF.
I must be a good prompt engineer – every time I ask ChatGPT something really simple it spews out nonsense. This time I asked it to build a small network with four routers:
I have a network with four Cisco routers (A,B,C,D). They are connected as follow: A-B, B-C, A-D, D-C. Each router has a loopback interface. Create router configurations that will result in A being able to reach loopback interfaces of all other routers.
Here’s what I got back1:
Here’s an example configuration for the four routers that should allow Router A to reach the loopback interfaces of all other routers:
The last time I spent days poring over vendor datasheets collecting information for the overview part of Data Center Fabrics webinar a lot of 1RU data center leaf switches came in two form factors:
- 48 low-speed server-facing ports and 4 high-speed uplinks
- 32 high-speed ports that you could break out into four times as many low-speed ports (but not all of them)
I expected the ratios to stay the same when the industry moved from 10/40 GE to 25/100 GE switches. I was wrong – most 1RU leaf data center switches based on recent Broadcom silicon (Trident-3 or Trident-4) have between eight and twelve uplinks.
A networking engineer attending the Building Next-Generation Data Center online course asked this question:
What is the best practice to connect DC fabric to outside world assuming there are 2 spine switches in the fabric and EVPN VXLAN is used as overlay? Is it a good idea to introduce edge (border) switches, or it is better to connect outside world directly to the spine?
As always, the answer is “it depends,” this time based on:
Even though IPv6 could buy its own beer (in US, let alone rest of the world), networking engineers still struggle with its deployment – one of the first questions I got in the ipSpace.net Design Clinic was:
We have been tasked to start IPv6 planning. Can we discuss (for enterprises like us who all of the sudden want IPv6) which design paths to take?
I did my best to answer this question and describe the basics of creating an IPv6 addressing plan. For even more details, watch the IPv6 webinars (most of them at least a few years old, but nothing changed in the IPv6 world in the meantime apart from the SRv6 madness).
I’m always envious of how easy networking challenges seem when you’re solving them in PowerPoint, for example, when an innovation specialist explains how scalability works in leaf-and-spine fabrics in a LinkedIn comment:
One of the main benefits of a CLOS folded spine topology is the scale out spine where you can scale out the number of spine nodes increasing your leaf-spine n-way ECMP as well as minimizing the blast radius with the more spine nodes the more redundancy and resiliency.
Isn’t that wonderful? If you need more bandwidth, sprinkle the magic spine powder on your fabric, add water, and voila! Problem solved. Also, it looks like adding spine switches reduces the blast radius. Who would have known?
Two weeks ago I explained why you might want to run IBGP between CE-routers on a multihomed site. One of the blog readers didn’t like my ideas:
In such a small deployment I assume that both ISPs offer transit, so that both CEs would get a default route from their upstream.
In this case I would not iBGP the CEs together but have HSRP running on the two CEs and track the uplink (interface and/of BGP session) to determine the active gateway.
Let’s see what could possibly go wrong with that design.
One of my readers sent me a question along these lines:
How do we determine the number of spines needed in a leaf-and-spine fabric? It’s easy to calculate the number of leaf nodes from the required number of server ports, and two spines give you the redundancy. Does it make sense to have more spines if two are good enough from the capacity perspective?
There are at least two factors to consider:
In the Designing Active-Active and Disaster Recovery Data Centers I tried to give networking engineers a high-level overview of challenges one might face when designing a highly-available application stack, and used that information to show why the common “solutions” like stretched VLANs make little sense if one cares about application availability (as opposed to auditor report). Some (customer) engineers like that approach; here’s the feedback I received not long ago:
As ever, Ivan cuts to the quick and provides not just the logical basis for a given design, but a wealth of advice, pointers, gotchas stemming from his extensive real-world experience. What is most valuable to me are those “gotchas” and what NOT to do, again, logically explained. You won’t find better material IMHO.
Please note that I’m talking about generic multi-site scenarios. From the high-level connectivity and application architecture perspective there’s not much difference between a multi-site on-premises (or collocation) deployment, a hybrid cloud, or a multicloud deployment.
One of my readers sent me a question along these lines:
Do I have to have an IBGP session between Customer Edge (CE) routers in a multihomed site if they run EBGP with the upstream provider(s)?
Let’s start with a simple diagram and a refactoring of the question:
I decided to stop caring about IPv6 when the protocol became old enough to buy its own beer (now even in US), but its second-system effects keep coming back to haunt us. Here’s a question I got for the February 2023 ipSpace.net Design Clinic:
How can we do IPv6 networking in a small/medium enterprise if we’re using multiple ISPs and don’t have our own IPv6 Provider Independent IPv6 allocation. I’ve brainstormed this with people far more knowledgeable than me on IPv6, and listened to IPv6 Buzz episodes discussing it, but I still can’t figure it out.
The ipSpace.net Design Clinic has been running for a bit over than a year. We covered tons of interesting technologies and design challenges, resulting in over 13 hours of content (so far), including several BGP-related discussions:
- BGP route servers
- Redundant BGP-Based Internet Access
- Secure BGP Configuration on Customer Routers
- Enterprise WAN Routing Design
An attendee in the Building Next-Generation Data Center online course sent me an interesting dilemma:
Some customers don’t like EVPN because of complexity (it is required knowledge BGP, symmetric/asymmetric IRB, ARP suppression, VRF, RT/RD, etc). They agree, that EVPN gives more stability and broadcast traffic optimization, but still, it will not save DC from broadcast storms, because protections methods are the same for both solutions (minimize L2 segments, storm-control).
We’ll deal with the unnecessary EVPN-induced complexity some other time, today let’s start with a few intro-level details.