
Category: design

Designing Networks: From Tricycles to Aircraft Carriers

I planned to take my summer break seriously and stop blogging until late August, but then I shouldn’t have looked at my Twitter feed (my bad), where the AI algorithms selected just the right morsel to trigger the maximum rantiness. I would strongly recommend you read the original tweet and all the responses first – it looks like it was a serious suggestion, not a trolling exercise.


Stretched VLANs: What Problem Are You Trying to Solve?

One of the ipSpace.net subscribers sent me this interesting question:

I am the network administrator of a small data center network that spans 2 buildings. The main building has a pair of L2/L3 10G core switches. The second building has a stack of access switches connected to the main building with 10G uplinks. This secondary datacenter has got some ESX hosts and NAS for remote backup and some VM for development and testing, but all the Internet connection, firewall and server are in the main building.

There is no routing in the secondary building and most of the VLANs are stretched. Do you think I must change that (bringing routing to the secondary datacenter), or keep it simple like it is now?

As always, it depends. This time, it depends on what problem you're trying to solve.


Questions about BGP in the Data Center (with a Whiff of SRv6)

Henk Smit left numerous questions in a comment referring to the Rethinking BGP in the Data Center presentation by Russ White:

In Russ White’s presentation, he listed a few requirements to compare BGP, IS-IS and OSPF. Prefix distribution, filtering, TE, tagging, vendor-support, autoconfig and topology visibility. The one thing I was missing was: scalability.

I noticed the same thing. We kept hearing how BGP scales better than link-state protocols (no doubt about that) and how you couldn’t possibly build a large data center fabric with a link-state protocol… and yet this aspect wasn’t even mentioned.


Worth Reading: Running BGP in Large-Scale Data Centers

Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as a recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.

Recent data point: Facebook published an interesting paper describing their data center BGP design. Absolutely worth reading.

Just in case you haven’t realized: Petr Lapukhov of the RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.


Worth Reading: Rethinking Internet Backbone Architectures

Johan Gustawsson wrote a lengthy blog post describing Telia’s approach to next-generation Internet backbone architecture… and it’s so refreshing seeing someone bringing to life what some of us have been preaching for ages:

  • Simplify the network;
  • Stop cramming ever-more-complex services into the network;
  • Bloated major vendor NPUs implementing every magic ever envisioned are overpriced – platforms like Broadcom Jericho2 are good enough for most use cases;
  • Return from large chassis-based stupidities to network-centric high availability.

I don’t know enough about optics to have an opinion on what they did there, but it looks as good as the routing part. It would be great to hear your opinion on the topic – write a comment.


Video: Cisco SD-WAN Site Design

In the Site Design part of Cisco SD-WAN webinar, David Penaloza described capabilities you can use when designing complex sites, like extending SD-WAN transport between SD-WAN edge nodes, or implementing high availability between them. He also explained how to track an Internet-facing interface and a service beyond its next hop.

You need a free ipSpace.net subscription to watch the video.

Worth Reading: When Stretching Layer Two, Separate Your Fate

Ethan Banks wrote the best one-line description of the crazy stuff we have to deal with in his When Stretching Layer Two, Separate Your Fate blog post:

No application should be tightly coupled to an IP address. This common issue should really be solved by application architects rebuilding the app properly instead of continuing like it’s 1999 while screaming YOLO.

Not that his (or my) take on indisputable facts would change anything… At least we can still enjoy a good rant ;)


Worth Reading: Understand Your Single Points of Failure

I’ve been saying the same thing for years, but never as succinctly as Alastair Cooke did in his Understand Your Single Points of Failure (SPOF) blog post:

The problem is that each time we eliminated a SPOF, we at least doubled our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF; eliminating the SPOF may be more expensive than an outage cost due to the SPOF.

Obviously that assumes that you’re able to follow business objectives and not some artificial measure like uptime. Speaking of artificial measures, you might like the discussion about taxonomy of indecision.
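
The tradeoff in the quote can be sketched as back-of-the-envelope arithmetic. This is a minimal illustration only: the function names and every number below are made-up assumptions, not figures from the referenced blog post, and a real assessment would need your own failure rates and business costs.

```python
def expected_annual_outage_cost(years_between_failures, hours_per_outage, cost_per_hour):
    """Average yearly cost of living with the SPOF (hypothetical model)."""
    return hours_per_outage * cost_per_hour / years_between_failures


def worth_eliminating(outage_cost_per_year, redundancy_cost_per_year):
    """Redundancy pays off only if it costs less than the outages it prevents."""
    return redundancy_cost_per_year < outage_cost_per_year


# Assumed inputs: one failure every 5 years, 4 hours per outage,
# 5000 (any currency) lost per hour of downtime.
outage_cost = expected_annual_outage_cost(5, 4, 5000)

# Assumed yearly cost of the second device, licenses, and added complexity.
redundancy_cost = 15000

print(outage_cost, worth_eliminating(outage_cost, redundancy_cost))
```

With these assumed inputs, the redundant design costs more per year than the outages it would prevent, so leaving the SPOF in place is the cheaper choice. Change the downtime cost per hour and the answer flips.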


Worth Reading: The Insider's Guide To Evangelizing Good Design

Scott Berkun wrote another great article that’s equally applicable to the traditional notion of design (his specialty) and to network design. Read it, replace design with network design, and use its lessons. Here’s just a sample:

  • Convincing people is a social process
  • Aim for small wins, not conversions of belief systems
  • Allies matter more than ideas
  • Design maturity grows one step at a time.

Using Unequal-Cost Multipath to Cope with Leaf-and-Spine Fabric Failures

Scott submitted an interesting comment to my Does Unequal-Cost Multipath (UCMP) Make Sense blog post:

How about even Large CLOS networks with the same interface capacity, but accounting for things to fail; fabric cards, links or nodes in disaggregated units. You can either UCMP or drain large parts of your network to get the most out of ECMP.

Before I managed to write a reply (sometimes it takes months while an idea is simmering somewhere in my subconscious) Jeff Tantsura pointed me to an excellent article by Erico Vanini that describes the types of asymmetries you might encounter in a leaf-and-spine fabric: an ideal starting point for this discussion.
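
To make the failure scenario in Scott's comment concrete, here is a minimal Python sketch. It assumes a hypothetical fabric in which a leaf reaches one destination through four spines, and one spine has lost half of its capacity toward that destination. The function names, the topology, and the capacities are illustrative assumptions, not taken from the referenced articles or from any vendor implementation.

```python
from fractions import Fraction


def ecmp_shares(capacities):
    """Split traffic equally across all paths that are still up."""
    usable = [name for name, cap in capacities.items() if cap > 0]
    return {name: Fraction(1, len(usable)) for name in usable}


def ucmp_shares(capacities):
    """Split traffic in proportion to each path's remaining capacity."""
    total = sum(capacities.values())
    return {name: Fraction(cap, total) for name, cap in capacities.items() if cap > 0}


def max_offered_load(capacities, shares):
    """Largest total load that keeps every used path at or below its capacity."""
    return min(capacities[name] / share for name, share in shares.items())


# Hypothetical remaining capacity (Gbps) toward one destination leaf via each spine.
# spine4 has lost one of its two 100G links to the destination.
paths = {"spine1": 200, "spine2": 200, "spine3": 200, "spine4": 100}

# Draining the degraded spine removes it from the ECMP set entirely.
drained = {**paths, "spine4": 0}

print("ECMP, degraded spine kept:", max_offered_load(paths, ecmp_shares(paths)))
print("ECMP, degraded spine drained:", max_offered_load(drained, ecmp_shares(drained)))
print("UCMP:", max_offered_load(paths, ucmp_shares(paths)))
```

Under these assumptions, plain ECMP is limited by the degraded path (400 Gbps in total), draining the degraded spine recovers some of that (600 Gbps), and capacity-weighted UCMP can use everything that is left (700 Gbps). The sketch ignores flow granularity and hashing imperfections, which matter in a real fabric.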


Impact of Azure Subnets on High Availability Designs

Now that you know all about regions and availability zones (AZ) and the ways AWS and Azure implement subnets, let’s get to the crux of the original question Daniel Dib sent me:

As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? You mentioned that it’s difficult to create application swimlanes that way. But does subnet matter if your VMs are in different AZs?

It’s time I explain the concepts of application swimlanes and how they apply to availability zones in public clouds.


Routing in Stretched VLAN Designs

One of my readers was “blessed” with the stretched VLANs requirement combined with the need for inter-VLAN routing and sub-par equipment from a vendor not exactly known for their data center switching products. Before going on, you might want to read his description of the challenge he’s facing and what I had to say about the idea of building stackable switches across multiple locations.

Of course it’s possible that my reader failed to explain the challenge in enough detail to get good advice from the vendor SE, or that he had to deal with a clueless SE, or that he’s using ancient gear, or that the stars just weren’t aligned… but I don’t think anyone should ever be painted into the corner he found himself in.

Here’s an overview diagram of what my reader was facing. The core switches in each location work as a single device (virtual chassis), and there’s MLAG between core and edge switches. The early 2000s just called and they were proud of the design (but to be honest, sometimes one has to work with the tools his boss bought, so…).


MUST READ: Designing a Simple Disaster Recovery Solution

A few weeks ago Adrian Giacometti described a no-stretched-VLANs disaster recovery design he used for one of his customers.

The blog post and related LinkedIn posts generated tons of comments (and objections from the usual suspects), prompting Adrian to write a sequel describing the design requirements he was facing, tradeoffs he made, and interactions between server and networking team needed to make it happen.
