Category: design

New: ipSpace.net Design Clinic

In early September, I started yet another project that’s been on the back burner for over a year: ipSpace.net Design Clinic (aka Ask Me Anything Reasonable in a more structured format). Instead of collecting questions and answering them in a podcast (example: Deep Questions podcast), I decided to make it more interactive with a live audience and real-time discussions. I also wanted to keep it valuable to anyone interested in watching the recordings, so we won’t discuss obscure failures of broken designs or dirty tricks that should have remained in CCIE lab exams.

read more add comment

Designing Networks: From Tricycles to Aircraft Carriers

I planned to take my summer break seriously and stop blogging until late August, but then I shouldn’t have looked at my Twitter feed (my bad), where the AI algorithms selected just the right morsel to trigger the maximum rantiness. I would strongly recommend you read the original tweet and all the responses first – it looks like it was a serious suggestion, not a trolling exercise (here’s a copy of the original idea in case the tweets get lost in the mists of time).

read more see 1 comments

Stretched VLANs: What Problem Are You Trying to Solve?

One of ipSpace.net subscribers sent me this interesting question:

I am the network administrator of a small data center network that spans 2 buildings. The main building has a pair of L2/L3 10G core switches. The second building has a stack of access switches connected to the main building with 10G uplinks. This secondary datacenter has got some ESX hosts and NAS for remote backup and some VM for development and testing, but all the Internet connection, firewall and server are in the main building.

There is no routing in the secondary building and most of the VLANs are stretched. Do you think I must change that (bringing routing to the secondary datacenter), or keep it simple like it is now?

As always, it depends, this time on what problem are you trying to solve?

read more add comment

Questions about BGP in the Data Center (with a Whiff of SRv6)

Henk Smit left numerous questions in a comment referring to the Rethinking BGP in the Data Center presentation by Russ White:

In Russ White’s presentation, he listed a few requirements to compare BGP, IS-IS and OSPF. Prefix distribution, filtering, TE, tagging, vendor-support, autoconfig and topology visibility. The one thing I was missing was: scalability.

I noticed the same thing. We kept hearing how BGP scales better than link-state protocols (no doubt about that) and how you couldn’t possibly build a large data center fabric with a link-state protocol… and yet this aspect wasn’t even mentioned.

read more see 3 comments

Worth Reading: Running BGP in Large-Scale Data Centers

Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.

Recent data point: Facebook published an interesting paper describing their data center BGP design. Absolutely worth reading.

Just in case you haven’t realized: Petr Lapukhov of the RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.

see 5 comments

Worth Reading: Rethinking Internet Backbone Architectures

Johan Gustawsson wrote a lengthy blog post describing Telia’s approach to next-generation Internet backbone architecture… and it’s so refreshing seeing someone bringing to life what some of us have been preaching for ages:

  • Simplify the network;
  • Stop cramming ever-more-complex services into the network;
  • Bloated major vendor NPUs implementing every magic ever envisioned are overpriced – platforms like Broadcom Jericho2 are good enough for most use cases.
  • Return from large chassis-based stupidities to network-centric high availability.

I don’t know enough about optics to have an opinion on what they did there, but it looks as good as the routing part. It would be great to hear your opinion on the topic – write a comment.

add comment

Video: Cisco SD-WAN Site Design

In the Site Design part of Cisco SD-WAN webinar, David Penaloza described capabilities you can use when designing complex sites, like extending SD-WAN transport between SD-WAN edge nodes, or implementing high availability between them. He also explained how to track an Internet-facing interface and a service beyond its next hop.

You need Free ipSpace.net Subscription to watch the video.
add comment

Worth Reading: When Stretching Layer Two, Separate Your Fate

Ethan Banks wrote the best one-line description of the crazy stuff we have to deal with in his When Stretching Layer Two, Separate Your Fate blog post:

No application should be tightly coupled to an IP address. This common issue should really be solved by application architects rebuilding the app properly instead of continuing like it’s 1999 while screaming YOLO.

Not that his (or my) take on indisputable facts would change anything… At least we can still enjoy a good rant ;)

add comment

Worth Reading: Understand Your Single Points of Failure

I’ve been saying the same thing for years, but never as succinctly as Alastair Cooke did in his Understand Your Single Points of Failure (SPOF) blog post:

The problem is that each time we eliminated a SPOF, we at least doubled our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF; eliminating the SPOF may be more expensive than an outage cost due to the SPOF.

Obviously that assumes that you’re able to follow business objectives and not some artificial measure like uptime. Speaking of artificial measures, you might like the discussion about taxonomy of indecision.

add comment

Worth Reading: The Insider's Guide To Evangelizing Good Design

Scott Berkun wrote another great article that’s equally applicable to the traditional notion of design (his specialty) and the network design. Read it, replace design with network design, and use its lessons. Here’s just a sample:

  • Convincing people is a social process
  • Aim for small wins, not conversions of belief systems
  • Allies matter more than ideas
  • Design maturity grows one step at a time.
add comment

Using Unequal-Cost Multipath to Cope with Leaf-and-Spine Fabric Failures

Scott submitted an interesting the comment to my Does Unequal-Cost Multipath (UCMP) Make Sense blog post:

How about even Large CLOS networks with the same interface capacity, but accounting for things to fail; fabric cards, links or nodes in disaggregated units. You can either UCMP or drain large parts of your network to get the most out of ECMP.

Before I managed to write a reply (sometimes it takes months while an idea is simmering somewhere in my subconscious) Jeff Tantsura pointed me to an excellent article by Erico Vanini that describes the types of asymmetries you might encounter in a leaf-and-spine fabric: an ideal starting point for this discussion.

read more see 5 comments

Impact of Azure Subnets on High Availability Designs

Now that you know all about regions and availability zones (AZ) and the ways AWS and Azure implement subnets, let’s get to the crux of the original question Daniel Dib sent me:

As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? You mentioned that it’s difficult to create application swimlanes that way. But does subnet matter if your VMs are in different AZs?

It’s time I explain the concepts of application swimlanes and how they apply to availability zones in public clouds.

read more add comment
Sidebar