Category: design
OSPF Areas and Summarization: Theory and Reality
While most readers, commenters, and Twitterati agreed with my take on the uselessness of OSPF areas and inter-area summarization in the 21st century, a few of them pointed out that theory and practice don’t always match. Unfortunately, most of the counterexamples boiled down to broken implementations or vendor “optimizations.”
Do We Still Need OSPF Areas and Summarization?
One of my ExpertExpress design discussions focused on WAN network design and the need for OSPF areas and summarization (the customer had random addressing, and the engineers wondered whether it would make sense to renumber the network to get better summarization).
I’ve been struggling with the question of whether we still need OSPF areas and summarization in 2016 for quite a while. Here are my thoughts on the topic; please share yours in the comments.
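To illustrate what’s at stake, here’s a minimal Python sketch (with a purely hypothetical addressing plan): prefixes allocated contiguously per area collapse into a single summary an ABR could advertise, while randomly assigned prefixes don’t collapse at all, which is exactly why renumbering tends to be the price of admission for useful summarization.

```python
# Minimal sketch with a hypothetical addressing plan: contiguous per-area prefixes
# can be summarized into a single advertisement, randomly assigned ones cannot.
import ipaddress

# Contiguous allocation: sixteen /24s carved out of 10.1.0.0/20
contiguous = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(16)]

# "Random" allocation: prefixes scattered all over the address space
scattered = [ipaddress.ip_network(p) for p in
             ("10.1.3.0/24", "10.7.12.0/24", "10.42.9.0/24", "10.99.1.0/24")]

print(list(ipaddress.collapse_addresses(contiguous)))  # [IPv4Network('10.1.0.0/20')]
print(list(ipaddress.collapse_addresses(scattered)))   # still four separate prefixes
```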
Using BGP in Leaf-and-Spine Fabrics
In the Leaf-and-Spine Fabric Designs webinar series we started with the simplest possible design: non-redundant server connectivity with bridging within a ToR switch and routing across the fabric.
After I explained the basics (including routing protocol selection, route summarization, link aggregation and addressing guidelines), Dinesh Dutt described how network architects use BGP when building leaf-and-spine fabrics.
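To give you an idea of what such a design typically looks like, here’s my simplified sketch (not Dinesh’s recipe; all names and ASNs are made up) of one common numbering scheme: every leaf gets its own private ASN, the spines share another one, and every leaf runs eBGP with every spine.

```python
# Simplified sketch of a common eBGP numbering scheme for a leaf-and-spine fabric
# (hypothetical names and ASNs): one private ASN per leaf, one shared by the spines,
# and a full mesh of leaf-to-spine eBGP sessions.
SPINE_ASN = 65000
LEAF_ASN_BASE = 65001
SPINES = ["spine1", "spine2"]
LEAVES = ["leaf1", "leaf2", "leaf3", "leaf4"]

def leaf_asn(index: int) -> int:
    """Unique private ASN per leaf: leaf1 -> 65001, leaf2 -> 65002, ..."""
    return LEAF_ASN_BASE + index

# Every leaf peers with every spine
sessions = [(leaf, leaf_asn(i), spine, SPINE_ASN)
            for i, leaf in enumerate(LEAVES)
            for spine in SPINES]

for leaf, asn, spine, spine_asn in sessions:
    print(f"{leaf} (AS{asn}) <--eBGP--> {spine} (AS{spine_asn})")
```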
Why Is Stretched ACI Infinitely Better than OTV?
Eluehike Chedu asked an interesting question after my explanation of why a stretched ACI fabric (or one of the alternatives, see below) is the least horrible way of stretching a subnet: What about OTV?
Time to go back to the basics. As Dinesh Dutt explained in our Routing on Hosts webinar, there are (at least) three reasons why people want to see stretched subnets:
Scaling L3-Only Data Center Networks
Andrew wondered how one could scale the L3-only data center networking approach I outlined in this blog post and asked:
When dealing with guests on each host, if each host injects a /32 for each guest, by the time the routes are on the spine, you're potentially well past the 128k route limit. Can you elaborate on how this can scale beyond 128k routes?
Short answer: it won’t.
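A back-of-the-envelope calculation (with numbers I made up for illustration) shows how quickly you get there, and why per-rack summarization would solve the problem only if the guests never moved (or kept their addresses) across rack boundaries:

```python
# Back-of-the-envelope sketch (hypothetical numbers): /32 host routes seen by a
# spine switch when every host injects one route per guest.
RACKS = 50
HOSTS_PER_RACK = 40
GUESTS_PER_HOST = 100
SPINE_ROUTE_LIMIT = 128_000            # the limit mentioned in the question

host_routes = RACKS * HOSTS_PER_RACK * GUESTS_PER_HOST
print(f"/32 routes on the spine: {host_routes:,}")   # 200,000 -> well over the limit

# Summarizing each rack into a single prefix on the ToR switch would shrink that
# to RACKS routes, but only if guests never move between racks.
print(f"per-rack summaries: {RACKS}")
```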
Optimize Your Data Center: Use Distributed File System
Let’s continue our journey toward a two-switch data center. What can we do after virtualizing the workload, getting rid of legacy technologies, and reducing the number of server uplinks to two?
How about replacing dedicated storage boxes with a distributed file system?
In late September, Howard Marks will talk about software-defined storage in my Building Next Generation Data Center course. The course is sold out, but if you register for the spring 2017 session, you’ll get access to the recording of Howard’s talk.
Awesome Response: Complexity Sells
Russ White wrote an awesome response to my Complexity Sells post:
[…] What we cannot do is forget that complexity is real, and we need to learn to manage it. What we must not do is continue to think we can play in the land of dragons forever, and not get burnt. […]
Now go and read the whole blog post ;)
Optimize Your Data Center: Reduce the Number of Uplinks
Remember our journey toward a two-switch data center? So far we virtualized the workload and got rid of legacy technologies.
Time for the next step: read a recent design guide from your favorite hypervisor vendor and reduce the number of server uplinks to two.
Not good enough? Building a bigger data center? There’s exactly one seat left in the Building Next Generation Data Center online course.
Optimize Your Data Center: Ditch the Legacy Technologies
In our journey toward a two-switch data center we already virtualized the servers.
It’s time for the next step: get rid of legacy technologies like six 1GE interfaces per server or two FC interface cards in every server.
Need more details? Watch the Designing Private Cloud Infrastructure webinar. How about an interactive discussion? Register for the Building Next-Generation Data Center course.
OpenStack Networking, Availability Zones and Regions
One of my ExpertExpress engagements focused on networking in a future private cloud that might be built with OpenStack. The customer planned to deploy multiple data centers, and I recommended that they do everything they can to make sure those data centers don’t become a single failure domain.
Next step: translate that requirement into OpenStack terms.
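The usual answer (and this is my sketch, not a complete design) is to make every data center its own OpenStack region with its own control plane, so that one region failing cannot take down the other. Here’s a minimal openstacksdk example, assuming a hypothetical clouds.yaml entry named mycloud with regions dc1 and dc2:

```python
# Minimal sketch (hypothetical cloud and region names): each data center is a
# separate OpenStack region with its own control plane, so the regions remain
# independent failure domains and workloads are placed in one DC explicitly.
import openstack

dc1 = openstack.connect(cloud="mycloud", region_name="dc1")
dc2 = openstack.connect(cloud="mycloud", region_name="dc2")

for conn in (dc1, dc2):
    # List the compute availability zones visible in each region
    zones = [az.name for az in conn.compute.availability_zones()]
    print(conn.config.region_name, zones)
```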
Let’s Focus on Realistic Design Scenarios
An engineer working for a large system integrator sent me this question:
Since you are running a detailed series on leaf-and-spine fabrics these days, could you please tell me whether design scenarios like the Facebook and LinkedIn data centers are also covered?
Short answer: No.
Unexpected Recovery Might Kill Your Data Center
Here’s an interesting story I got from one of my friends:
- A large organization used a disaster recovery strategy based on stretched IP subnets and restarting workloads with unchanged IP addresses in a secondary data center;
- One day they experienced a WAN connectivity failure in the primary data center, and their disaster recovery plan kicked in.
However, while they were busy restarting the workloads in the secondary data center (and managed to get most of them up and running), the DCI link unexpectedly came back to life.
Optimize Your Data Center: Virtualize Your Servers
A month ago I published the video where I described the idea that “two switches is all you need in a medium-sized data center”. Now let’s dig into the details: the first step you have to take to optimize your data center infrastructure is to virtualize all servers.
For even more details, watch the Designing Private Cloud Infrastructure webinar, or register for the Building Next-Generation Data Center course.
Video: All You Need Are Two Switches
I’ve been telling you for years to build small-to-midsized data centers with two switches ;) A few weeks ago I turned my presentation on that topic into a webinar, and the first video from that webinar (now part of Designing Private Cloud Infrastructure) is already public.
BGP or OSPF? Does Topology Visibility Matter?
One of the comments added to my Using BGP in Data Centers blog post said:
With symmetric fabric… does it make sense for a node to know every bit of fabric info or is reachability information sufficient?
Let’s ignore for the moment that the large non-redundant layer-3 fabrics in which the BGP-in-the-data-center movement started don’t need more than endpoint reachability information, and focus on the bigger issue: is the knowledge of network topology (which OSPF provides and BGP doesn’t) beneficial?
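To make the difference tangible, here’s a toy Python sketch (hypothetical four-node fabric): a node with the full topology (the OSPF model) can prune a failed link from its local database and immediately compute an alternate path, while a node with reachability-only information (the BGP model) simply loses the route and has to wait for another neighbor to advertise the prefix.

```python
# Toy sketch: topology knowledge (OSPF-like link-state database) versus
# reachability-only information (BGP-like RIB) on a hypothetical four-node fabric.
from collections import deque

topology = {
    "leaf1":  {"spine1", "spine2"},
    "leaf2":  {"spine1", "spine2"},
    "spine1": {"leaf1", "leaf2"},
    "spine2": {"leaf1", "leaf2"},
}

def shortest_path(graph, src, dst):
    """Plain BFS; good enough to find an alternate path after a link failure."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# OSPF-like node: remove the failed leaf1-spine1 link locally and recompute at once
after_failure = {node: set(nbrs) for node, nbrs in topology.items()}
after_failure["leaf1"].discard("spine1")
after_failure["spine1"].discard("leaf1")
print(shortest_path(after_failure, "leaf1", "leaf2"))   # ['leaf1', 'spine2', 'leaf2']

# BGP-like node: it only ever knew a next hop; after the failure the route is
# withdrawn and the node waits for another neighbor to advertise the prefix.
rib = {"leaf2-prefix": "via spine1"}
rib.pop("leaf2-prefix")                                  # session down -> withdrawn
print(rib)                                               # {} until spine2 re-advertises
```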