Blog Posts in August 2011
In one of my vCloud Director Networking Infrastructure rants I wrote “if they had decided to use IP encapsulation, I would have applauded.” It’s time to applaud: Cisco has just demonstrated Nexus 1000V supporting MAC-over-IP encapsulation for vCloud Director isolated networks at VMworld, solving at least some of the scalability problems MAC-in-MAC encapsulation has.
Nexus 1000V VEM will be able to (once the new release becomes available) encapsulate MAC frames generated by virtual machines residing in isolated segments into UDP packets exchanged between VEMs.
When I (somewhat jokingly) wrote about the dense- and sparse-mode FCoE, I had no idea someone would try to extend the analogy to all possible FCoE topologies like Tony Bourke did. Anyhow, now that the cat is out of the bag, let’s state the obvious: enumerating all possible FCoE topologies is like trying to list all possible combinations of NAT, IP routing over at least two L2 technologies, and bridging; while it can be done, the best one can reasonably hope for is a list of supported topologies from various vendors.
However, it might make sense to give you a series of questions to ask the vendors offering FCoE gear to help you classify what their devices actually do.
Following my IBGP or EBGP in an enterprise network post a few people have asked for a more graphical explanation of IBGP/EBGP differences. Apart from the obvious ones (AS path does not change inside an AS) and more arcane ones (local preference is only propagated on IBGP sessions, MED of an EBGP route is not propagated to other EBGP neighbors), the most important difference between IBGP and EBGP is BGP next hop processing.
Most persuasive argument of the week: “Are traffic charges needed to avert a coming capex catastrophe?” by Robert Kenny. This is how you rebut the claims of the greedy Service Providers (and their hired guns), not by hysterical screaming and spitting perfected by some net neutrality zealots.
Most insightful talk: An Attempt to Motivate and Clarify Software-Defined Networking by Scott Shenker. While he’s handwaving across a lot of details, the framework does make sense.
SK left a long comment to my More OSPF-over-DMVPN Questions post describing a scenario I find quite often in enterprise networks:
- Primary connectivity is provided by an MPLS/VPN service provider;
- Backup connectivity should use DMVPN;
- OSPF is used as the routing protocol;
- MPLS/VPN provider advertises inter-site routes as external OSPF routes, making it hard to properly design the backup connectivity.
If you’re familiar with the way MPLS/VPN handles OSPF-in-VRF, you’re probably already asking the question “how could the inter-site OSPF routes ever appear as E1/E2 routes?”
I got the following question from one of my readers:
I recently started working at a very large enterprise and learnt that the network uses BGP internally. Running IBGP internally is not that unexpected, but after some further inquiry it seems that we are running EBGP internally. I must admit I'm a little surprised about the use of EBGP internally and I wanted to know your thoughts on it.
Although they are part of the same protocol, IBGP and EBGP solve two completely different problems; both of them can be used very successfully in a large enterprise network.
In the next few days, I'll write about some of the interesting topics we’ve been discussing during the last week’s fantastic on-site workshop with Ian Castleman and his team. To get us started, here’s a short video describing BGP/IGP network design principles. It’s taken straight from my Building IPv6 Service Provider Core webinar (recording), but the principles apply equally well to large enterprise networks.
Following a series of soft switching articles written by Nicira engineers (hint: they are using a similar approach as Juniper’s QFabric marketing team), Greg Ferro wrote a scathing Soft Switching Fails at Scale reply.
While I agree with many of his arguments, the sad truth is that with the current state of server infrastructure virtualization we need soft switching regardless of the hardware vendors’ claims about the benefits of 802.1Qbg (EVB/VEPA), 802.1Qbh (port extenders) or VM-FEX.
I’ve spent the last few days with a fantastic group of highly skilled networking engineers (can’t share the details, but you know who you are) discussing the topics I like most: BGP, MPLS, MPLS Traffic Engineering and IPv6 in Service Provider environment.
One of the problems we were trying to solve was a clean split of a POP into two sites, retaining redundancy without adding too much extra equipment. The strive for maximum redundancy nudged me to propose the unimaginable: layer-2 interconnect between four tightly controlled routers running BGP, but even that got shot down with a memorable quote from the senior network architect:
Warning: totally shameless plug ahead. You might want to stop reading right now.
Every now and then one of the engineers listening to my webinars shares a nice success story with me. One of them wrote:
I'm doing a DMVPN deployment and Cisco design docs just don’t cover dual ISPs for the spokes hence I thought I give your webinars/configs a try.
... and a bit later (after going through the configs that you get with the DMVPN webinar):
Reading Cisco’s marketing materials, VM-FEX (the feature probably known as VN-Link before someone went on a FEX-branding spree) seems like a fantastic idea: VMs running in an ESX host are connected directly to virtual physical NICs offered by the Palo adapter and then through point-to-point virtual links to the upstream switch where you can deploy all sorts of features the virtual switch embedded in the ESX host still cannot do. As you might imagine, the reality behind the scenes is more complex.
The flooding attacks (or mishaps) on large layer-2 networks are well known and there are ample means to protect the network against them, for example storm control available on Cisco’s switches. Now imagine you change the source MAC address of every packet sent to a perfectly valid unicast destination.
A while ago someone asked me to help him troubleshoot his Internet connectivity. He was experiencing totally weird symptoms that turned out to be a mix of MTU problems, asymmetric routing (probably combined with RPF checks on ISP side) and non-routable PE-CE subnets. While trying to figure out what might be wrong from the router configurations, I was surprised by the amount of complexity he’d managed to introduce into his DMZ design by following recipes and best practices we all dole out in blog posts, textbooks and training materials.
My Inbox is overflowing (yet again); here are some great links from last week:
Data centers and summer clouds
Matthew Norwood describes an interesting product from HP – you could probably build a small data center with a single blade enclosure.
After weeks of waiting, perfect summer weather finally arrived … and it’s awfully hard to write blog posts that make marginal sense when being dead-tired from day-long mountain biking, so I’ll just recap the conversation I had with Brian a few days ago. He asked:
How would I set up a (dual) hub running OSPF with phase 1 spokes and prevent all spoke routes from being seen at other spokes? Think service provider environment.
If you want to have a scalable DMVPN environment, you have to put numerous spokes connected to the same hub in a single IP subnet (otherwise you’ll end with point-to-point tunnels), which also means they have to be in a single OSPF area and would thus see each other’s LSAs.
Building large-scale VLANs to support IaaS services is every data center designer’s nightmare and the low number of VLANs supported by some data center gear is not helping anyone. However, as Anonymous Coward pointed out in a comment to my Building a Greenfield Data Center post, service providers have been building very large (and somewhat stable) layer-2 transport networks for years. It does seem like someone is trying to reinvent the wheel (and/or sell us more gear).
I’ve already written about the stupidities of risking the stability of two data centers to enable live migration of “mission critical” VMs between them. Now let’s take the discussion a step further – after hearing how critical the VM the server or application team wants to migrate is, you might be tempted to ask “and how do you ensure its high availability the rest of the time?” The response will likely be along the lines of “We’re using VMware High Availability” or even prouder “We’re using VMware Fault Tolerance to ensure even a hardware failure can’t bring it down.”
Accumulated in my Inbox during the second half of July:
Duncan Epping wrote a long series of posts describing the new VMware’s High Availability implementation: Fault domain manager, Primary nodes, Datastore heartbeating, Restarting VMs and finally Advanced settings.
Chris sent me an interesting question:
Imagine L2 traffic between two VMs on different ESX hosts, both using Nexus 1000V. Will the physical switches see the traffic with source and destination MACs matching the VM’s vNICs or traffic on NX1000V “packet” VLAN between VEMs (in this case, the packet VLAN would act as a virtual backplane)?
It seems that most networking vendors consider the Flat Earth architectures the new bonanza ... everyone is running to join the gold rush, from Cisco’s FabricPath and Brocade’s VCS to HP’s IRF and Juniper’s upcoming QFabric. As always, the standardization bodies are following the industry with a large buffet of standards to choose from: TRILL, 802.1ag (SPB), 802.1Qbg (EVB) and 802.1bh (Port extenders).
The following design challenge landed in my Inbox not too long ago:
My organization is the in the process of building a completely new data center from the ground up (new hardware, software, protocols ...). We will currently start with one site but may move to two for DR purposes. What DC technologies should we be looking at implementing to build a stable infrastructure that will scale and support technologies you feel will play a big role in the future?
In an ideal world, my answer would begin with “Start with the applications.”