Last batch of interesting links collected in 2011 ... starting with Pricing and Trading Networks: Down is Up, Left is Right: A fantastic post from Chris Marget explaining the behavior of trading floor networks. For me, it was an amazing view into a parallel universe with a completely different set of physical laws.
2011 was a fantastic year for a networking geek, and you were awesome – helping me figure out the intricacies of new technologies, fixing my errors, and asking so many great questions that prompted me to dive deeper into the rabbit holes. I owe you a huge Thank you!
I hope you’ll be able to shut down your smartphones and pagers in the next few days and spend a few relaxing moments with your families … and I wish you great networking in 2012!
To keep the geeky spirit: snow angel as seen by Hubble
Recent Cisco IOS releases have significant improvements in DHCPv6 functionality and other IPv6 access network features. These improvements, as well as additional access network methods (including 6rd), will be described in the Building IPv6 Access Networks webinar on January 25th (register).
After I published the Decouple virtual networking from the physical world article, @paulgear1 sent me a very valid tweet: “You seemed a little short on suggestions about the path forward. What should customers do right now?” Apart from the obvious “it depends”, these are the typical use cases (as I understand them today – please feel free to correct me).
15 years after NAT was invented, I’m still getting questions along the lines of “is NAT a security feature?” Short answer: NO!
Longer answer: NAT has some side effects that resemble security mechanisms commonly used at the network edge. That does NOT make it a security feature, more so as there are so many variants of NAT.
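To illustrate why NAPT merely *resembles* a stateful filter, here’s a toy Python model (everything in it is illustrative, not any vendor’s implementation): outbound packets create translation entries, and unsolicited inbound packets find no entry and get dropped.

```python
# Toy NAPT translation table illustrating the security-like side effect:
# outbound traffic creates a mapping; unsolicited inbound traffic matches
# nothing and is dropped. A sketch, not a real NAT implementation.

class ToyNapt:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 1024
        self.table = {}      # public_port -> (inside_ip, inside_port)
        self.reverse = {}    # (inside_ip, inside_port) -> public_port

    def outbound(self, inside_ip, inside_port):
        """Translate an outgoing packet, creating a mapping on first use."""
        key = (inside_ip, inside_port)
        if key not in self.reverse:
            self.reverse[key] = self.next_port
            self.table[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.reverse[key]

    def inbound(self, public_port):
        """Translate an incoming packet; unknown flows are dropped (None)."""
        return self.table.get(public_port)

nat = ToyNapt("192.0.2.1")
ip, port = nat.outbound("10.0.0.5", 33000)  # outbound flow opens a mapping
print(nat.inbound(port))   # → ('10.0.0.5', 33000): return traffic gets through
print(nat.inbound(4242))   # → None: unsolicited inbound packet is dropped
```

The “drop unknown inbound flows” behavior falls out of the translation table as a side effect, which is exactly why it’s not a substitute for a real stateful firewall policy.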
I’m planning a series of shorter (~ 1 hour) update-type webinars in 2012. Some of them will cover new features and technologies that have been introduced since the time I last updated some of the most popular webinars (Data Center, VMware networking), others will focus on emerging technologies.
I would appreciate if you could help me plan them by taking a short survey, telling me which of the topics I identified are most important for you, and adding your favorite topics to the list. The survey won’t take more than a few minutes of your time.
The coolest tool of the week: mxtommy/Cisco-SSH-Client. Thomas St. Pierre did a fantastic job – he modified the SSH client to colorize the printouts generated by Cisco routers (similar to what VIM does to source code). Download his SSH client patches, recompile your SSH client, and enjoy!

And here are the other links accumulated in my Inbox, this time in somewhat more structured format ... and a (hopefully interesting) surprise at the end.
After a week of oversized articles, I’ll try to keep this one short. This is a true story someone recently shared with me (for obvious reasons I can’t tell you where it happened ... and no, I’m not making it up). Enjoy!
A few days ago I had the privilege of being part of a VXLAN-related tweetfest with @bradhedlund, @scott_lowe, @cloudtoad, @JuanLage, @trumanboyes (and probably a few others) and decided to write a blog post explaining the problems VXLAN faces due to its lack of a control plane, how it uses IP multicast to solve that shortcoming, and how OpenFlow could be used in an alternate architecture to solve those same problems.
Anyone serious about high-availability connects servers to the network with more than one uplink, more so when using converged network adapters (CNA) with FCoE. Losing all server connectivity after a single link failure simply doesn’t make sense.
If at all possible, you should use dynamic link aggregation with LACP to bundle the parallel server-to-switch links into a single aggregated link (also called bonded interface in Linux). In theory, it should be simple to combine FCoE with LAG – after all, FCoE runs on top of lossless Ethernet MAC service. In practice, there’s a huge difference between theory and practice.
Every time I write about IPv6 multihoming issues and the need for NPT66, I get a comment or two saying “but I thought this is already part of the IPv6 stack – can’t you have two or more IPv6 addresses on the same interface?” The commenters are right: you can have multiple IPv6 addresses on the same interface; the problem is which one to choose for outgoing sessions.
The source address selection rules are specified in RFC 3484 (Greg translated that RFC into an easy-to-consume format a while ago), but they are not very helpful as they cannot be influenced by the CPE router. Let’s look at the details.
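To give you a feel for those rules, here’s a quick Python sketch of rule 8 (longest matching prefix) using documentation prefixes – a toy model of a single tie-breaker, not a full RFC 3484 implementation:

```python
import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits two IPv6 addresses share (RFC 3484 rule 8)."""
    xor = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - xor.bit_length()

def pick_source(candidates, destination):
    """Pick the candidate source address with the longest common prefix
    with the destination (only one of the eight RFC 3484 rules)."""
    return max(candidates, key=lambda c: common_prefix_len(c, destination))

# A multihomed host with two provider-assigned addresses (documentation prefixes)
sources = ["2001:db8:a::10", "2001:db8:b::10"]
print(pick_source(sources, "2001:db8:b:1::1"))   # → 2001:db8:b::10
```

The catch described above is visible even in this toy: the choice is made entirely by the host stack, and there’s nothing the CPE router can do to influence it.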
It’s getting harder and harder to decide whether to choose physical devices to do L4-7 processing (stateful- and web application firewalling, load balancing, VPN termination, WAN optimization) in your virtualized data center, or whether to deploy VM version of the same appliances.
Physical devices usually perform better. Virtual appliances are more flexible, but don’t scale well ... and Embrane just complicated your decision-making process: they launched scale-out distributed virtual appliance architecture and products that combine the best of both worlds.
Isn’t it amazing that we can build the Internet, run the same web-based application on thousands of servers, give millions of people access to cloud services … and stumble badly every time we design virtual networks? No surprise: by trying to keep vSwitches simple (and their R&D and support costs low), the virtualization vendors violate one of the basic scalability principles: complexity belongs to the network edge.
The DHCPv6 server in Cisco IOS got several highly useful enhancements since I first started looking into its behavior. It seems most of them are implemented only in the 15.xS trains (where, one would assume, they are most badly needed), but there’s hope those changes will eventually trickle down into mainstream IOS.
I thought Nexus 1000V is like Aspirin compared to VMware’s vSwitch, providing tons of additional functionality (including LACP and BPDU filter) and familiar NX-OS CLI. It turns out I was right in more ways than I imagined; Nexus 1000V solves a lot of headaches, but can also cause heartburn due to a particular combination of its distributed architecture and reliance on vDS object model in vCenter.
My friend Tom Hollingsworth has written another NAT66-is-evil blog post. While I agree with him in principle, and almost everyone agrees that NAT as we know it from the IPv4 world is plain stupid in the IPv6 world (NAPT more so than NAT), we just might need NPT66 (Network Prefix Translation; RFC 6296) to support small-site multihoming ... and yet again, it seems that many leading IPv6 experts grudgingly agree with me.
In the VMware vSwitch – the baseline of simplicity post I described simple layer-2 switches offered by most hypervisor vendors and the scalability challenges you face when trying to build large-scale solutions with them. You can solve at least one of the scalability issues pretty easily: VM-aware networking solutions available from most data center networking vendors dynamically adjust the list of VLANs on server-to-switch links.
Let’s start with people who are trying to fix real-life problems. Browser and OS vendors are working around the lack of session layer – happy eyeballs approach solves dual-stack problem in either OS stack (Apple) or in the browsers (Chrome, Firefox). The results are ... interesting ;) ... but it seems most implementations are on the right track.
When I started making my first wobbling steps into the Junos MPLS world, Dan (@jonahsfo) Backman took time to explain the differences between Cisco IOS and Junos MPLS implementations (and some of the reasons they are so different). This is my feeble attempt at describing what I understood he told me.
If you’re looking for a simple virtual switch, look no further than VMware’s venerable vSwitch. It runs very few control protocols (just CDP or LLDP, no STP or LACP), has no dynamic MAC learning, and only a few knobs and moving parts – ideal for simple deployments. Of course you have to pay for all that ease-of-use: designing a scalable vSwitch-based solution is tough (but then it all depends on what kind of environment you’re building).
For whatever reason I decided to start my Junos experience with a very simple IS-IS network – four core routers from my Building IPv6 Service Provider Core webinar. As Junosphere doesn’t support serial or POS interfaces, I migrated all links to Gigabit Ethernet and added a point-to-point GE link between PE-A and PE-B.
I have a different view regarding the VMware vSwitch. For me it’s the best thing that happened to my network in years. The vSwitch is so simple, and it’s so hard to break something in it, that I let the server team do whatever they want (with one small rule: only one vNIC per guest). I never have to configure a server port again :).
As always, the right answer is “it depends” – what kind of vSwitch you need depends primarily on your requirements.
Jónatan Þór Jónasson took the time to implement Wake-on-LAN functionality using the UDP support introduced in Cisco IOS Tcl in release 15.1(1)T. He found a Tcl/Tk example of a magic packet being sent, used it as a base, and with small modifications got it working on his router. Here’s his code (it’s obviously a proof of concept, but you need just a few more lines to get a working tclsh script):
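The magic packet format itself is trivial; if you’d like to play with the same idea off the router, here’s a minimal Python sketch of it (the MAC address below is made up):

```python
import socket

def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: six 0xFF bytes followed by the
    target MAC address repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (ports 7 and 9 are customary)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

pkt = magic_packet("00:1b:63:84:45:e6")
print(len(pkt))   # → 102 (6 + 16 * 6 bytes)
```

The router-based Tcl version does essentially the same thing: build those 102 bytes and fire them at the subnet broadcast address of the sleeping host.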
If you read my Twitter stream, you’ve probably realized I’d been stupid enough to decide to do another multi-vendor experiment: I’m trying to figure out whether an old grump can adapt to a MacBook Air.
Warning: What follows is a rant. You might want to skip this one and read something more technical.
Many service providers choosing IS-IS as their IGP use it within a single area (or at least run all routers as L1L2 routers). Multi-level IS-IS design is a royal pain, more so in MPLS environments where every PE-router needs a distinct route for every BGP next hop (see also Disable L1 Default Route in IS-IS). Moreover, MPLS TE is reasonably simple only within a single level (L1 or L2).
I’m positive at least some service providers do something as stupid as I usually did – deploy IS-IS with default settings using a configuration similar to this one:
My Junos versus Cisco IOS: Explicit versus Implicit received a huge amount of helpful comments, some of them slightly philosophical, others highly practical – from using interfaces all combined with interface disable in routing protocol configuration, to using configuration groups (more about that fantastic concept in another post).
However, understanding what’s going on is not the same as being able to explain it in one sentence ... and Dan (@jonahsfo) Backman beautifully nailed that one.
You’re probably tired of this story by now: public IPv4 addresses are running out, lots of content is available only over IPv4, and so the service providers use NAT to give new clients (with no public IPv4 address) access to old content. It doesn’t matter which NAT variant the service provider is using, be it Carrier-Grade NAT (CGN), NAT64, DS-Lite or A+P, the crucial problem is always the same: multiple users are hidden behind a single source IP address.
Best design information of the week: Chris Marget wrote a series on practical data center network designs (10GE servers connected to Nexus 5K) – Part 1 describes a ToR Nexus 5596, Part 2 a ToR Nexus 5548, and Part 3 a pair of Nexus 5548s with ToR FEX.
And here’s the rest of my Inbox collection:
My first Junosphere project was an IPv6 backbone; I wanted to create a simple single-area IS-IS/BGP-free backbone running LDP and MPLS, and using 6PE for IPv6 connectivity. Needless to say, even though I read the excellent Day One books (highly recommended: Exploring IPv6, Advanced IPv6 configuration and Deploying MPLS), I stumbled on almost every step.
The Data Center Fabric Architectures webinar was very well received, took way longer than I initially predicted (3 hours instead of 2, but that’s usual with my webinars), and contained almost no factual errors (this is the part that I’m most happy about). Obviously I’m not that good; a lot of people were helping.
I would like to express a huge thank you to Raj Toor (Alcatel Lucent), Doug Gourlay (Arista), Lisa Caywood, Jon Hudson and Brook Reams (Brocade), Ron Fuller and Omar Sultan (Cisco), Brad Hedlund and Peter Wohlers (Force 10), Abner Germanow, Nadia Walker and product management teams from Juniper, Khalid Raza (HP) and Samrat Ganguly and Su-Hun Yun (NEC America). They were fixing my errors, pointing out my omissions, and helped me fine-tune the grading system, making the webinar way more accurate.
If you’ve missed the webinar, don’t worry – the recording is already available, and I’ll run another session in May, updating the scorecards with the improvements vendors will make in the next six months.
Michel sent me a detailed e-mail describing both his enthusiasm with vPC and the headaches consistency checker is causing him. Here’s the good part:
Nexus vPC seems like a perfect solution for real multi-chassis etherchannel. At work we're using it extensively on a few pairs of Nexus 7000s.
... and then it turns sour:
However, there is one MAJOR drawback with vPC at this time: the way the consistency checker works (or rather, doesn't work). We've come across two specific situations where the consistency checker will bring down your beautiful and redundant vPC link, and we've found no way around it.
Here are his problems:
The comments igp2bgp and Tiziano Tofoni made to my LDP-IGP Synchronization in MPLS Networks post prompted me to look deeper into basic Junos MPLS configuration and LDP behavior. As expected, there are some significant differences between Cisco’s and Juniper’s LDP implementations (and, as is usually the case, they’re both strictly conformant with RFC 5036).
One of the comments I usually get about OpenFlow is “sounds great and I’m positive Yahoo! and Google will eventually use it, but I see no enterprise use case.” (see also this blog post). Obviously nobody would go for a full-blown native OpenFlow deployment and we’ll probably see hybrid (ships-in-the-night) approach more often in research labs than in enterprise networks, but there’s always the integrated mode that allows you to add OpenFlow-based functionality on top of existing networking infrastructure.
A reader of my blog planning to migrate his network from a traditional BGP-everywhere design to a BGP-over-MPLS one wondered about potential unexpected consequences. The MTU implications of introducing MPLS in a running network are usually well understood (even though you could get some very interesting behavior); if you can, increase the MTU size by at least 16 bytes (4 labels) and check whether MTU includes L2 header. Another somewhat more mysterious beast is the interaction between IGP and LDP that can cause traffic disruptions after the physical connectivity has been reestablished.
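The label-stack arithmetic is trivial; a few lines of Python make the 16-byte rule of thumb explicit (a back-of-the-envelope sketch – always verify whether your platform’s MTU value includes the L2 header):

```python
# Back-of-the-envelope MTU check for introducing MPLS: each label adds
# 4 bytes, so budgeting for a 4-label stack means 16 extra bytes.
LABEL_SIZE = 4          # bytes per MPLS label

def required_mtu(ip_mtu, label_depth=4, l2_header=0):
    """MTU the links must support; set l2_header (e.g. 14 for Ethernet)
    on platforms that count the L2 header in their MTU value."""
    return ip_mtu + label_depth * LABEL_SIZE + l2_header

print(required_mtu(1500))                # → 1516
print(required_mtu(1500, l2_header=14))  # → 1530
```

Four labels is a conservative budget (e.g. MPLS/VPN over TE with FRR protection); if you ever stack more services, adjust the depth accordingly.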
During the last few days there have been rumors of flying pigs and open speculation about whether I’d rename my blog to junoshints or junioshints due to my Junosphere-related posts. When even my wife told me to get my act together, it was time to move ... and you can see the first changes at the top left corner of the screen.
The last week has been “interesting” – I created the draft slide deck for the Data Center Fabric Architectures webinar on November 16th (register here) and sent the relevant slides to all vendors mentioned in the presentation to give them a chance to fix my errors – every vendor got at least the scorecard describing my understanding of their solution.
Abner (@abnerg) Germanow and Dan (@jonahsfo) Backman are as good as their word: this week I got access to Junosphere, a great network-in-the-Clouds solution from Juniper. You might be familiar with Olive, the “non-existent” way of running Junos on an x86 machine (including a VM); Junosphere is the supported version of the same concept, including a real forwarding plane (it’s my understanding Olive lacks that, which makes certain protocols behave in unexpected ways).
Yesterday’s 6th Slovenian IPv6 Summit was (as always) full of awesome presentations, this time coming straight from some of the IPv6 legends: check the ones from Eric Vyncke (and make sure you read his IPv6 Security book), Randy Bush and Mark Townsley. The epic moment, however, was the “I was getting bored” part of Eric’s presentation (starts around 0:50:00). This is (in a nutshell) what he did:
Abner (@abnerg) Germanow surprised us all at the end of Juniper’s presentation at Networking Tech Field Day when he announced Junosphere access for all the delegates – after a year of nagging, I would finally be able to touch Junos. However, instead of taking it easy and studying the excellent Junos Day One books (which I also did – if you’re new to Junos you should definitely start there; they are well worth reading), I decided to take a more geeky approach.
Big Switch Networks is one of those semi-stealthy startups that like to hint at what they’re doing without actually telling you anything, so I was very keen to meet Kyle Forster and Guido Appenzeller during the OpenFlow Symposium and asked them a simple question: “can you explain in 3 minutes what it is you’re doing?”
The “discovery of the week” award goes to Terry Slattery for pointing out the dangers of bufferbloat while investigating TCP retransmissions (part 1 and part 2). BTW, in the end, he figured out it was just an overloaded Gigabit Ethernet linecard.
Did you ever want to have a high-level overview of how 3G/4G mobile networks work? Where GGSN and SGSN fit in? What the PDP contexts are ... and why you need two for dual-stack connectivity? All that (and a lot more) is explained in very well written IETF draft IPv6 in 3GPP Evolved Packet System. Reading highly recommended.
I stumbled upon VMsafe Network API (the API formerly known as dvFilter) while developing my VMware Networking Deep Dive webinar, set up the vShield App 4.1 in a lab, figured out how it works (including a few caveats), and assumed that’s how most virtual firewalls using dvFilter work. Boy was I wrong!
An engineer attending my VMware Networking Deep Dive webinar has asked me a tough question that I was unable to answer:
What happens if a VM running within a vSphere host sends a BPDU? Will it get dropped by the vSwitch or will it be sent to the physical switch (potentially triggering BPDU guard)?
I got the answer from visibly harassed Kurt (@networkjanitor) Bales during the Networking Tech Field Day; one of his customers has managed to do just that.
Update 2011-11-04: The post was rewritten based on extensive feedback from Cisco, VMware and numerous readers.
Finally someone decided to make IPv6 flow label useful. First they had to justify why they want to change it, and then modify the definition (way too much work for a field nobody ever used). Planned use is to enhance ECMP load balancing, both in native IPv6 environments (where using the flow label is faster than digging deep into variable-length IPv6 extension headers) and (even more importantly) in tunneled environments, where the flow label propagates the entropy from the tunnel payload into the envelope header.
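A back-of-the-napkin sketch of how a device might fold the flow label into its ECMP hash (CRC32 here is just a stand-in for whatever hash function real forwarding silicon uses):

```python
import zlib

def ecmp_link(src, dst, flow_label, n_links):
    """Pick an ECMP member link by hashing the IPv6 flow label together
    with source and destination addresses -- no need to walk the
    extension-header chain to find the transport ports."""
    key = f"{src}|{dst}|{flow_label:05x}".encode()
    return zlib.crc32(key) % n_links

# All packets of one flow hash to the same link; different flow labels
# between the same pair of hosts can spread across the member links.
a = ecmp_link("2001:db8::1", "2001:db8::2", 0x12345, 4)
b = ecmp_link("2001:db8::1", "2001:db8::2", 0x12345, 4)
print(a == b)   # → True (per-flow consistency)
```

The tunneling case works the same way: the ingress node computes a label from the payload’s 5-tuple and copies it into the envelope’s flow label, so transit routers get flow-level entropy without ever looking inside the tunnel.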
I hope you never believed the “OpenFlow networking nirvana” hype in which smart open-source programmable controllers control dumb low-cost switches, busting the “networking = mainframes” model and bringing the Linux-like golden age to every network. As the debates during the OpenFlow symposium clearly illustrated, the OpenFlow reality is way more complex than it appears at a first glance.
To make it even more interesting, at least four different models for OpenFlow deployment have already emerged:
A few weeks ago I delivered a short L2 DCI WebEx presentation to CCIE Club Poland. I took the L2 part of my Data Center Interconnect webinar and added 15 minutes of L2 DCI mythbusting (yes, it took me that long to explain what @reillyusa summarized in a single tweet). Here’s that part of the presentation; for the rest, buy a recording of my Data Center Interconnect webinar (it contains much more than just L2 DCI).
Michael sent me an interesting question:
I work in a rather large enterprise facing a campus network redesign. I am in favor of using a routed access for floor LANs, and make Ethernet segments rather small (L3 switching on access devices). My colleagues seem to like L2 switching to VSS (distribution layer for the floor LANs). OSPF is in use currently in the backbone as the sole routing protocol. So basically I need some additional pros and cons for VSS vs Routed Access. :-)
The follow-up questions confirmed he has L3-capable switches in the access layer connected with redundant links to a pair of Cat6500s:
The last few days were exquisite fun: it was great meeting so many people focusing on a single technology (OpenFlow) and concept (Software-Defined Networking, whatever that means) that just might overcome some of the old obstacles (and introduce new ones). You should be at least a bit curious what this is all about, and even if you don’t see yourself ever using OpenFlow or any other incarnation of SDN in your network, it never hurts to enhance your resume with another technology (as long as it’s relevant; don’t put CICS programmer at the top of it).
We finished a fantastic Network Field Day (second edition) yesterday. While it will take me a while (and 20+ blog posts) to recover from the information blast I received during the last two days, here are the first impressions:
Explosion of innovation – and it’s not just OpenFlow and/or SDN. Last year we saw some great products and a few good ideas (earning me the “grumpy old man who’s hard to make smile” fame); this year almost every vendor had something that excited me.
A while ago I got a set of MPLS/VPN-related questions from one of my long-time readers furiously working on a response to a large RFP. I answered the questions and (more as an afterthought) mentioned the ExpertExpress service I had been starting to consider. His response amazed me:
ExpertExpress is definitely a very very good idea!!! You know what? I think I will push the company to try to use it to get your advice on the current engagement. The company needs this "yesterday" so I would be able to verify my design and will feel safer with it and will deliver it on time and of course you will receive a fair payment for this.
Next question – when could we do it? Response: how about tomorrow? Sure, no problem (note: it doesn’t always work out that way).
Like every other blogger, I get occasional e-mails from people fishing for free consulting or second opinion (note: asking a serious technical question is a totally different story; as many people know, I always try to reply and help) and as I’m totally overloaded with OpenFlow symposium and Net Field Day these days, I decided to share one of the better ones.
The second presentation I had @ EuroNOG 2011 described networking requirements of various cloud services – a perfect combination of virtual networking and scalability, two of the topics I’m currently obsessed with. If you’re looking for more details, register for the Cloud Computing Networking: Under the Hood webinar on December 14th.
The initial release of QFabric Junos can run STP only within the network node (see the QFabric Control Plane post for more details), triggering an obvious question: “what happens if a server multihomed to a server node starts bridging between its ports and sending BPDUs?” Some fabric solutions try to ignore STP (the diplomats would say “they are transparent to STP”), but fortunately Juniper decided to do the right thing.
While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mindboggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them that obviously impressed James Hamilton is the solution to load balancing that requires no load balancers.
Before clicking Read more, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.
After more than a year, I’m back in California, anxiously waiting to meet my fellow bloggers and ask some tough questions to a fantastic lineup of vendors presenting at Net Field Day 2011. Stephen Foskett’s well-oiled organizing machinery is already in full gear; I’m typing this post from a WiFi-equipped car that picked me up @ SFO airport (you see, dear vendors, it’s so easy to make my inner geek happy ... all I need are some fantastic features that are actually usable and work as well as this WiFi connection).
I spent a lot of time during the last year analyzing various data center architectures proposed by networking vendors – from simple solutions like MLAG to complex architectures like QFabric – and presented the summary of my findings in a short presentation @ EuroNOG 2011 that received positive feedback from Cisco, Juniper and Brocade (and a post from Brook Reams @ EthernetFabric). That presentation (you can view it online) was pretty short due to the 45-minute slot I got and I decided to expand it into a proper 2-to-3 hour webinar.
InformationWeek has recently published an OpenFlow article by Jeff Doyle in which they graced me with a single grumpy quote taken out of three pages of excellent questions that Jeff asked me when preparing for the article. Jeff has agreed that I publish the original questions and my answers to them. Here they are (totally unedited):
This is a nice email I got from an engineer struggling with multi-homing BGP setup:
We faced a problem with our internet routers a few days back. The engineer who configured them earlier used the syntax: network x.x.x.x mask y.y.y.y route-map PREPEND to influence the incoming traffic over two service-providers.
... and of course it didn’t work.
Yesterday New York Times published an article covering Nicira, a semi-stealthy startup working on an open source soft switch (Open vSwitch) and associated OpenFlow-based controller, triggering immediate responses from GigaOm and Twilight in the Valley of the Nerds. While everyone got entangled in the buzzwords (or lack of them), not a single article answered the question “what is Nicira really doing?” Let’s fix that.
One of the areas where IPv6 sorely lacks feature parity with IPv4 is user authentication and source IP spoofing prevention in large-scale Carrier Ethernet networks. Metro Ethernet switches from numerous vendors offer all the IPv4 features a service provider needs to build a secure and reliable access network where the users can’t intercept other users’ traffic or spoof source IP addresses, and where it’s always possible to identify the end customer from an IPv4 address – a mandatory requirement in many countries. Unfortunately, you won’t find most of these features in those few Metro Ethernet switches that support IPv6.
A week ago I was writing about the latency and bandwidth challenges of long-distance vMotion and why it rarely makes sense to use it in disaster avoidance scenarios (for a real disaster avoidance story, read this post ... and note no vMotion was used to get out of harm’s way). The article I wrote for SearchNetworking tackles an idea that is an order of magnitude more ridiculous: using vMotion to migrate virtual machines around the world to bring them close to the users (follow-the-sun workload mobility). I wonder which islands you’d have to use to cross the Pacific in 10ms RTT hops supported by vMotion?
While preparing for the IPv6 seminar I’m delivering in Rome, I had to reinvent a few wheels, including slides explaining IPv6 addressing and host behavior ... giving me a perfect reason to study the RFCs and figure out how exactly IPv6 stateless autoconfiguration (RFC 4862) works.
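For example, the modified EUI-64 interface identifier at the heart of SLAAC is easy to reproduce in a few lines of Python (a sketch of the address construction only – DAD, lifetimes and privacy extensions are a different story):

```python
def eui64_interface_id(mac):
    """Modified EUI-64: insert ff:fe in the middle of the MAC address and
    flip the universal/local bit. RFC 4862 SLAAC builds the host address
    from the advertised /64 prefix plus this 64-bit identifier."""
    b = bytearray(bytes.fromhex(mac.replace(":", "")))
    b[0] ^= 0x02                  # flip the U/L bit in the first octet
    b[3:3] = b"\xff\xfe"          # insert ff:fe between OUI and NIC-specific part
    return ":".join(f"{b[i]:02x}{b[i+1]:02x}" for i in range(0, 8, 2))

def slaac_address(prefix, mac):
    """Combine a /64 prefix (written without trailing colon) with the
    EUI-64 interface identifier."""
    return f"{prefix}:{eui64_interface_id(mac)}"

print(slaac_address("2001:db8:0:1", "00:1b:63:84:45:e6"))
# → 2001:db8:0:1:021b:63ff:fe84:45e6
```

Seeing the U/L bit flip spelled out in code makes it much easier to explain why a MAC address starting with 00 produces an interface identifier starting with 02.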
Greg (@etherealmind) Ferro started an interesting discussion on Google+, claiming MPLS is just tunneling and a duct tape like NAT. I would be the first one to admit MPLS has its complexities (not many ;) and shortcomings (a few ;), but calling it a tunnel just confuses the innocents. MPLS is not tunneling, it’s a virtual-circuits-based technology, and the difference between the two is a major one.
Got this set of questions from a CCIE pondering emerging technologies that could be of potential use in his data center:
I don’t think OpenFlow is clearly defined yet. Is it a protocol? A model for control-plane/forwarding-plane interaction? An abstraction of the forwarding plane? An automation technology? Is it a virtualization technology? I don’t think there is consensus on these things yet.
As the OpenFlow Symposium is just a few weeks away, let’s try to position OpenFlow in the big picture.
Every time I’m discussing the VXLAN technology with a fellow networking engineer, I inevitably get the question “how will I connect this to the outside world?” Let’s assume you want to build pretty typical 3-tier application architecture (next diagram) using VXLAN-based virtual subnets and you already have firewalls and load balancers – can you use them?
The product information in this blog post is outdated - Arista, Brocade, Cisco, Dell, F5, HP and Juniper are all shipping hardware VXLAN gateways (this post has more up-to-date information). The concepts explained in the following text are still valid; however, I would encourage you to read other VXLAN-related posts on this web site or watch the VXLAN webinar to get a more recent picture.
I was overloaded during the last few weekends and my Inbox is yet again overflowing with links to excellent content. For a warm-up, look at the eight levels of vendor acceptance (a side effect of a really tough lab test during the EuroNOG 2011 conference).
On a more serious note, the most useful article of this week is probably the BGPmon Web Services API that describes how you can query the global BGP table through whois or SOAP.
Ethan Banks, one of the masterminds behind the Packet Pushers podcast, wrote a spot-on blog post describing why enterprises don’t deploy IPv6. Unfortunately, most of the enterprise networking engineers follow the same line of reasoning, and a few of them might feel like the proverbial deer caught in the headlights once something totally unexpected happens ... like their CEO vacationing in China, getting only an IPv6 address on the iPhone, and thus not being able to access a mission-critical craplication. For a longer-term perspective, read an excellent reply written by Tom Hollingsworth.
I first heard about CloudSwitch when writing about vCider. It seemed like an interesting idea and I wanted to explore the networking aspects of cloud VLAN extension for my EuroNOG presentation. My usual approach (read the documentation) failed – the documentation is not available on their web site – but I got something better: a briefing from Damon Miller, their Director of Technical Field Operations. So, this is how I understood CloudSwitch works (did I get it wrong? Write a comment!):
As you might know, I did a Data Center Fabric Architectures presentation at the EuroNOG 2011 conference last week. The slides are available online; for a more in-depth version, please register for my Data Center Fabrics webinar.
As always, if I got anything wrong, please write a comment.
There’s so much great material being written on VXLAN and NVGRE that I decided to write a separate post listing the best of it (if you still can’t decide whether you should care about VXLAN, register for my Introduction to Virtual Networking webinar).
The question of high-availability cloud services (let’s agree reliable in this context really means highly available) pops up every time I discuss cloud networking requirements with enterprise-focused experts. While it’s obvious the software- and platform services must be highly available (as their users have few mechanisms to increase their availability), Infrastructure-as-a-Service (IaaS) remains a grey area.
However, once you look at the question from the business perspective, it seems Amazon probably made a pretty good choice: offer reasonably-available service at a low price. You’ll find more in-depth arguments in the Cloud infrastructure services: Balancing high availability and cost article I wrote for SearchCloudProvider.com.
I just came back from a fantastic conference – EuroNOG 2011 in beautiful Krakow. I’ve been to too many conferences in my life, but this one really stood out for two reasons: the nerdiness factor (where it got to the level of advanced presentations @ Cisco Live) and the fantastic crew organizing it.
The proponents of inter-DC layer-2 connectivity (required by long-distance vMotion) inevitably cite disaster avoidance (along with buzzword-bingo-winning business agility) as one of the primary requirements after they figure out stretched clusters might not be such a good idea (and there’s no way to explain the dangers of split subnets to some people). When faced with the disaster avoidance “requirement”, ask them to do some basic math first.
Got the following question with an invalid return address, so I’m broadcasting the reply ;)
I am running a DMVPN network and recently got a requirement for spoke-to-spoke communication. We currently shape traffic on a per spoke basis on the hub, and have a single shaper at the remote site. However, if a spoke is receiving a large amount of traffic from the hub and another spoke site, how will the sites sending traffic know that the remote port is congested?
Short answer – they won’t. You have a mission-impossible problem (very similar to ADSL QoS), but there might be some slight silver lining:
You won’t find much about the QFabric forwarding architecture and resulting behavior in the documentation; white papers might give you more insight and I’m positive more detailed ones will start appearing on Juniper’s web site now that the product is shipping. In the meantime, let’s see how far we can get based on two simple assumptions: (A) The "one tier architecture" claim is true and (B) Juniper has some very smart engineers.
Just a few hours after VXLAN was launched, I received an e-mail from one of my readers asking (literally) if VXLAN was awesome or braindead. I decided to answer this question (you know the right answer is it depends) and a few others in a FastPacket blog post published by SearchNetworking.
I wrote the post before NVGRE was published and missed the “brilliant” idea of using GRE key as virtual segment ID.
Occasionally my readers ask me whether I would be available for a consulting/design project (or send me questions that are actually design review/second opinion challenges). I usually recommend our Professional Services team for larger projects (I try to do only a few larger consulting projects per year ... but that shouldn’t stop you from asking ;). Quite often, though, the amount of work involved is so small that it simply doesn’t make sense to go through the whole paperwork nightmare, so I created the ExpertExpress service to address those cases.
I’ll be in Krakow for the PLNOG/EuroNOG conferences Wednesday through Friday. This is not the primary reason I’m arriving on Wednesday (although it does look tempting) – I wanted to have enough time for discussions with fellow networking engineers and Thursday afternoon/Friday will probably be pretty busy. So, if you’d like to chat with me about exciting networking technologies, just find me in the crowd (unfortunately I won’t be wearing this T-shirt) ... and if you’d like to have a more serious (and longer) discussion, get in touch with me or send me a tweet.
Like anyone else, I was pretty impressed with the QFabric hardware architecture when Juniper announced it, but remained way more interested in the control-plane aspects of QFabric. After all, if you want multiple switches to behave like a single device, you could either use a Borg-like architecture with a single control-plane entity, or implement some very clever tricks.
Nobody has yet demonstrated a 100-switch network with a single control plane (although the OpenFlow aficionados would make you believe it’s just around the corner), so it must have been something else.
Imagine you’d actually want to run VXLAN between two data centers (I wouldn’t, but that’s beside the point at this moment) and the only connectivity between the two is IP, no multicast. How would you implement IP multicast across a generic IP backbone? Anything goes, from duct tape (GRE) to creative solutions ... and don’t forget those pesky RPF checks.
Comparing promises, deliverables and generic progress seems to be popular in the harvest season, so let’s see how far Cisco pushed the Data Center IPv6 support in the six months since my last status report.
Kudos to the Nexus 7000/NX-OS team for doing the right thing. Not only did they make me happy by implementing full-blown MPLS, MPLS/TE and MPLS/VPN, they included 6PE and 6VPE in the first release of the MPLS code. Great job!
Juniper has finally released the technical documentation for the QFabric virtual switch and its components (QF/Node, QF/Interconnect and QF/Director). As expected, my speculations weren’t too far off – if anything, Juniper didn’t go far enough along those lines, but we’ll get there later.
The generic hardware architecture of the QFabric switching complex has been well known for quite a while (listening to the Juniper QFabric Packet Pushers Podcast is highly recommended) – here’s a brief summary:
Two weeks after VXLAN (backed by VMware, Cisco, Citrix and Red Hat) was launched at VMworld, Microsoft, Intel, HP & Dell published the NVGRE draft (Arista and Broadcom are cleverly straddling both camps), which solves the same problem in a slightly different way.
If you’re still wondering why we need VXLAN and NVGRE, read my VXLAN post (and the one describing how VXLAN, OTV and LISP fit together), register for the Introduction to Virtual Networking webinar or read the Introduction section of the NVGRE draft.
The Mixed Feelings award of the week goes to Doug Gourlay and his Why FCoE is Dead, But Not Buried Yet article. While I agree with everything he’s saying about L2 and L3, the FCoE part of the post is shaky enough to generate tons of comments (or maybe that was the goal). For a hilarious perspective on the same topic, read Fiber Channel and Ethernet – the odd couple.
And here are the other great articles I stumbled upon during the last few days:
Chris sent me the following question a while ago:
I've got a full Internet BGP table, and want to responsibly send a default route to a downstream AS. It's the "responsibly" part that's got me frustrated: How can I judge whether the internet is working and make the origination of the default conditional on that?
He’d already figured out the neighbor default-originate route-map command, but wanted to check for more generic conditions than the presence of one or more prefixes in the IP routing table.
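The baseline solution (before extending it to more generic conditions) would look something like this – a minimal sketch with AS numbers, neighbor addresses and tracked prefixes as placeholders:

```
! Advertise a default route only while at least one well-known
! transit prefix is present in the IP routing table
ip prefix-list InternetAlive seq 10 permit 4.0.0.0/8
ip prefix-list InternetAlive seq 20 permit 12.0.0.0/8
!
route-map CheckInternet permit 10
 match ip address prefix-list InternetAlive
!
router bgp 65001
 neighbor 192.0.2.2 remote-as 65002
 neighbor 192.0.2.2 default-originate route-map CheckInternet
```

The default route is withdrawn as soon as none of the listed prefixes remain in the routing table – which is precisely the limitation Chris ran into: the condition is tied to the presence of specific prefixes.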
Daniel left a very relevant comment to my convoluted BGP session shutdown solution:
What I am currently doing is using EEM to watch my tracked objects and then issuing a neighbor shutdown command. Is there a functional reason I would not want to do it that way, and use the method you prescribe?
As always, the answer is “it depends.” In this case, the question to ask yourself is: “do I track configuration changes and react to them?”
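The EEM approach Daniel describes would look roughly like this (tracked object number, AS number and neighbor address are placeholders); note that the applet modifies the router configuration, which is exactly what the configuration-change question is about:

```
! Shut down the BGP neighbor when tracked object 1 goes down
event manager applet ShutBGPNeighbor
 event track 1 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "configure terminal"
 action 3.0 cli command "router bgp 65001"
 action 4.0 cli command "neighbor 192.0.2.2 shutdown"
```

You’d need a matching applet reacting to the up transition to re-enable the session ... and every transition results in a configuration change that your change-tracking tools will dutifully flag.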
One of my readers sent me the following question a few days ago:
Do you have a webinar that covers Dual DMVPN HUB deployment using OSPF? If so which webinar covers it?
I told him that the DMVPN: From Basics to Scalable Networks webinar covers exactly that scenario (and numerous others), describing both Phase 1 DMVPN and Phase 2 DMVPN design and implementation guidelines. Interestingly, he replied that the information on this topic seems to be very scant:
I read two great blog posts on Sunday: evergreen Fallacies of Distributed Computing from Bob Plankers and forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).
HP has recently commissioned an IRF network test that came to absolutely astonishing conclusions: vMotion runs almost twice as fast across two links bundled in a port channel than across a single link (with the other one being blocked by STP). The test report contains one other gem, this one a result of the incredible creativity of HP’s marketing:
For disaster recovery, switches within an IRF domain can be deployed across multiple data centers. According to HP, a single IRF domain can link switches up to 70 kilometers (43.5 miles) apart.
You know my opinion about stretched clusters ... and the more down-to-earth part of HP Networking (the people writing the documentation) agrees with me.
In the responses to my The Road to Complex Designs is Paved With Great Recipes post, Daniel suggested shutting down the EBGP session if your BGP router cannot reach the DMZ firewall, and Cristoph guessed that it might be done without changing the router configuration with the neighbor fall-over route-map BGP configuration command. He was sort-of right, but the solution is slightly more convoluted than he imagined.
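For reference, the command Cristoph had in mind is configured along these lines (a hedged sketch with placeholder addresses and AS numbers). The route-map selects which routes toward the BGP neighbor keep the session alive – which is also why tying the session to firewall reachability takes a few extra tricks:

```
! Drop the EBGP session as soon as no route matching the
! route-map can be used to reach the neighbor
ip prefix-list NeighborRoute permit 192.0.2.0/30
!
route-map TrackNeighbor permit 10
 match ip address prefix-list NeighborRoute
!
router bgp 65001
 neighbor 192.0.2.2 remote-as 65002
 neighbor 192.0.2.2 fall-over route-map TrackNeighbor
```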
When I started writing about VXLAN, I received a few tweets along the lines of “I have no clue what you’re writing about.” Here’s a chance to fix that: I’ll run an Introduction to Virtualized Networking webinar in early October (register), trying to demystify the acronyms and marketectures. It doesn’t assume you know anything about server virtualization or IaaS; we’ll start from scratch and cover as much ground as possible.
During my visit to South Africa someone told me that he got 6VPE working over an L2TP connection ... and that you should “use the other VRF attribute, not lcp:interface-config” to make it work. A few days ago one of the readers asked me the same question and although I was able to find several relevant documents, I wanted to see it working in my lab.
If you’re not working for a data center fabric vendor (in which case please read the other today’s post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:
Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.
... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)
So far my presentation covers Cisco’s FabricPath, vPC, VSS and port extenders, Brocade’s VCS Fabric based on what’s available in Brocade NOS 2.0 (they still have to decide whether they’ll tell me what’s new in NOS 2.1), Juniper’s Virtual Chassis and XRE, HP’s IRF, and OpenFlow.
A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.”
If you’re not familiar with the Nexus 1000V architecture, read this post first. If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.
Remember how I foretold when TRILL first appeared that someone would be “brave” enough to reinvent WAN bridging and brouters that we so loved to hate in the early 90’s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.
Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculations and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.
VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.
Some of my African readers couldn’t buy the yearly webinar subscription in the past due to a variety of credit card-related problems. I spent a few days with NIL Africa’s wonderful team after a fun consulting engagement that brought me to South Africa ... and managed to find a solution – get in touch with Lefkia Swart @ NIL Africa and she’ll send you an invoice.
In one of my vCloud Director Networking Infrastructure rants I wrote “if they had decided to use IP encapsulation, I would have applauded.” It’s time to applaud: Cisco has just demonstrated Nexus 1000V supporting MAC-over-IP encapsulation for vCloud Director isolated networks at VMworld, solving at least some of the scalability problems MAC-in-MAC encapsulation has.
Nexus 1000V VEM will be able to (once the new release becomes available) encapsulate MAC frames generated by virtual machines residing in isolated segments into UDP packets exchanged between VEMs.
When I (somewhat jokingly) wrote about the dense- and sparse-mode FCoE, I had no idea someone would try to extend the analogy to all possible FCoE topologies like Tony Bourke did. Anyhow, now that the cat is out of the bag, let’s state the obvious: enumerating all possible FCoE topologies is like trying to list all possible combinations of NAT, IP routing over at least two L2 technologies, and bridging; while it can be done, the best one can reasonably hope for is a list of supported topologies from various vendors.
However, it might make sense to give you a series of questions to ask the vendors offering FCoE gear to help you classify what their devices actually do.
Following my IBGP or EBGP in an enterprise network post, a few people have asked for a more graphical explanation of the IBGP/EBGP differences. Apart from the obvious ones (the AS path does not change inside an AS) and the more arcane ones (local preference is only propagated on IBGP sessions; the MED of an EBGP route is not propagated to other EBGP neighbors), the most important difference between IBGP and EBGP is BGP next-hop processing.
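In short: a route advertised over an EBGP session usually gets the advertising router’s address as the BGP next hop, while IBGP leaves the next hop unchanged – which is why edge routers in many designs set it explicitly. A minimal sketch (addresses and AS numbers are placeholders):

```
router bgp 65001
 ! IBGP neighbor: advertise EBGP-learned routes with this
 ! router as the next hop instead of the external next hop
 neighbor 10.0.0.2 remote-as 65001
 neighbor 10.0.0.2 next-hop-self
```

Without next-hop-self (or an IGP route to the external next hop), the IBGP-learned routes would fail the next-hop reachability check and never be used.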
Most persuasive argument of the week: “Are traffic charges needed to avert a coming capex catastrophe?” by Robert Kenny. This is how you rebut the claims of the greedy Service Providers (and their hired guns), not by hysterical screaming and spitting perfected by some net neutrality zealots.
Most insightful talk: An Attempt to Motivate and Clarify Software-Defined Networking by Scott Shenker. While he’s handwaving across a lot of details, the framework does make sense.
I’ve created a jumbo pack for those that would like to get all my data center-related webinars. The Data Center Trilogy includes Data Center 3.0 for Networking Engineers, Data Center Interconnects and VMware Networking Deep Dive for only $99.99 (you save almost $40).
SK left a long comment to my More OSPF-over-DMVPN Questions post describing a scenario I find quite often in enterprise networks:
- Primary connectivity is provided by an MPLS/VPN service provider;
- Backup connectivity should use DMVPN;
- OSPF is used as the routing protocol;
- MPLS/VPN provider advertises inter-site routes as external OSPF routes, making it hard to properly design the backup connectivity.
If you’re familiar with the way MPLS/VPN handles OSPF-in-VRF, you’re probably already asking the question “how could the inter-site OSPF routes ever appear as E1/E2 routes?”
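One common answer (assuming Cisco PE routers): mismatched OSPF domain IDs. When the OSPF processes on the ingress and egress PE routers don’t share the same domain ID (which on IOS defaults to a value derived from the process number), remote sites’ routes are re-injected as external (E1/E2) routes instead of inter-area routes. A hedged sketch of the fix on the PE routers – process number, VRF name, domain ID and BGP AS are placeholders:

```
router ospf 10 vrf Customer
 ! Use the same domain ID on all PE routers serving this customer
 ! so remote sites' routes appear as inter-area (IA) routes
 domain-id 0.0.0.1
 redistribute bgp 65001 subnets
```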
I got the following question from one of my readers:
I recently started working at a very large enterprise and learnt that the network uses BGP internally. Running IBGP internally is not that unexpected, but after some further inquiry it seems that we are running EBGP internally. I must admit I'm a little surprised about the use of EBGP internally and I wanted to know your thoughts on it.
Although they are part of the same protocol, IBGP and EBGP solve two completely different problems; both of them can be used very successfully in a large enterprise network.
In the next few days, I'll write about some of the interesting topics we’ve been discussing during the last week’s fantastic on-site workshop with Ian Castleman and his team. To get us started, here’s a short video describing BGP/IGP network design principles. It’s taken straight from my Building IPv6 Service Provider Core webinar (recording), but the principles apply equally well to large enterprise networks.
Following a series of soft switching articles written by Nicira engineers (hint: they are using a similar approach as Juniper’s QFabric marketing team), Greg Ferro wrote a scathing Soft Switching Fails at Scale reply. While I agree with many of his arguments, the sad truth is that with the current state of server infrastructure virtualization we need soft switching regardless of the hardware vendors’ claims about the benefits of 802.1Qbg (EVB/VEPA), 802.1Qbh (port extenders) or VM-FEX.
I’ve spent the last few days with a fantastic group of highly skilled networking engineers (can’t share the details, but you know who you are) discussing the topics I like most: BGP, MPLS, MPLS Traffic Engineering and IPv6 in Service Provider environments.
One of the problems we were trying to solve was a clean split of a POP into two sites, retaining redundancy without adding too much extra equipment. The quest for maximum redundancy nudged me to propose the unimaginable: a layer-2 interconnect between four tightly controlled routers running BGP, but even that got shot down with a memorable quote from the senior network architect:
Warning: totally shameless plug ahead. You might want to stop reading right now.
Every now and then one of the engineers listening to my webinars shares a nice success story with me. One of them wrote:
I'm doing a DMVPN deployment and Cisco design docs just don’t cover dual ISPs for the spokes hence I thought I give your webinars/configs a try.
... and a bit later (after going through the configs that you get with the DMVPN webinar):
Reading Cisco’s marketing materials, VM-FEX (the feature probably known as VN-Link before someone went on a FEX-branding spree) seems like a fantastic idea: VMs running in an ESX host are connected directly to virtual physical NICs offered by the Palo adapter and then through point-to-point virtual links to the upstream switch where you can deploy all sorts of features the virtual switch embedded in the ESX host still cannot do. As you might imagine, the reality behind the scenes is more complex.
The flooding attacks (or mishaps) on large layer-2 networks are well known and there are ample means to protect the network against them, for example storm control available on Cisco’s switches. Now imagine you change the source MAC address of every packet sent to a perfectly valid unicast destination.
A while ago someone asked me to help him troubleshoot his Internet connectivity. He was experiencing totally weird symptoms that turned out to be a mix of MTU problems, asymmetric routing (probably combined with RPF checks on ISP side) and non-routable PE-CE subnets. While trying to figure out what might be wrong from the router configurations, I was surprised by the amount of complexity he’d managed to introduce into his DMZ design by following recipes and best practices we all dole out in blog posts, textbooks and training materials.
My Inbox is overflowing (yet again); here are some great links from last week:
Data centers and summer clouds
Matthew Norwood describes an interesting product from HP – you could probably build a small data center with a single blade enclosure.
After weeks of waiting, perfect summer weather finally arrived ... and it’s awfully hard to write blog posts that make marginal sense when being dead-tired from day-long mountain biking, so I’ll just recap the conversation I had with Brian a few days ago. He asked “How would I set up a (dual) hub running OSPF with phase 1 spokes and prevent all spoke routes from being seen at other spokes? Think service provider environment.”
Building large-scale VLANs to support IaaS services is every data center designer’s nightmare and the low number of VLANs supported by some data center gear is not helping anyone. However, as Anonymous Coward pointed out in a comment to my Building a Greenfield Data Center post, service providers have been building very large (and somewhat stable) layer-2 transport networks for years. It does seem like someone is trying to reinvent the wheel (and/or sell us more gear).
I’ve already written about the stupidities of risking the stability of two data centers to enable live migration of “mission critical” VMs between them. Now let’s take the discussion a step further – after hearing how critical the VM the server or application team wants to migrate is, you might be tempted to ask “and how do you ensure its high availability the rest of the time?” The response will likely be along the lines of “We’re using VMware High Availability” or even prouder “We’re using VMware Fault Tolerance to ensure even a hardware failure can’t bring it down.”
Accumulated in my Inbox during the second half of July:
Duncan Epping wrote a long series of posts describing the new VMware’s High Availability implementation: Fault domain manager, Primary nodes, Datastore heartbeating, Restarting VMs and finally Advanced settings.
Chris sent me an interesting question:
Imagine L2 traffic between two VMs on different ESX hosts, both using Nexus 1000V. Will the physical switches see the traffic with source and destination MACs matching the VM's vNICs or traffic on NX1000V "packet" VLAN between VEMs (in this case, the packet VLAN would act as a virtual backplane)?
To decode the acronyms in the question, please read my What exactly is a Nexus 1000V post.
I had a really hard time staying mum about an exciting project we were working on for the last year. Now it’s official: Vodacom Business in South Africa has launched Office in the Cloud using our field-proven IT-as-a-service solution. Huge congratulations to everyone involved!
It seems that most networking vendors consider the Flat Earth architectures the new bonanza ... everyone is running to join the gold rush, from Cisco’s FabricPath and Brocade’s VCS to HP’s IRF and Juniper’s upcoming QFabric. As always, the standardization bodies are following the industry with a large buffet of standards to choose from: TRILL, 802.1aq (SPB), 802.1Qbg (EVB) and 802.1Qbh (port extenders).
The following design challenge landed in my Inbox not too long ago:
My organization is the in the process of building a completely new data center from the ground up (new hardware, software, protocols ...). We will currently start with one site but may move to two for DR purposes. What DC technologies should we be looking at implementing to build a stable infrastructure that will scale and support technologies you feel will play a big role in the future?
In an ideal world, my answer would begin with “Start with the applications.”
SearchNetworking has just published my article describing the issues you’ll face when deploying virtualized firewalls (you might want to read the one describing benefits and drawbacks of virtual appliances first). The article focuses primarily on the VMsafe Network API (aka dvFilter) and VMware’s vShield; you’ll find more in-depth information on alternate solutions (including HP’s and Juniper’s products using dvFilter API and Cisco’s vPath API) in my VMware Networking Deep Dive webinar (register here or buy a recording).
I got an interesting question after writing the Asymmetric MPLS MTU Problem post: “Why does PHP happen only on directly-connected interfaces but not on other non-MPLS routes?” Obviously it’s time for a deep dive into Penultimate Hop Popping (PHP) mysteries (warning label: read the MPLS books if you plan to get seriously involved with MPLS).
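As a side note: if the loss of the top label on the penultimate hop bothers you (for example, because you classify traffic based on the MPLS EXP bits), classic IOS lets you advertise explicit null instead of implicit null, effectively disabling PHP toward your LDP neighbors:

```
! Advertise explicit-null (label 0) instead of implicit-null
! (label 3) for directly connected prefixes, so upstream LSRs
! keep a label on the packet all the way to this router
mpls ldp explicit-null
```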
Russell Heilling made a highly interesting observation in a comment to my MPLS MTU challenges post: you could get asymmetric MTUs in MPLS networks due to penultimate hop popping. Imagine our network has the following topology (drawn with the fantastic tools used by the RFC authors):
I was sort of upset that my vacations were making me miss the VMware vSphere 5.0 launch event (on the other hand, being limited to half hour Internet access served with early morning cappuccino is not necessarily a bad thing), but after I managed to get home, I realized I hadn’t really missed much. Let me rephrase that – VMware launched a major release of vSphere and the networking features are barely worth mentioning (or maybe they’ll launch them when the vTax brouhaha subsides).
After the bumpy start of our holidays, we thoroughly enjoyed the crystal-clear waters, hot sunny weather and the hospitality of the inhabitants of the Croatian island Brač ... until my daughter came to me quietly asking “hey, I don’t want to raise panic, but my friend saw a weird cloud ... would you mind checking if it’s a forest fire?” A short walk to a vantage point confirmed the initial observation – we were facing what turned out to be the worst forest fire in more than a decade. Obviously I was bound to receive another hefty dose of disaster recovery lessons.
My recent vacation included a few perfect lessons in disaster recovery. Fortunately the disasters were handled by total pros that managed them perfectly. It all started when we were already packed and driving – my travel agent called me to tell me someone mixed up the dates and shifted them by two months; we were expected to arrive in late August. Not good when you have small kids all excited about going to the seaside sitting in the car.
Lots of interesting articles accumulated in my Inbox while I tried to figure out what one could possibly do when being stranded in an easy chair next to the sea with no Internet access. By far the best article that I stumbled upon in my Twitter feed is a 10-year-old IS-IS versus OSPF presentation by the legendary Dave Katz (thank you @yelfathi).
@MCL_Nicolas sent me the following tweet: “Finished @packetpushers Podcast show 7 with @ioshints ... I Want to learn more about Mpls+Mtu problem” You probably know I simply have to mention that a great MPLS/VPN book and a fantastic webinar describe numerous MPLS/VPN-related challenges and solutions (including MTU issues), but if MTU-related problems are the only thing standing between you and an awesome MPLS/VPN network, here are the details.
A comment left on my dense-mode FCoE post is a perfect example of the dangers of using vague, marketing-driven and ill-defined word like “switching”. The author wrote: “FC-SW is by no means routing ... Fibre Channel is switching.” As I explained in one of my previous posts, switching can mean anything, from circuit-based activities to bridging, routing and even load balancing (I am positive some vendors claim their load balancers ... oops, application delivery controllers ... are L4-L7 switches), so let’s see whether Fibre Channel “switching” is closer to bridging or routing.
Yandy sent me an interesting question:
Is it just me or do you also see the Nexus 2000 series not having any type of distributed forwarding as a major design flaw? Cisco keeps throwing in the “it's a line-card” line, but any dumb modular switch nowadays has distributed forwarding in all its line cards.
I’m at least as annoyed as Yandy is by the lack of distributed switching in the Nexus port (oops, fabric) extender product range, but let’s focus on a different question: does it matter?
Michael modified one of my EEM applets to monitor CRC errors on WAN interfaces and notify the operator (via e-mail) when an interface has more than two errors per minute. He wanted to monitor multiple interfaces and asked me whether it’s possible to modify the SNMP event detector somehow. I only had to point him to the event correlation feature of EEM version 2.4 and he sent me the following (tested) applet a few days later.
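For illustration, a rough sketch of what such an applet might look like – ifIndex values, thresholds, polling interval and mail settings are all placeholders, and the exact SNMP event-detector options depend on your IOS release:

```
event manager applet CRCMonitor
 ! Two SNMP event detectors, one per monitored WAN interface,
 ! combined with the EEM 2.4 trigger/correlate feature
 ! (OID 1.3.6.1.2.1.2.2.1.14 is ifInErrors)
 event tag wan1 snmp oid 1.3.6.1.2.1.2.2.1.14.1 get-type exact entry-op ge entry-val 2 poll-interval 60
 event tag wan2 snmp oid 1.3.6.1.2.1.2.2.1.14.2 get-type exact entry-op ge entry-val 2 poll-interval 60
 trigger
  correlate event wan1 or event wan2
 action 1.0 mail server "192.0.2.25" to "noc@example.com" from "router@example.com" subject "CRC errors on WAN uplink" body "Check interface error counters"
```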
Chris Marget sent me the following interesting observation:
One of the things we learned back at the beginning of Ethernet is no longer true: hardware filtering of incoming Ethernet frames by the NICs in Ethernet hosts is gone. VMware runs its NICs in promiscuous mode. The fact that this Networking 101 level detail is no longer true kind of blows my mind.
So what exactly is going on and does it matter?
Matthew sent me the following remarkable fact (and he just might have saved some of you a few interesting troubleshooting moments):
I was bringing up an OSPF adjacency between a Catalyst 6500 and an ASR 9006 and kept getting an MTU mismatch error. The MTU was set exactly the same on both sides. So I reset them both back to default (1500 on the 6500 and 1514 on the ASR 9006) and the adjacency came back up, even though now the MTU is off by 14 bytes. So I attempted to bump the MTU up again, this time setting the MTU on 6500 to 1540 and the MTU on the ASR 9006 to 1554. Adjacency came right up. Is there something I am missing?
The 14-byte difference is the crucial point – that’s exactly the L2 header size (12 bytes for two 6-byte MAC addresses plus 2 bytes for the EtherType). When you specify the MTU in classic IOS (either with the ip mtu command or with the mtu command), you specify the maximum size of the layer-3 payload without the layer-2 header. IOS XR obviously works differently – there you have to specify the maximum size of the layer-2 frame, not of its layer-3 payload (comments describing how other platforms behave are most welcome!).
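Matthew’s working combination, expressed as configuration (interface names are placeholders), with the two values differing by exactly the 14-byte L2 header:

```
! Classic IOS (Catalyst 6500): mtu specifies the maximum
! layer-3 payload size (L2 header excluded)
interface TenGigabitEthernet1/1
 mtu 1540
!
! IOS XR (ASR 9006): mtu specifies the maximum layer-2
! frame size (14-byte L2 header included): 1540 + 14 = 1554
interface TenGigE0/0/0/0
 mtu 1554
```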
My web site statistics are (yet again) confirming the inevitable truth: the holiday season has started in the northern hemisphere. I hope you’ll be busy doing things that are more fun than reading my blog, so I’ll publish only two or three articles per week to prevent information overload, returning to the regular daily schedule in late August.
I had to check the Microsoft clustering terminology a few days ago, so I used Google to find the most relevant pages for “Windows cluster” and landed on the Failover clustering home page, where the Multisite Clustering link immediately caught my attention. Dreading the humongous amount of layer-2 DCI stupidities that could lurk hidden behind such a concept, I barely dared to click on the link ... which unveiled one of the most pleasant surprises I’ve got from an IT vendor in a very long time. Microsoft actually understands that some people prefer to keep their IT infrastructure stable and has supported multi-subnet clusters for quite some time. What a revolutionary concept for the L2-crazed flat-earth world some other vendors are busy promoting.
With the latest software release (12.3.01), the ServerIron ADX, Brocade’s load balancer, supports real NAT64 (not 6-to-4 load balancing). Even better, it supports all the features I would like to see in a NAT64 box, plus a few more:
True NAT64 support, mapping the whole IPv4 address space into an IPv6 prefix that can be reached by IPv6 clients. One would truly hope the implementation is conformant with RFC 6146, but the RFC is not mentioned in the documentation and I had no means of checking the actual behavior. DNS64 is not included, but that’s not a major omission as BIND 9.8.0 supports it.
Every time I write about the lack of commercial NAT64 products (yeah, I know Juniper has had one for a long time and Brocade just rolled out the ADX code), someone tells me that company X has a field-proven NAT64 product ... only most of them are really 6-to-4 load balancers. Let’s see what the difference is.
J Michel Metz brought out an interesting aspect of the dense/sparse mode FCoE design dilemma in a comment to my FCoE over Trill ... this time from Juniper post: FC-focused troubleshooting. I have to mention that he happens to be working for a company that has the only dense-mode FCoE solution, but the comment does stand on its own.
Before reading this post you might want to read the definition of dense- and sparse-mode FCoE and a few more technical details.
Martin Casado and his team have published a great series of blog articles describing hypervisor switching (for the VMware-focused details, check out my VMware Networking Deep Dive). It starts with an overview of Open vSwitch (the open source alternative for VMware’s vSwitch, commonly used in Xen/KVM environments), describes the basics of hypervisor-based switching and addresses some of the performance myths. There’s also an interesting response from Intel setting straight the SR-IOV facts.
It’s a foggy rainy day in my part of the world and most of Europe is enjoying a very long weekend ... a perfect day to straighten out some of the long-neglected paperwork issues. Finally I scraped together enough willpower to complete the list of my articles published over the last few years.
In early autumn of 2010, a “DRAFT on Cisco Nexus 1000V LISP Configuration Guide” appeared on CCO. It’s gone now (and unfortunately I haven’t saved a copy), but the possibilities made me really excited – with LISP in Nexus 1000V, we could do close-to-perfect vMotion over any IP infrastructure (including inter-DC vMotion that requires stretched VLANs and L2 DCI today). Here’s what I had to say on this topic during my Data Center Interconnect webinar (buy a recording).
You probably know the old saying – if the mountain doesn’t want to come to you, you have to go out there and climb it. vCider, a brand-new startup launching their product at Gigaom Structure Launchpad, decided to do something similar in the server virtualization (Infrastructure-as-a-Service; IaaS) space – its software allows IaaS customers to build their own virtual layer-2 networks (let’s call them vSubnets) on top of the IaaS provider’s IP infrastructure; you can even build a vSubnet between VMs running within your enterprise network (private cloud in the cloudy lingo) and those running within Amazon EC2 or Rackspace.
Full disclosure: Chris Marino from vCider got in touch with me in early June. I found the idea interesting, he helped me understand their product (even offered a test run, but I chose to trust the technical information available on their web site and passed to me in e-mails and phone calls), and I decided to write about it. That’s it.
I got a really interesting question from one of my readers (slightly paraphrased):
Is this a correct statement: QoS on a WAN router will always be active whenever there are packets on the wire, since at any given moment the line is either 100% utilized or nothing is being transmitted. Comments like “QoS will kick in when there is congestion, but there is always congestion if the link is 100% utilized on a per-moment basis” are confusing.
Well, QoS is more than just queuing. First you have to classify the packets; then you can perform any combination of marking, policing, shaping, queuing and dropping.
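To illustrate the classify-then-act sequence, here’s a minimal Cisco IOS MQC sketch (the class, policy, and interface names are made up for illustration; verify the exact syntax on your platform):

```
! Classification: sort packets into classes first
class-map match-all VOICE
 match dscp ef
!
! Actions: queuing and dropping are applied per class
policy-map WAN-EDGE
 class VOICE
  priority percent 20
 class class-default
  fair-queue
  random-detect
!
! The queuing actions only take effect when the interface is congested;
! classification and marking happen on every packet regardless of load
interface Serial0/0/0
 service-policy output WAN-EDGE
```

The distinction answers the reader’s confusion: classification and marking run unconditionally, while the queuing and dropping actions only matter once packets actually have to wait for the transmit opportunity.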
2011-06-23: Added description of various link efficiency mechanisms.
One of the implications of Virtual Machine (VM) mobility (as implemented by VMware’s vMotion or Microsoft’s Live Migration) is the need to have the same VLAN configured on the access ports connected to the source and the target hypervisor hosts. EVB (802.1Qbg) provides a perfect solution, but it’s questionable when it will leave the dreamland domain. In the meantime, most environments have to deploy stretched VLANs ... or you might be able to use hypervisor-aware features of your edge switches, for example VM Tracer implemented in Arista EOS.
Two vSwitch portgroup-related questions:
- Can you configure the same VLAN on two portgroups in the same vSwitch? How about vDS?
- Can VMs attached to two different portgroups in the same ESX host talk to each other directly, or do they have to communicate through an external switch (or L3 device)?
Got your answers? Now click the Read more ... link.
I got a question along these lines from a friend working in SP environment:
Customer wants to upgrade a 7200 with PA-A3-OC3SMI to ASR1001. Can they use ASR1001-2XOC3POS interfaces or are those different from “normal ATM interfaces”?
Both interfaces (PA-A3-OC3SMI for the 7200 and 2XOC3POS for the ASR1001) use SONET framing on layer 1, so you can connect them to the same SONET (layer-1) gear.
New DMVPN features in IOS release 15.x are obviously a topic without a broad audience ... although Cisco did introduce some nifty new things that can help you scale a large DMVPN network or make a DMVPN network more manageable.
A tweet from J Michel Metz has alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed into two short sentences my readers are very familiar with: Bridging does not scale and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems) ... but the author couldn’t resist throwing the “FCoE over TRILL” bone into the mix.
Got this question a few days ago:
I have a large DMVPN network (~ 1000 sites) using a variety of DSL, cable modem, and wireless connections. In all of these cases the bandwidth is extremely dissimilar and even varies with time. How can I handle this in a scalable way? Also, do you know of any product or facility that I can use to better measure the bandwidth from hub to spoke and better set the QoS values?
The last question is the easy part: one of the products that can do it is the NIL Monitor service, where the remote probes can measure the actual end-to-end bandwidth. The NIL Monitor software can also log into routers and change configurations if needed ... but what should you change?
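One scalable answer to the “what should you change” part is the per-tunnel (NHRP group-based) DMVPN QoS introduced in recent IOS releases: each spoke announces a group name, and the hub applies the matching shaping policy to traffic sent toward that spoke. A hypothetical sketch (group and policy names are made up; verify feature availability and syntax on your IOS release):

```
! Hub: shape traffic toward every spoke in the DSL-512 group to 512 kbps
policy-map SHAPE-512K
 class class-default
  shape average 512000
!
interface Tunnel0
 ip nhrp map group DSL-512 service-policy output SHAPE-512K
!
! Spoke: announce its NHRP group to the hub at registration time
interface Tunnel0
 ip nhrp group DSL-512
```

You still need one policy per bandwidth tier rather than per spoke, which keeps the hub configuration manageable ... although static shapers remain only an approximation when the actual access bandwidth varies over time.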