Blog Posts in July 2010
Almost 30 years ago, I was lucky enough to work on one of the best systems of those days, VAX/VMS (BTW, it was able to run 30 interactive users in 2 MB of main memory), which had everything we’d wished for – it was truly interactive with a hierarchical file system and file versioning (not to mention remote file access and distributed clusters). I couldn’t possibly understand the woes of IBM mainframe programmers who had to deal with virtualized 132-column printers and 80-column card readers (ironically running in virtual machines that the rest of the world got some 20 years later). When I wanted to compile my program, I started the compiler; when they wanted to do the same, they had to edit a batch job, submit the batch job (assuming the disk libraries were already created), poll the queues to see when it completed and then open the editor to view the 132-column printout of compiler errors.
After a long discussion, I started to understand the problem: the whole system was burdened with so many legacy decisions that still had to be supported that there was nothing one could do to radically change it (yeah, it’s hard to explain that to a 20-year-old kid full of himself).
Stretch (@packetlife) shared an interesting link in a comment to my P2P traffic is bad for the network post: Facebook and Twitter use BitTorrent to distribute software updates across hundreds (or thousands) of servers ... another proof that no technology is good or bad by itself (Greg Ferro might have a different opinion about FCoE).
Shortly after I tweeted about @packetlife’s link, @sevanjaniyan replied with an even better link to a presentation by Larry Gadea (infrastructure engineer @ Twitter) in which Larry describes Murder, Twitter’s implementation of software distribution on top of the BitTornado library.
If you have a data center running a large number of servers that have to be updated simultaneously, you should definitely watch the whole presentation; here’s a short spoiler for everyone else:
As expected, my P2P traffic is bad for the network post generated lots of comments; from earning me another wonderful title (shill for Internet monopolies) that I’ll proudly add to my previous awards to numerous technical comments and even a link to a very creative use of BitTorrent to solve software distribution problems (thanks again, @packetlife).
Most of the commenters missed the main point of my post and somehow assumed that since I don’t wholeheartedly embrace P2P traffic I want to ban it from the Internet. Far from it; what I was trying to get across was a very simple message:
I’m positive you all know that. I also hope that you’re making sure it’s not hogging your enterprise network. Service Providers are not so fortunate – some Internet users claim using unlimited amounts of P2P traffic is their birthright. I don’t really care what kind of content these users transfer, they are consuming enormous amounts of network resources due to a combination of P2P client behavior (which is clearly “optimized” to grab as much as possible) and the default TCP/QoS interaction.
One of the biggest hurdles Internet Service Providers will face in the near future is access to legacy IPv4 content once we run out of globally routable IPv4 addresses. Although it’s easy to offer your content over IPv6 (assuming you have a properly designed network using load balancers from a company that understands the need for IPv6 in the Data Center), a lot of the “long tail” content will remain reachable only over IPv4.
A while ago I published a presentation I’d delivered at the Slovenian IPv6 summit; a few days ago SearchTelecom.com published my article describing various transition solutions in more detail. In the first part, “IPv4 address exhaustion: Making the IPv6 transition work”, I describe the grim facts we’re facing and the NAT-PT fiasco. In the second part, “Comparing IPv6 to IPv4 address translation solutions”, you’ll find brief descriptions of LSN (also known as CGN – Carrier-Grade NAT), NAT444, DS-Lite, A+P and NAT64.
Based on the readers' comments on my “Bridging and Routing: is there a difference?” post (thank you!), here are a few more differences between bridging and routing:
Cost. Layer-2 switches are almost always cheaper than layer-3 (usually combined layer-2/3) switches. There are numerous reasons for the cost difference, including:
Peter John Hill made an interesting observation in a comment to my “The TRILLing brain split” post; he wrote “TRILL really is routing at layer 2.”
He’s partially right – TRILL uses a routing protocol (IS-IS) and the TRILL protocol used to forward Ethernet frames (TRILL data frames) definitely has all the attributes of a layer-3 protocol:
- TRILL data frames have layer-3 addresses (RBridge nickname);
- They have a hop count;
- Layer-2 next-hop is always the MAC address of the next-hop RBridge;
- As the TRILL data frames are propagated between RBridges, the outer MAC header changes.
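The four attributes listed above can be sketched in a few lines of Python. This is a simplified illustration, not the actual TRILL header layout: the class, the field names, the `forward` function and the nickname-to-MAC lookup table are all hypothetical, and the only behavior modeled is what the list describes (nickname-based lookup, hop count, per-hop outer MAC rewrite).

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class TrillFrame:
    outer_dst_mac: str      # MAC address of the next-hop RBridge
    outer_src_mac: str      # MAC address of the transmitting RBridge
    ingress_nickname: int   # RBridge that encapsulated the frame
    egress_nickname: int    # RBridge that will decapsulate the frame
    hop_count: int          # decremented by every transit RBridge
    inner_frame: bytes      # original Ethernet frame, carried unchanged


def forward(frame: TrillFrame, my_mac: str,
            next_hop_mac: Dict[int, str]) -> Optional[TrillFrame]:
    """Forward one frame through a transit RBridge (hypothetical sketch).

    next_hop_mac maps an egress nickname to the MAC address of the
    next-hop RBridge; a real RBridge would build it from IS-IS.
    """
    if frame.hop_count == 0:
        return None  # hop count exhausted: drop the frame
    return replace(
        frame,
        outer_dst_mac=next_hop_mac[frame.egress_nickname],  # layer-2 next hop
        outer_src_mac=my_mac,                               # outer header changes
        hop_count=frame.hop_count - 1,
    )


# Illustrative values only
received = TrillFrame("mac-rb2", "mac-rb1", ingress_nickname=1,
                      egress_nickname=9, hop_count=3, inner_frame=b"payload")
sent = forward(received, my_mac="mac-rb2", next_hop_mac={9: "mac-rb3"})
```

The nicknames and the inner frame stay untouched from ingress to egress, while the outer MAC addresses and the hop count change at every hop, which is exactly how an IP router treats an IP packet.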
There’s an extremely good reason Brad Hedlund mentioned server virtualization in his career advice: it has fundamentally changed Data Center networking.
Years ago, we treated servers as oversized IP hosts. From the networking perspective, they were no different from other IP hosts. Some of them had weird clustering requirements, some of them had multiple uplinks that had to be managed somehow, but those were just minor details. Server virtualization is a completely different beast.
Last Saturday I wrote “I’ll write only a few posts per week and try to keep the reading light and not too technical” ... obviously another broken promise. I can only hope you’ve at least enjoyed my bridging-and-routing rants.
In his comment to one of my TRILL posts, Petr Lapukhov has asked the fundamental question: “how is bridging different from routing?”. It’s impossible to give a concise answer (let alone something as succinct as 42) as the various kludges and workarounds (including bridges and their IBM variants) have totally muddied the waters. However, let’s be pragmatic and compare Ethernet bridging with IP (or CLNS) routing. Throughout this article, bridging refers to transparent bridging as defined by the IEEE 802.1 series of standards.
Design scope. IP was designed to support global packet switching network infrastructure. Ethernet bridging was designed to emulate a single shared cable. Various design decisions made in IP or Ethernet bridging were always skewed by these perspectives: scalability versus transparency.
During the last few weeks I tried hard to sort out my thoughts on routing and bridging; specifically, what’s the difference between them and why you should use routing and not bridging in any large-scale network (regardless of whether it happens to be cramped into a single building called Data Center).
My vague understanding of layer 2 (Data Link layer) of the OSI model was simple: it was supposed to provide frame transport between neighbors (a neighbor is someone who is on the same physical medium as you are); layer 3 (Network layer) was supposed to provide forwarding between distant end nodes. Somehow the bridges did not fit this nice picture.
As I was struggling with this ethereally geeky version of a much older angels-on-a-pin problem, Greg Ferro of EtherealMind.com (what a coincidence, isn’t it) shared a link to a GoogleTalk given by Radia Perlman, the author of the Spanning Tree Protocol and co-author of TRILL. And guess what – in her opening minutes she said “Bridges don’t make sense. If you do packet forwarding, you should do it on layer 3”. That’s so good to hear; I’m not crazy after all.
On a tangential note, if you like my articles, please share them. The more you tweet or blog about them, the easier it will be for other networking engineers to find them. Thank you!
This week’s webinars were the last ones before the summer break. I definitely need one, the last weeks were crazy, but I’ve also learned a lot about DMVPN (the need to revisit “old truths” and figure out odd details is what makes preparing for the webinars real fun).
I’ve also noticed that some of you have already started your summer vacations. Last week’s blog traffic was way below the usual levels (Cisco Live and Independence Day were only two of the reasons) and this week is still below the average. Obviously it’s time to shift to summer schedule – I’ll write only a few posts per week and try to keep the reading light and not too technical ... the kind of summer campfire stories you’d hear from the geekiest granduncle you could imagine.
Enjoy the summer (while it lasts), have a great time and try to visit some truly spectacular spots; the Dolomites are never a bad choice.
I got this question from the SearchTelecom Ask-the-Expert project ... and the engineer asking the question was probably looking for something short and concise. This is my attempt to explain the difference in a few paragraphs. Have I missed anything important? Could it be done better?
The split personality Cisco has exposed at Cisco Live 2010 is amazing: on one hand you have the Data Center team touting the benefits of Routing at Layer 2 (an oxymoron if I’ve ever seen one), on the other hand you have Russ White extolling the virtues of good layer-3 design in the CCDE training (the quote I like most: “It all meets at Layer 3 ... that’s why CCDE is layer-3 centric”). If you’re confused, you’re not the only one.
Read more ... (this time @ etherealmind.com)
A while ago one of my readers wanted to perform an extended ping from an EEM applet. For whatever reason the extended ping syntax wasn’t good enough for him, so I told him to use the pattern parameter of the action cli command EEM applet statement.
A simplistic explanation of the EIGRP offset-list configuration command you might see every now and then is “it adjusts the RD/FD to influence route selection”. If that were the case, the adjustment would not be propagated to upstream routers (remember: only the EIGRP vector metric is sent in the routing updates, not RD or FD), resulting in potential routing loops (it’s never a good idea to use one set of metrics and propagate another set of metrics to your neighbors).
In reality, the EIGRP offset lists adjust the delay portion of the EIGRP vector metric (which linearly influences the RD/FD value). You can increase the value of the delay metric for EIGRP updates received or sent through a specific interface (or all interfaces). You can also use an access list in the offset-list command, applying changes only to specific IP prefixes. For more details, please read this technology note on Cisco’s web site.
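To see why adjusting the delay in the vector metric is the safe choice, here is a minimal Python sketch of the classic EIGRP composite metric with default K values. The class and function names are hypothetical, and the units in which the router applies the configured offset are an assumption here (the sketch simply adds the offset to the cumulative delay expressed in tens of microseconds).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VectorMetric:
    min_bandwidth_kbps: int    # lowest bandwidth along the path
    delay_tens_of_usec: int    # cumulative delay, in tens of microseconds


def composite_metric(vm: VectorMetric) -> int:
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1)."""
    return 256 * (10**7 // vm.min_bandwidth_kbps + vm.delay_tens_of_usec)


def apply_offset(vm: VectorMetric, extra_delay: int) -> VectorMetric:
    """Model an offset list: only the delay part of the vector changes.

    Assumption: extra_delay uses the same unit as delay_tens_of_usec.
    """
    return replace(vm, delay_tens_of_usec=vm.delay_tens_of_usec + extra_delay)


# Illustrative path: T1 bottleneck (1544 kbps), 20 ms cumulative delay
received = VectorMetric(min_bandwidth_kbps=1544, delay_tens_of_usec=2000)
adjusted = apply_offset(received, extra_delay=100)

before = composite_metric(received)
after = composite_metric(adjusted)
```

Because the adjusted vector (not just the locally computed distance) is what gets stored and advertised, every router further along computes its metric from the same increased delay, so the local decision and the propagated information stay consistent.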
For the last few days I was eating, drinking, breathing and dreaming DMVPN as I was preparing lab scenarios for my DMVPN webinar (the participants will get complete router configurations for 12 different scenarios implemented in an 8-router fully redundant DMVPN network).
Some of the advanced scenarios were easy; for example, I’ve found a passing reference to passive RIPv2 with IP SLA in the DMVPN/GETVPN Design & Case Study presentation. I knew exactly what Stephen Lynn had in mind and was able to create a working scenario in minutes. Unfortunately, 2-tier hub site with IPSec offload was a completely different beast.
When Cisco’s white paper calls bridging Routing at Layer-2. I’ve come to expect gimmicks like this from startups trying to woo clueless customers, but Cisco had so far kept to a certain level of correctness at least in their technical documents. One can only wonder what’s next in the industry-wide drive to try to persuade us that square pegs can easily fit into round holes.
After my STP-is-like-hand-grenade tweet, several friends sent me links to All Systems Down: an epic STP fail. I am positive smaller STP failures happen on a daily basis, but are fixed too fast to be honored with an extensive case study. Nevertheless, vendors are furiously trying to persuade you that L2 switching (formerly known as bridging) is the sexiest thing since Paris Hilton, the last one being Cisco with its FabricPath announcement. Some of us will definitely enjoy the show ...
Another long-term grudge of mine got somewhat more fact-based: a Nielsen study supposedly reported that 6% of mobile users cause 50% of the traffic. While this could cause some more people to believe that tiered data plans (or usage caps) might make sense, I am positive this issue will continue going the way global warming went years ago.
And, last but definitely not least, there’s another Packet Pushers podcast well worth listening to. Where else could someone start discussing new ASA 8.3 features only to realize a minute later that he’s praising the virtues of Cisco IOS Zone Based Firewalls? If only they had remembered to mention my ZBF book ;)
I’ve stumbled across a really interesting BGP/IGP problem described by Jeremy Filliben that nicely illustrates the dangers of using more than one IGP in your network. You should read the original post for details; here’s a short summary:
- The same IP prefix is received by two BGP border routers (A and D) and sent to a third IBGP-only router (E).
- E can reach A via OSPF. It can reach D via EIGRP.
- E receives two BGP paths to the target IP prefix from A and D. They are identical, so the IGP metric (taken from the IP routing table) is used as the tie-breaker.
- EIGRP and OSPF metrics are totally incomparable and thus A (reachable via OSPF) is always preferred over D (reachable via EIGRP).
Lesson learned: use a single IGP in your AS (or at least in its BGP core).
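The tie-breaking step in the summary above can be sketched as a plain numeric comparison. The Python below is a simplified illustration, not the full BGP best-path algorithm: the class, the function and the metric values are hypothetical, and the values only assume that an OSPF cost is typically a much smaller number than an EIGRP composite metric.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BgpPath:
    next_hop: str       # BGP border router advertising the prefix
    igp_protocol: str   # IGP that supplied the route to the next hop
    igp_metric: int     # metric of that route, taken from the IP routing table


def best_path(paths: Iterable[BgpPath]) -> BgpPath:
    """Tie-breaker for otherwise identical paths: lowest IGP metric wins.

    The comparison is purely numeric; it does not know (or care) which
    IGP produced the number.
    """
    return min(paths, key=lambda p: p.igp_metric)


# Illustrative metrics only
via_a = BgpPath(next_hop="A", igp_protocol="OSPF", igp_metric=20)
via_d = BgpPath(next_hop="D", igp_protocol="EIGRP", igp_metric=2816)

winner = best_path([via_d, via_a])
```

With numbers on such different scales, the path through A wins regardless of the real topology, which is why the comparison only makes sense when both next hops are reached through the same IGP.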
It looks like the signed DNS root zone might finally get deployed on July 15th and Geoff Huston celebrates the fact with a lengthy article on DNSSEC. Just in case you’re not aware what DNSSEC is all about, he’s providing this nifty summary:
A succinct summary of the problem that DNSSEC is intended to address is that DNSSEC is intended to protect DNS clients from believing forged DNS data.