Worth Reading: The Lost Designer
Scott Berkun published another interesting article: The Lost Designer. As always, replace designer with networking engineer and enjoy.
Lessons Learned: Technology Still Matters
In June 2020, a friend asked me to do a short presentation on lessons learned during my 35 years as a networking engineer. It went reasonably well, so I decided to turn it into a webinar, starting with regardless of what the disruptive marketers tell you, technology still matters.
… updated on Monday, July 12, 2021 18:00 UTC
Unnumbered Ethernet Interfaces, DHCP Edition
Last week we explored the basics of unnumbered IPv4 Ethernet interfaces, and how you could use them to save IPv4 address space in routed access networks. I also mentioned that you could simplify the head-end router configuration if you’re using DHCP instead of per-host static routes.
Obviously you’d need a smart DHCP server/relay implementation to make this work. Simplistic local DHCP server would allocate an IP address to a client requesting one, send a response and move on. Likewise, a DHCP relay would forward a DHCP request to a remote DHCP server (adding enough information to allow the DHCP server to select the desired DHCP pool) and forward its response to the client.
Real-Life Network-as-a-Graph Examples
After reading the Everything Is a Graph blog post, Vadim Semenov sent me a long list of real-life examples (slightly edited):
I work in a big enterprise and in order to understand a real packet path across multiple offices via routers and firewalls (when mtr or traceroute don’t work – they do not show firewalls), I made OSPF network visualization based on LSDB output. The idea is quite simple – save information about LSA1 and LSA2 (LSA5 optionally) and that will be enough in order to build a graph (use show ip ospf database router/network on Cisco devices).
Unequal-Cost Multipath with BGP DMZ Link Bandwidth
In the previous blog post in this series, I described why it’s (almost) impossible to implement unequal-cost multipathing for anycast services (multiple servers advertising the same IP address or range) with OSPF. Now let’s see how easy it is to solve the same challenge with BGP DMZ Link Bandwidth attribute.
I didn’t want to listen to the fan noise generated by my measly Intel NUC when simulating a full leaf-and-spine fabric, so I decided to implement a slightly smaller network:
Feedback: Azure Networking
When I started developing AWS- and Azure Networking webinars, I wondered whether they would make sense – after all, you can easily find tons of training offerings focused on public cloud services.
However, it looks like most of those materials focus on developers (no wonder – they are the most significant audience), with little thought being given to the needs of network engineers… at least according to the feedback left by one of ipSpace.net subscribers.
Worth Reading: The Neuroscience of Busyness
In the Neuroscience of Busyness article, Cal Newport describes an interesting phenomenon: when solving problems, we tend to add components instead of removing them.
If that doesn’t describe a typical network (or protocol) design, I don’t know what does. At least now we have a scientific basis to justify our behavior ;)
Worth Reading: Switching to IP fabrics
Namex, an Italian IXP, decided to replace their existing peering fabric with a fully automated leaf-and-spine fabric using VXLAN and EVPN running on Cumulus Linux.
They documented the design, deployment process, and automation scripts they developed in an extensive blog post that’s well worth reading. Enjoy ;)
Video: Cisco SD-WAN Policy Design
In the final video in his Cisco SD-WAN webinar, David Penaloza discusses site ID assignments and policy processing order.
A carefully planned site scheme and ordered list of policy entries will save you complications and headaches when deploying the SD-WAN solution.
Routing Protocols: Use the Best Tool for the Job
When I wrote about my sample OSPF+BGP hands-on lab on LinkedIn, someone couldn’t resist asking:
I’m still wondering why people use two routing protocols and do not have clean redistribution points or tunnels.
Ignoring for the moment the fact that he missed the point of the blog post (completely), the idea of “using tunnels or redistribution points instead of two routing protocols” hints at the potential applicability of RFC 1925 rule 4.
Unnumbered Ethernet Interfaces
Imagine an Internet Service Provider offering Ethernet-based Internet access (aka everyone using fiber access, excluding people believing in Russian dolls). If they know how to spell security, they might be nervous about connecting numerous customers to the same multi-access network, but it seems they have only two ways to solve this challenge:
- Use private VLANs with proxy ARP on the head-end router, forcing the customer-to-customer traffic to pass through layer-3 forwarding on the head-end router.
- Use a separate routed interface with each customer, wasting three-quarters of their available IPv4 address space.
Is there a third option? Can’t we pretend Ethernet works in almost the same way as dialup and use unnumbered IPv4 interfaces?
Single-Metric Unequal-Cost Multipathing Is Hard
A while ago, we discussed whether unequal-cost multipathing (UCMP) makes sense (TL&DR: rarely), and whether we could implement it in link-state routing protocols (TL&DR: yes). Even though we could modify OSPF or IS-IS to support UCMP, and Cisco IOS XR even implemented those changes (they are not exactly widely used), the results are… suboptimal.
Imagine a simple network with four nodes, three equal-bandwidth links, and a link that has half the bandwidth of the other three:
 
netsim-tools release 0.7: Cumulus VX, EIGRP, and BGP IPv6 AF
netsim-tools release 0.7 is published, bringing you the following goodies (including stuff published a week ago as release 0.6.3):
- Cumulus VX support on libvirt and virtualbox.
- EIGRP configuration module
- BGP IPv6 address family
- Controlled BGP community propagation
Other changes include:
Worth Reading: Azure Datacenter Switch Failures
Microsoft engineers published an analysis of switch failures in 130 Azure regions (review of the article, The Next Platform summary):
- A data center switch has a 2% chance of failing in 3 months (= less than 10% per year);
- ~60% of the failures are caused by hardware faults or power failures, another 17% are software bugs;
- 50% of failures lasted less than 6 minutes (obviously crashes or power glitches followed by a reboot).
- Switches running SONiC had lower failure rate than switches running vendor NOS on the same hardware. Looks like bloatware results in more bugs, and taking months to fix bugs results in more crashes. Who would have thought…
Worth Reading: Running BGP in Large-Scale Data Centers
Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.
Recent data point: Facebook published an interesting paper describing their data center BGP design. Absolutely worth reading.
Just in case you haven’t realized: Petr Lapukhov of the RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.