Blog Posts in October 2019
I don’t remember who pointed me to the excellent How Complex Systems Fail document. It’s almost like RFC1925 – I could quote it all day long, and anyone dealing with large mission-critical distributed systems (hint: networks) should read it once a day ;))
Years ago Dan Hughes wrote a great blog post explaining how expensive TCP is. His web site is long gone, but I managed to grab the blog post before it disappeared and he kindly allowed me to republish it.
If you ask a CIO which part of their infrastructure costs them the most, I’m sure they’ll mention power, cooling, server hardware, support costs, getting the right people and all the usual answers. I’d argue one the the biggest costs is TCP, or more accurately badly implemented TCP.
One of my subscribers was interested in trying out whitebox solutions. He wrote:
What open source/whitebox software/hardware should I look at if I wanted to build a leaf-and-spine VXLAN/EVPN/BGP data center.
I don’t think you can get a fully-open-source solution because the ASIC manufacturers hide their SDK behind a mountain of NDAs (that strategy must make perfect sense – after all, it generated such awesome PR for NVIDIA). Anyway, the closest you can get (AFAIK) if you're a mere mortal is Cumulus Linux, and you just choose any whitebox hardware off their Hardware Compatibility List.
Seth Godin published an interesting article on the value of hard work (and what hard work really is). Go and read it first, then we’ll translate it into networking terms.
Already back? Good, let’s go.
The first worker is a traditional networking technician (it wouldn’t be fair to call him an engineer) – he’s busy configuring VLANs, ACLs, firewall rules… the whole day.
Everyone is talking about FRRouting suite these days, while hidden somewhere in the background OpenBGPD has been making continuous progress for years. Interestingly, OpenBGPD project was started for the same reason FRR was forked - developers were unhappy with Zebra or Quagga routing suite and decided to fix it.
You probably heard me say “networking engineer encountering a public cloud feels like Alice in Wonderland” - packet forwarding works in a different way in every public cloud, subnets are a mix between routed interfaces and VRFs, you cannot change IP addresses without involving the orchestration system…
We covered the networking aspects of Amazon Web Services and Azure in our cloud webinars, but you might need a bigger picture:
After solving the BGP configuration challenge (could you imagine configuring BGP in a leaf-and-spine fabric with just a few commands in 2015), they did the same thing with EVPN configuration, where they decided to implement the simplest possible design (EBGP-only fabric running EBGP EVPN sessions on leaf-to-spine links), resulting in another round of configuration simplicity.
I got interesting feedback from one of my readers after publishing my REST API Is Not Transactional blog post:
One would think a transactional REST interface wouldn’t be too difficult to implement. Using HTTP1/1, it is possible to multiplex several REST calls into one connection to a specific server. The first call then is a request for start a transaction, returning a transaction ID, to be used in subsequent calls. Since we’re not primarily interested in the massive scalability of stateless REST calls, all the REST calls will be handled by the same frontend. Obviously the last call would be a commit.
I wouldn’t count on HTTP pipelining to keep all requests in one HTTP session (mixing too many layers in a stack never ends well) but we wouldn’t need it anyway the moment we’d have a transaction ID which would be identical to session ID (or session cookie) traditional web apps use.
It’s clear that we had two types of SD-WAN vendors:
A few months ago Johannes Weber sent me a short email saying “hey, I plan to write a few NTP posts” and I replied “well, ping me when you have something ready”.
In the meantime he wrote a veritable NTP bible - a series of NTP-related blog posts covering everything from Why Should I Run My Own NTP Servers to authentication, security and monitoring - definitely a MUST READ if you care about knowing what time it is.
Listening to (some) industry evangelists you would believe that there’s no future in being a networking engineer. After all, all workloads will move into the cloud, and all clients will connect through a universal 5G network… but even if that utopia eventually comes true, you can’t get away from the laws of physics (and the need networking infrastructure).
An anonymous (for reasons that will be obvious pretty soon) commenter left a gem on my Disaster Recovery Test Faking blog post that is way too valuable to be left hidden and unannotated.
Here’s what he did:
Once I was tasked to do a DR test before handing over the solution to the customer. To simulate the loss of a data center I suggested to physically shutdown all core switches in the active data center.
That’s the right first step: simulate the simplest scenario (total DC failure) in a way that is easy to recover (power on the switches). It worked…
A subscriber sent me this intriguing question:
Is it not theoretically possible for Ethernet frames to be 64k long if ASIC vendors simply bothered or decided to design/make chipsets that supported it? How did we end up in the 1.5k neighborhood? In whose best interest did this happen?
Remember that Ethernet started as a shared-cable 10 Mbps technology. Transmitting a 64k frame on that technology would take approximately 50 msec (or as long as getting from East Coast to West Coast). Also, Ethernet had no tight media access control like Token Ring, so it would be possible for a single host to transmit multiple frames without anyone else getting airtime, resulting in unacceptable delays.
TL&DR: You automate the whole process. What else do you expect?
During the Tech Field Day Extra @ Cisco Live Europe 2019 we were taken on a behind-the-stage tour that included a chat with people who built the Cisco Live network, and of course I had to ask how they automated the whole thing. They said “well, we have the guy that wrote the whole system onsite and he’ll be able to tell you more”. Turns out the guy was my good friend Andrew Yourtchenko who graciously showed the system they built and explained the behind-the-scenes details.
Dinesh Dutt added another awesome chapter to the EVPN saga last week explaining how (and why) you could run VXLAN encapsulation with EVPN control plane on Linux hosts (TL&DR: think twice before doing it).
In the last part of current Azure Networking series I covered external VNet connectivity, including VNet peering, Internet access, Virtual Network Gateways, VPN connections, and ExpressRoute. The story continues on February 6th 2020 with Azure automation.
You’ll need Standard ipSpace.net Subscription to access both webinars.
Gregor Hohpe published an excellent series on Martin Fowler’s web site focusing on various aspects of lock-in. If nothing else, you SHOULD read the shades of lock-in part, and combine it with my thoughts on lock-in in data center networking.
Grouping the features needed in a networking stack in a bunch of layered modules is a great idea. Unfortunately, you could place several essential features like error recovery, retransmission, and flow control in several different layers, from the data link layer dealing with individual network segments, to the transport layer dealing with reliable end-to-end transmissions.
Where should we put those modules? As always, the correct answer is it depends, in this particular case, on transmission reliability, latency, and bandwidth cost. You’ll find more details in the Retransmissions and Flow Control part of How Networks Really Work webinar.
How nice would it be to have a fabric health dashboard displaying a summary of numerous parameters you’re interested in (number of operational uplinks, number of BGP sessions…) for every switch in your fabric.
I’m positive you could hack something together using the customization capabilities of your favorite network management system… or you could write a simple data gathering solution like Stephen Harding did while attending the Building Network Automation Solutions online course.
A while ago I had an interesting discussion with someone running VMware NSX on top of VXLAN+EVPN fabric - a pretty common scenario considering:
- NSX’s insistence on having all VXLAN uplink from the same server in the same subnet;
- Data center switching vendors being on a lemming-like run praising EVPN+VXLAN;
- Non-FANG environments being somewhat reluctant to connect a server to a single switch.
His fabric was running well… apart from the weird times when someone started tons of new VMs.
A Docker networking rant coming from my good friend Marko Milivojević triggered a severe case of Deja-Moo, resulting in a flood of unpleasant memories caused by too-successful “disruptive” IT vendors.
Imagine you’re working for a startup creating a cool new product in the IT infrastructure space (if you have an oversized ego you would call yourself “disruptive thought leader” on your LinkedIn profile) but nobody is taking you seriously. How about some guerrilla warfare: advertising your product to people who hate the IT operations (today we’d call that Shadow IT).
Have you ever seen an Ansible playbook where 90% of the code prepares the environment, and then all the work is done in a few template and assemble modules? Here’s an alternative way of getting that done. Is it better? You tell me ;)
Anycast (advertising the same IP address from multiple servers/locations) has long been used to implement scale-out public DNS services (the whole root DNS system runs on massive anycast), but it’s not as common in enterprise networks.
Want to know even more? I covered numerous load balancing mechanisms including anycast in Data Centers Infrastructure for Networking Engineers webinar.
A while ago Johannes Weber tweeted about an interesting challenge:
We want to advertise our AS and PI space over a single ISP connection. How would a setup look like with 2 Cisco routers, using them for hardware redundancy? Is this possible with only 1 neighboring to the ISP?
Hmm, so you have one cable and two router ports that you want to connect to that cable. There’s something wrong with this picture ;)
Remember Nicky Davey describing how he got large DMVPN deployment back on track with configuration templating? In his own words…:
Configuration templating is still as big win a win for us as it was a year ago. We have since expanded the automation solution, and reading the old blog post makes me realise how far we have come. I began working with this particular customer in May 2017, so 2 years now. At that time the new WAN project was on the horizon and the approach to network configuration was entirely manual.
Here’s how far he got in the meantime:
We also had a great guest speaker on the Network Automation course: Damien Garros explained how he used central source-of-truth based on NetBox and Git to set up a network automation stack from the grounds up.
Recordings are already online; you’ll need Standard ipSpace.net Subscription to access the Azure Networking webinar, and Expert ipSpace.net Subscription to access Damien’s presentation. Azure Networking webinar is also part of our new Networking in Public Clouds online course.
This is a guest blog post by Philippe Jounin, Senior Network Architect at Orange Business Services.
You could use track objects in Cisco IOS to track route reachability or metric, the status of an interface, or IP SLA compliance for a long time. Initially you could use them to implement reliable static routing (or even shut down a BGP session) or trigger EEM scripts. With a bit more work (and a few more EEM scripts) you could use object tracking to create time-dependent static routes.
Cisco IOS 15 has introduced Enhanced Object Tracking that allows first-hop router protocols like VRRP or HSRP to use tracking state to modify their behavior.