Setting Source IP Address on Traffic Started by a Multihomed Host
In the Path Failure Detection on Multi-Homed Servers blog post, I mentioned running BGP on servers as one of the best ways to detect server-to-network failures. As always, things aren’t as simple as they look, as Cathal Mooney quickly pointed out:
One annoyance is what IP address gets used by default by the system for outbound traffic. It would be nice to have a generic OS-level way to say, “This IP on lo0 should be default for outbound IP traffic unless to the connected link subnet itself.”
That’s definitely a tough nut to crack, and Cathal described a few solutions he used in the past:
Obviously, some software allows you to specify the source IP to use, but again more complexity in config. And some doesn’t. I’ve solved it before with an iptables/nft SNAT rule for everything not on the connected subnet, but again, it’s messier than one would like.
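For software that does let you pick the source address, the underlying mechanism is usually a bind() before connect(). The sketch below shows that mechanism with Python's standard library. The function name connect_from is made up for illustration, and the caller supplies the addresses.

```python
import socket


def connect_from(source_ip: str, host: str, port: int,
                 timeout: float = 5.0) -> socket.socket:
    """Open a TCP session to (host, port) that originates from source_ip.

    Binding only fixes the source address placed in the packets; the kernel
    still chooses the outgoing interface from the routing table.
    """
    # Port 0 lets the kernel choose an ephemeral source port.
    return socket.create_connection(
        (host, port), timeout=timeout, source_address=(source_ip, 0)
    )
```

On a server with a loopback address of, say, 10.0.0.1 (a hypothetical value), calling connect_from("10.0.0.1", "192.0.2.10", 443) would source the session from the loopback regardless of which uplink carries the packets. The price is exactly the extra per-application configuration Cathal complains about.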
You can also try a few other tricks, including:
Use multipath-aware software. iSCSI immediately comes to mind. For more details, read the Applications Using Multiple IP Addresses part of the Redundant Layer-3-Only Data Center Fabrics document.
Use Multipath TCP. Assuming you’re worried about clients running on the multihomed servers[1], you could use Multipath TCP to use all available external IP addresses. Once the parallel TCP sessions are established, Multipath TCP survives the loss of any one IP address. QUIC has similar capabilities but requires changes in the applications using it because we borked the socket API (that’s why SCTP got nowhere[2]).
Use the same IP address on all interfaces. Linux doesn’t care if you use the same IP address on all interfaces and will happily use it. Unfortunately, you cannot run BGP with two ToR switches in this setup, hoping it will just work[3]. You can run BGP over IPv6 link-local addresses (LLA), though (that trick is often called unnumbered BGP because that sounds better).
Specify the source IP address in the routing table. Yes, you can do that on Linux: the ip route add command accepts the next hop and the source IP address parameters. If you cannot use the same trick with routes derived from a routing protocol[4], use a workaround: your servers could receive a viable next hop via BGP and then use a default route pointing to the BGP-derived next hop[5].
Use iptables, the duct tape of Linux networking. Bonus points: you’ll be the only one understanding your network.
Keep calm and carry on.[6] The “What source IP address will be used?” challenge applies only to client sessions (outgoing sessions established by the multihomed server). If these sessions tend to be short-lived (for example, HTTP requests in a multi-tier application), and if your application survives an occasional failure[7], you’re trying to solve an imaginary problem.
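The "source IP address in the routing table" trick from the list above can be sketched as a small helper that assembles the iproute2 command line. This is an illustration under assumptions, not a tested deployment recipe: the function name route_with_source is made up, the sketch is limited to IPv4, and it uses ip route replace rather than ip route add so that it can be rerun without a "File exists" error.

```python
import ipaddress
from typing import List, Optional


def route_with_source(prefix: str, next_hop: str, source_ip: str,
                      dev: Optional[str] = None) -> List[str]:
    """Build the argv for an `ip route replace ... src ...` call (IPv4 only)."""
    # Parsing validates the inputs; bad values raise ValueError subclasses.
    net = ipaddress.IPv4Network(prefix)
    hop = ipaddress.IPv4Address(next_hop)
    src = ipaddress.IPv4Address(source_ip)

    argv = ["ip", "route", "replace", str(net), "via", str(hop)]
    if dev:
        argv += ["dev", dev]
    # `src` sets the preferred source address for traffic using this route.
    argv += ["src", str(src)]
    return argv
```

With hypothetical addresses, route_with_source("0.0.0.0/0", "192.0.2.1", "10.0.0.1", dev="eth0") yields the arguments for ip route replace 0.0.0.0/0 via 192.0.2.1 dev eth0 src 10.0.0.1, which could then be passed to subprocess.run() by a process with sufficient privileges. Returning a list rather than a shell string keeps the values out of shell parsing.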
Finally, if you want to figure out which source IP address an application uses, the blog post by Michael Kashin might be handy.
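One common way to ask the kernel this question from inside an application (not necessarily the approach described in that blog post) is to connect a UDP socket and read back its local address. A minimal sketch, with the function name source_ip_for and the default port chosen purely for illustration:

```python
import socket


def source_ip_for(destination: str, port: int = 9) -> str:
    """Return the local address the kernel would select for traffic to destination."""
    # Crude family detection: IPv6 literals contain a colon.
    family = socket.AF_INET6 if ":" in destination else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as probe:
        # connect() on a UDP socket sends no packets; it only triggers the
        # route lookup and source address selection.
        probe.connect((destination, port))
        return probe.getsockname()[0]
```

Because the answer comes from the same route lookup the kernel performs for real traffic, it should reflect a src attribute on the matching route. It does not account for applications that bind to a specific address themselves, or for NAT rules applied after the routing decision.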
[1] With server/daemon applications listening to incoming TCP requests on the loopback IP address.
[2] OK, I know it’s successfully used in niche applications, and some of those niches could be significant, but so is every other networking technology ever invented.
[3] You can configure secondary IP addresses on physical interfaces if your BGP daemon allows you to specify the source IP address to use in outgoing TCP sessions. Alternatively, you could make server BGP daemons passive and let the ToR switches connect to whatever address they like. The opportunities for enhanced job security are endless.
[4] FRR has a set src route map option, but I couldn’t find it in the FRR route-map documentation. Maybe I will create a lab one of these days to figure out how it works.
[5] I told you, improving your job security has never been easier ;)
[6] See also: Don’t Panic Towel.
[7] It better does; otherwise, you have a bigger problem on your hands.
Especially if there is a stateful firewall upstream that gets confused by the different inbound and outbound packet paths!
My friend, you've seen way too many horrible things 😅
But yes, that's why one of our customers turned an expensive Cisco ASA firewall into a packet filter sitting in front of an IBM mainframe 🤷‍♂️
I don’t know if things have changed, but at least in the past, setting the source address with the static route did not work for IPv6 in Linux due to IPv6’s own source selection algorithm.
Using addrlabels for IPv6 destinations and addresses makes it eventually pretty straightforward, but I don’t know how many are familiar with this concept.
It's certainly good to know the basics and finer details of Linux server networking. But unless I'm missing some major use case, I'd think that the vast majority of people implementing BGP-to-the-host will not deploy their applications straight onto the bare-metal server, but rather in containers or (micro)VMs. Besides the obvious security issues this poses, apart from basic 'detecting server-to-network failures', there are bound to be many other things to address, like IPAM, anycasting, (reverse) proxy/load balancing, policies/filtering, orchestration, and so on. Sticking to the network part of the problem field, one would be wise to look at offerings like Cilium or Calico (unless, of course, the main goal is the one mentioned multiple times in the footnotes).