Use BitTorrent to update software in your Data Center
Stretch (@packetlife) shared an interesting link in a comment to my “P2P traffic is bad for the network” post: Facebook and Twitter use BitTorrent to distribute software updates across hundreds (or thousands) of servers ... another proof that no technology is good or bad by itself (Greg Ferro might have a different opinion about FCoE).
Shortly after I tweeted about @packetlife’s link, @sevanjaniyan replied with an even better link to a presentation by Larry Gadea (infrastructure engineer @ Twitter) in which Larry describes Murder, Twitter’s implementation of software distribution on top of the BitTornado library.
If you have a data center running a large number of servers that have to be updated simultaneously, you should definitely watch the whole presentation; here’s a short spoiler for everyone else:
- They were able to reduce distribution time from 900 seconds to 12 seconds.
- BitTorrent is severely constrained, both in the number of TCP sessions and in the bandwidth it can use (and there was a hint that they managed to somewhat overload the network infrastructure during the tests).
- The BT clients grab the file, fork, and continue seeding for 30 seconds (see the sketch after this list). If a typical distribution takes 12 seconds, seeding for an additional 30 seconds should be more than enough.
- They made a lot of tweaks and optimizations: they reduced timeouts, disabled all the “ISP resiliency” features (encryption and DHT) as well as (obviously) UPnP, and decided to force the seeding from an in-memory image (to reduce disk access requirements).
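To make the fork-and-seed behavior a bit more concrete, here’s a minimal Python sketch of what a client might do. The download() and seed() helpers (and the specific caps) are hypothetical stand-ins for the underlying BitTornado calls, not Murder’s actual code:

```python
import os
import sys

SEED_TIME = 30          # keep seeding this long after the download completes
MAX_PEERS = 20          # cap on concurrent TCP sessions (hypothetical value)
MAX_RATE_KBPS = 50000   # cap on upload bandwidth (hypothetical value)

def fetch_and_seed(torrent_file, target_dir):
    # download() blocks until the whole payload is on local disk, with the
    # peer count and upload rate capped to protect the network.
    download(torrent_file, target_dir,
             max_peers=MAX_PEERS, max_rate_kbps=MAX_RATE_KBPS)

    if os.fork() == 0:
        # Child process: keep serving pieces to the remaining peers for a
        # short, fixed window, then exit quietly.
        seed(torrent_file, target_dir, duration=SEED_TIME)
        sys.exit(0)
    # Parent returns immediately; the update is already in target_dir.
```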
Next comes the elegant part: they developed two wrappers. A Python wrapper around BitTornado gives you higher-level functions, and a really high-level Capistrano wrapper gives you the functionality you really need: distribute directory tree X into directory Y on all servers.
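To get a feel for what the top-level wrapper buys you, here’s a hypothetical Python sketch of that “distribute tree X into directory Y on all servers” interface (the real wrapper is a Capistrano recipe, and none of the helper or command names below come from Murder itself):

```python
import subprocess

def distribute(source_tree, dest_dir, servers):
    # Package the whole directory tree into a single torrent and start
    # seeding it from the build host; make_torrent() and start_seeding()
    # are hypothetical helpers, not part of Murder.
    torrent_file = make_torrent(source_tree)
    start_seeding(torrent_file, source_tree)

    # Ask every server (e.g. over SSH) to run the fetch-and-seed routine
    # sketched above, unpacking the payload into dest_dir.
    jobs = [subprocess.Popen(["ssh", host, "fetch_and_seed",
                              torrent_file, dest_dir])
            for host in servers]
    for job in jobs:
        job.wait()
```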
And I’ve saved the best for last: they made Murder available under the Apache 2 license.
If you want to use reliable multicast instead, you have to deploy multicast in your network (assuming your 1000 servers are not bridged together), find a software package that actually supports reliable multicast, and then troubleshoot the whole thing until you’ve found all the bugs and omissions. Of course there could be a huge hidden community of reliable multicast users that I'm totally unaware of (I already got burnt on server-to-server FTP transfers and Tcl+IOS skills, so I should have learnt to shut up ... but I haven't ... yet).
On the other hand, you can take software stress-tested by millions of users, tweak it a bit, and have it up and running in a few days. Not to mention that this solution is way more scalable and resilient than reliable multicast could ever be ... but, yes, it uses more bandwidth ... which might not be relevant if you're distributing a 100MB package over 10Gb links 8-)
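A quick back-of-envelope calculation shows why the extra bandwidth is mostly a non-issue at those sizes and link speeds:

```python
# One 100 MB copy on a 10 Gbps link, ignoring protocol overhead and disk I/O.
size_bits = 100 * 8 * 1e6     # 100 MB is roughly 800 megabits
link_bps = 10e9               # 10 Gbps
print(size_bits / link_bps)   # about 0.08 seconds per copy
```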
I damn Moore’s law every day; it just spoils people ;)