Use BitTorrent to update software in your Data Center

Stretch (@packetlife) shared an interesting link in a comment to my P2P traffic is bad for the network post: Facebook and Twitter use BitTorrent to distribute software updates across hundreds (or thousands) of servers ... another proof that no technology is good or bad by itself (Greg Ferro might have a different opinion about FCoE).

Shortly after I tweeted about @packetlife’s link, @sevanjaniyan replied with an even better link to a presentation by Larry Gadea (infrastructure engineer @ Twitter) in which Larry describes Murder, Twitter’s implementation of software distribution on top of the BitTornado library.

If you have a data center running a large number of servers that have to be updated simultaneously, you should definitely watch the whole presentation; here’s a short spoiler for everyone else:

  • They were able to reduce distribution time from 900 seconds to 12 seconds.
  • BitTorrent is severely restrained, both in the number of TCP sessions and in the bandwidth it can use (and there was a hint that they’ve managed to somewhat overload the network infrastructure during the tests).
  • The BT clients grab the file and then fork and continue seeding for 30 seconds. If a typical distribution takes 12 seconds, seeding for an additional 30 seconds should be more than enough.
  • They made a lot of tweaks and optimizations. They reduced timeouts, disabled all “ISP resiliency” features (encryption and DHT) and (obviously) UPnP, and decided to force the seeding from an in-memory image (to reduce disk access requirements).
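To get a feel for why a swarm beats a single source by such a wide margin, here is a deliberately crude back-of-the-envelope model. It is only a sketch: the function names, the "one upload per holder per round" rule and all the numbers are my assumptions, not figures from the presentation, and it ignores piece-level transfers as well as the session and bandwidth limits mentioned above.

```python
def transfer_seconds(file_mb, link_mbps):
    """Seconds to send the whole file once over a single link."""
    return file_mb * 8 / link_mbps


def sequential_time(num_servers, file_mb, link_mbps):
    """One source pushes the file to every server, one after another."""
    return num_servers * transfer_seconds(file_mb, link_mbps)


def swarm_rounds(num_servers):
    """Rounds needed if every holder sends the file to one new server per round.

    The original seeder is the only holder at the start, so the number of
    holders doubles each round.
    """
    holders = 1  # the seeder
    rounds = 0
    while holders - 1 < num_servers:  # holders - 1 = servers that have the file
        holders *= 2
        rounds += 1
    return rounds


def swarm_time(num_servers, file_mb, link_mbps):
    """Whole-file doubling model: rounds times the time of a single transfer."""
    return swarm_rounds(num_servers) * transfer_seconds(file_mb, link_mbps)


if __name__ == "__main__":
    # Hypothetical inputs: 1000 servers, 100 MB package, 1 Gbps per server.
    print(sequential_time(1000, 100, 1000))
    print(swarm_time(1000, 100, 1000))
```

With these made-up inputs the sequential push needs 1000 transfers while the swarm needs only 10 rounds, which is the linear-versus-logarithmic gap behind results like the one quoted above. A real BitTorrent swarm exchanges small pieces rather than whole files, so this model should not be read as a prediction of actual timings.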

Next comes the elegant part: they developed two wrappers. A Python wrapper around BitTornado gives you higher-level functions, and a really high-level Capistrano wrapper gives you the functionality we really need: distribute directory tree X into directory Y on all servers.
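The "directory tree X into directory Y" idea boils down to packing a tree into a single payload, moving that payload, and unpacking it at the destination. The sketch below shows only the packing and unpacking halves using the Python standard library; it is not Murder's code, the function names are hypothetical, and the transport step (the BitTorrent swarm in Murder's case) is left out entirely.

```python
import io
import tarfile
from pathlib import Path


def pack_tree(src_dir):
    """Pack a directory tree into an in-memory gzip-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname="." stores paths relative to src_dir, not absolute paths.
        tar.add(src_dir, arcname=".")
    return buf.getvalue()


def unpack_tree(payload, dest_dir):
    """Unpack a payload produced by pack_tree() into dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        # Only unpack payloads you built yourself; extractall() trusts the archive.
        tar.extractall(dest_dir)
```

Keeping the payload in memory mirrors the in-memory seeding tweak described in the presentation, although a very large package would obviously need a different approach.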

And I’ve saved the best for last: they made Murder available under the Apache 2 license.

8 comments:

  1. Dang, remember the times when reliable multicast was invented for that same purpose :)
  2. It would be awesome to see multicast used more for sharing large amounts of data across many computers. I don't know all the ins and outs of multicast, but I see trouble with acknowledgments and resending of lost data.. :(
  3. You see, that's the difference between an academic solution and real-life engineering :-P

    If you want to deploy reliable multicast, you have to deploy multicast in your network (assuming your 1000 servers are not bridged together), find a software package that actually supports reliable multicast and then troubleshoot the whole thing until you find out all the bugs and omissions. Of course there could be a huge hidden community of reliable multicast users that I'm totally unaware of (I already got burnt on server-to-server FTP transfers and Tcl+IOS skills, so I should have learnt to shut up ... but I haven't ... yet).

    On the other hand, you can take software stress-tested by millions of users, tweak it a bit and have it up and running in a few days. Not to mention this solution is way more scalable and resilient than reliable multicast could ever be ... but, yes, it uses more bandwidth ... which might not be relevant if you're distributing a 100MB package over 10Gb links 8-)
  4. Well the whole idea of reliable multicast was relying on NAKs (negative ACKs) as opposed to positive ACKs - this reduces the upstream feedback flow - and implementing intermediate buffering - this improves local stream recovery. The design was therefore relatively scalable, but never reached widespread adoption, though there is (limited) support in Cisco IOS.
  5. See, this is why we're screwed with the Internet ;) They always throw bandwidth at any problem and then go on a coffee break saying how much progress we have made!

    I damn Moore's law every day, it just spoils people ;)
  6. I never knew such a thing already existed: https://tools.ietf.org/html/rfc5740. Another day, another thing learned :-)
  7. On the subject of multicast software deployment, I thought you might find this interesting; we use it to deploy disk images to our Windows XPE-based point of sale devices.

    http://www.cs.utah.edu/flux/papers/frisbee-usenix03/