Stretch (@packetlife) shared an interesting link in a comment to my P2P traffic is bad for the network post: Facebook and Twitter use BitTorrent to distribute software updates across hundreds (or thousands) of servers ... another proof that no technology is good or bad by itself (Greg Ferro might have a different opinion about FCoE).
Shortly after I tweeted about @packetlife’s link, @sevanjaniyan replied with an even better link to a presentation by Larry Gadea (infrastructure engineer @ Twitter) in which Larry describes Murder, Twitter’s implementation of software distribution on top of the BitTornado library.
If you have a data center running a large number of servers that have to be updated simultaneously, you should definitely watch the whole presentation; here’s a short spoiler for everyone else:
- They were able to reduce distribution time from 900 seconds to 12 seconds.
- BitTorrent is severely restrained, both in the number of TCP sessions and in the bandwidth it can use (and there was a hint that they’ve managed to somewhat overload the network infrastructure during the tests).
- The BT clients grab the file, then fork and continue seeding for 30 seconds. If a typical distribution takes 12 seconds, seeding for an additional 30 seconds should be more than enough.
- They made a lot of tweaks and optimizations: they reduced timeouts, disabled all “ISP resiliency” features (encryption and DHT) and (obviously) UPnP, and forced seeding from an in-memory image (to reduce disk access requirements).
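To get a feel for why a swarm beats a central push by such a margin (900 seconds down to 12 is a 75x improvement), here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions of my own: every host has the same link speed, the central server's uplink is the only bottleneck in the push case, and in the swarm case every host that already has the file uploads one full copy per round. The function names and the sample numbers (1,000 servers, a 100 MB file, 1 Gbps links) are illustrative and do not come from the presentation.

```python
def central_push_seconds(num_servers, file_mb, source_mbps):
    """One source sends a full copy to every server, so the time
    grows linearly with the number of servers."""
    return num_servers * file_mb * 8 / source_mbps


def swarm_rounds(num_servers):
    """Idealized swarm: in each round every host that holds the file
    (including the initial seeder) uploads it to one new host, so the
    number of holders doubles."""
    holders = 1  # the initial seeder, not counted as a target server
    rounds = 0
    while holders - 1 < num_servers:
        holders *= 2
        rounds += 1
    return rounds


def swarm_seconds(num_servers, file_mb, host_mbps):
    """Time for the idealized swarm: one full file transfer per round."""
    return swarm_rounds(num_servers) * file_mb * 8 / host_mbps


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not Twitter's figures).
    print(central_push_seconds(1000, 100, 1000))  # 800.0
    print(swarm_rounds(1000))                     # 10
    print(swarm_seconds(1000, 100, 1000))         # 8.0
```

Real BitTorrent splits the file into pieces and exchanges them in parallel, so it does not have to wait for whole-file rounds; the model is only meant to show linear versus logarithmic growth in the number of servers.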
Next comes the elegant part: they developed two wrappers. A Python wrapper around BitTornado gives you higher-level functions, and a really high-level Capistrano wrapper gives you the functionality we really need: distribute directory tree X into directory Y on all servers.
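BitTorrent moves files, not directory trees, so a "distribute directory X into directory Y" operation needs some way to turn a tree into a single file and back. The sketch below shows one plausible way to do that with Python's standard tarfile module. It is not Murder's actual code or API: pack_tree and unpack_tree are hypothetical names, and the transfer step in between (the part BitTorrent would handle) is deliberately left out.

```python
import tarfile
from pathlib import Path


def pack_tree(src_dir, archive_path):
    """Bundle directory tree X into one compressed archive that could
    then be handed to a file-distribution tool (hypothetical helper)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to the root of the tree
        tar.add(src_dir, arcname=".")
    return archive_path


def unpack_tree(archive_path, dest_dir):
    """Recreate the tree under directory Y on the receiving server
    (hypothetical helper). Only use this on archives you trust."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

On the source host you would call pack_tree once; every receiving server would call unpack_tree after the archive arrives.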
And I’ve saved the best for last: they made Murder available under the Apache 2 license.