Scalable Tools: Murder: a bittorent based, file transfer tool for faster deployments

Deploying to one server can be done with a single script doing a bunch of ssh/scp commands. If you have a few more servers, you can run it in a loop sequentially or fork the processes to do them in parallel. At some point though, it will get unmanageable especially if you have the challenge of updating multiple datacenters at the same time. So how does a company like twitter do their releases ?Twitter.com

Murder is an interesting P2P/Bittorent based tool which twitter uses to distribute files during its own software updates. Here are some more details.

Murder is a method of using Bittorrent to distribute files to a large amount 
of servers within a production environment. This allows for scalable and fast
deploys in environments of hundreds to tens of thousands of servers where
centralized distribution systems wouldn't otherwise function. A "Murder" is
normally used to refer to a flock of crows, which in this case applies to a
bunch of servers doing something.

In order to do a Murder transfer, there are several components required to be
set up beforehand -- many the result of BitTorrent nature of the system. Murder
is based on BitTornado.

- A torrent tracker. This tracker, started by running the 'murder_tracker.py'
script, runs a self-contained server on one machine. Although technically this
is still a centralized system (everyone relying on this tracker), the
communication between this server and the rest is minimal and normally
acceptable. To keep things simple tracker-less distribution (DHT) is currently
not supported. The tracker is actually just a mini-httpd that hosts a
/announce path which the Bittorrent clients update their state onto.

- A seeder. This is the server which has the files that you'd like to deploy
onto all other servers. For Twitter, this is the server that did the git diff.
The files are placed into a directory that a torrent gets created from. Murder
will tgz up the directory and create a .torrent file (a very small file
containing basic hash information about the tgz file). This .torrent file lets
the peers know what they're downloading. The tracker keeps track of which
.torrent files are currently being distributed. Once a Murder transfer is
started, the seeder will be the first server many machines go to to get
pieces. These pieces will then be distributed in a tree-fashion to the rest of
the network, but without necessarily getting the parts from the seeder.

- Peers. This is the group of servers (hundreds to tens of thousands) which
will be receiving the files and distributing the pieces amongst themselves.
Once a peer is done downloading the entire tgz file, it will continue seeding
for a while to prevent a hotspot effect on the seeder.

MURDER TRANSFER PROCESS
-----------------------

1. Configure the list of servers and general settings in config.rb. (one time)
2. Distribute the Murder files to all your servers: (one time)
cap murder:distribute_files
3. Start the tracker: (one time)
cap murder:start_tracker
4. Create a torrent file from a remote directory of files (on seeder):
cap murder:create_torrent tag="Deploy20100101" files_path="~/files"
5. Start seeding the files:
cap murder:start_seeding tag="Deploy20100101"
6. Distribute the files to all servers:
cap murder:peer tag="Deploy20100101" destination_path="/tmp/out"

Once completed, all files will be in /tmp/out/Deploy20091015/ on all servers.


More info here

Comments

Popular posts from this blog

Chrome Frame - How to add command line parameters

Creating your first chrome app on a Chromebook

Brewers CAP Theorem on distributed systems