P2P network scalability

Youtube is said to be pushing about 25 petabytes per month which is about 77 Gbps sustained data rate on an average. The bandwidth usage at the peaks would be even higher. Thanks to Limelight networks, Youtube doesn't really need to scale or provision for that kind of bandwidth and based on the some reports from 2006 it had cost them close to 4 million a month back then. Youtube and services like that have to invest a lot in their infrastructure before they can really launch their service and though using shared Content delivery networks is not ideal, its probably not a bad deal. In Youtube's case, it helped them survive until Google bought it out.

Newer Internet television service providers, however need not build their services around the traditional CDN model. Joost Network architecture presentation from Colm MacCarthaigh is an interesting example to discuss to prove my point. Joost was founded by the same guys who founded Kazaa and Skype . Kazaa was one of notorious P2P file sharing application (used the FastTrack protocol) which died after RIAA revolt. Skype, as it happens, also has its roots in P2P network [ Skype protocol , Skype scalability problems ] and has been doing pretty good over the years. So its no surprise that Joost chose P2P model again to distribute part of the content to its users. Joost has a cluster of servers which serve as "original seeders" or all content, and rely on the P2P network to distribute the popular content. The number of Joost servers, however, is not small because it still also has to address the "long tail" of requests which are not among the popular content.

Two of the most important network optimization ground rules, which I noticed from the talks, was that they decided against using firewalls or loadbalancers in its network. Thats good, because the firewalls and loadbalancers wouldn't have kept up with the bandwidth anyway. But even more impressive was that they designed the entire P2P application/network-algorithm to intelligently find and peer with nodes and supernodes closest to them. Joost tries to do this this in two different ways. The first one is using IP address (prefix aware) as proximity sensors (two IPs which start with similar set of numbers/octets will probably be in the same network). The second way to detect proximity is using Network AS Numbers which can work irrespective of what the IP addresses start with. [ Colm also mentioned about AS proximity detection below ]

A comment to blog @ ipdev.net by Colm himself
We have many gigs of transit, and are adding more. I'm not sure who claimed it's near HD quality, I like to think it's about NTSC, sometimes better, never quite PAL.We have some efforts in the code to save transit costs, there is very very basic prefix awareness, and we're adding AS-level awareness using live BGP data. I have looked at adding AS adjacency information, ie prefer AS-adjacent peers, but it's a lot of work and the US internet is relatively poorly mapped, so I don't think this will come soon.


Its possible that Joost might still require CDNs to serve the long-tail content, but the work they have done to build the P2P infrastructure would not only save them an a lot of mulah in the long run but would also allow them to easily scale to be larger than any of the current CDNs if they do get that big.

Interestingly companies like Microsoft are not sitting idle watching the world go by. Microsoft has been working on something called Avalanche and I think they already have a prototype client out which you can download and try it out yourself.

Microsoft Secure Content Downloader


Some MSCD clients may be connected to each other via peer connections, forming a â€Ëœcloudâۉ„¢ of clients. Pieces of the file you are downloading are sent through these peer connections between clients, as well as through connections with the file server. As a member of the cloud, your computer both serves as a client and server to other members of the cloud. Data destined for the cloud may be routed through your computer and sent to other cloud members. The other cloud members connected to you will be able to access only pieces of the file you are downloading via MSCD âۉ€Å“ they have no access to any other data on your computer.

You are only connected to other clients while you are downloading a file via MSCD. When the file has finished downloading âۉ€Å“ or when you pause or cancel the download, or exit the application âۉ€Å“ you disconnect from the cloud. Once you disconnect from the cloud, you will no longer have any connections to any other members in the cloud and no data will be routed through your computer.The Microsoft Secure Content Downloader (MSCD) is a peer-assisted download manager capable of securely downloading specific files. MSCD is intended for consumers who are downloading from a home PC, or business users whose computers are not behind a corporate firewall. If you use MSCD from behind a corporate firewall, you may be unable to download content, and may adversely affect other clients' ability to download content.

Of course there are also other rumors that apple is trying this out... but you know how these things go.


Anyway, the point is that in spite of occasional gliches P2P is probably the way to go if you want to cut long term costs of CDN. Personally, I believe that Skype had no other way out. I mean can you think off all the phone calls in the world going through the same first phone exchange in New Haven, Connecticut where it all started ? P2P models are still evolving and its hard to imagine there will be a one-solution-fits-all. But if you know one, please let me know.

Comments

Popular posts from this blog

Chrome Frame - How to add command line parameters

Creating your first chrome app on a Chromebook

Brewers CAP Theorem on distributed systems