Posts

Showing posts from 2007

Scaling Early: Feedjit

Image
Mark Maunder from Feedjit make an interesting presentation about scaling early . He focuses on some of the key operational issues related to the web server and server caching which I found very interesting. | View | Upload your own

Mysql on HDFS

A short thought provoking post by Mark Callaghan about running Mysql over HDFS . Its probably not ideal, but its an interesting thought regardless.

Scaling technorati - 100 million blogs indexed everyday

Image
Indexing 100 million blogs with over 10 billion objects, and with a user base which is doubling every six months, technorati has an edge over most blog search engines. But they are much more than search, and any technorati user can explain you that. I recommend you read John Newton's interview with David Sifry which I found fascinating. Here are the highlights from the interview if you don't have time to read the whole thing Current status of technorati 1 terabyte a day added to its content storage 100 million blogs 10 billion objects 0.5 billion photos and videos Data doubling every six months Users doubling every six months The first version was supposed to be for tracking temporal information on low budget. That version put everything in relational database which was fine since the index sizes were smaller then physical memory It worked fine till about 20 million blogs The next generation took advantage of parallelism. Data was broken up into shard

Scalability stories for Oct 22, 2007

Why most large-scale sites which scale are not written in java ?   -  ( What nine of the world's largest websites are running on)  -  A couple of very interesting blogs to read. Slashdot's setup Part 1 - Just in time for the 10 year anniversary. Flexiscale - Looks like an amazon competitor in the hosting business. Pownce - Lessons learned   - Lessons learned while developing Pownce, a social messaging web application Amazon's Dynamo - Dynamo is internal technology developed at Amazon to address the need for an incrementally scalable, highly-available key-value storage system. The technology is designed to give its users the ability to trade-off cost, consistency, durability and performance, while maintaining high-availability. [ PDF ] . This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.  To achieve this level of avail

Crawling sucks.

I wrote my first crawler in a few lines of perl code to spider a website recursively about 10 years ago. Two years ago I wrote another crawler in a few thousand lines using java+php and mysql. But this time I wasn't really interested in competing with google, and instead crawled feeds (rss/atom). Google hadn't released its blog search engine at that time. Using the little java knowledge I had and 3rd part packages like Rome and some HML parsers I hacked up my first crawler in a matter of days. The original website allowed users to train the bayesian based engine to teach it what kind of news items you like to read and automatically track millions of feeds to find the best content for you. After a few weeks of running that site, I started having renewed appreciation for the search engineers who breath this stuff day in and day out. That site eventually went down... mostly due to design flaws I made early which I'm listing here for those who love learning. Seeding proble

EC2 for everyone. And now includes 64bit with 15GB Ram too.

Image
  Finally it happened. EC2 is available for everybody . And more than that they now provide servers with 7.5GB and 15GB of RAM per instance. Sweet.   For a lot of companies EC2 was not viable due to high memory requirements of some of the applications. Splitting up such tasks to use less memory on multiple servers was possible, but not really cost and time efficient. The release of new types of instances removes that road block and would probably invoke significant curiosity from memory crunching application developers. $0.10 - Small Instance (Default) 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform $0.40 - Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform $0.80 - Extra Large Instance 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platfor

Web Scalability dashboard

[ Blogofy: bringing feeds together ] I took a week's break from blogging to work on one of my long overdue personal projects. Even though I use Google Reader as my feed aggregator I noticed a lot of folks still prefer a visual UI to track news and feeds. The result of my experimentation of designing such a Visual UI to track feeds lead me to create Blogofy If you have an interesting blog on Web Scalability, Availability or Performance which you want included here please let me know. The list of blogs on the page is in flux at the moment and I might move the feeds around a little depending on user feedback and blog activity.

Scalable products: KFS released

Image
Kosmix , a search startup has released source to C++ implementation of something which looks like a clustered file system. This looks very similar to Hadoop/HDFS , but the C++ factor will be a big performance boost. From Skrenta blog Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes. Availability - Replication is used to provide availability due to chunk server failures. Re-balancing - Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes. Data integrity - To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk. Client side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreacha

What is scalability ?

Image
When asked what they mean by scalability, a lot of people talk about improving performance, about implementing HA , or even talk about a particular technology or protocol. Unfortunately, scalability is none of that. Don't get me wrong. You still need to know all about speed, performance, HA technology, application platform, network, etc. But that is not the definition of scalability. Scalability, simply, is about doing what you do in a bigger way. Scaling a web application is all about allowing more people to use your application. If you can't figure out how to improve performance while scaling out, its okay. And as long as you can scale to handle larger number of users its ok to have multiple single points of failures as well. There are two key primary ways of scaling web applications which is in practice today. " Vertical Scalability " - Adding resource within the same logical unit to increase capacity. An example of this would be to add CPUs to an existing serve

Scaling Smugmug from startup to profitability

Image
Smugmug.com , a 5 year old company with just 23 employees has 315000 paying customers and 195 million photographs. CEO & "Chief Geek" Don MacAskill has a nice set of slides where he talks about its 5 year journey during which it went from small startup to a profitable business. The talk was given during Amazon's "Startup project" so it talks mostly about how it uses AWS (Amazon Web services). Other than wonderful fun loving employees who are also "super heroes" in his eyes, he talks about the how they are doubling the storage requirements on an yearly basis. They already have about 300 TB in use, and as of today all of that is on Amazon's S3. Don estimates that based on his estimates, for the storage they are using, they are saving about 500K per year which is pretty big for a small operation like theirs. The Smugmug architecture has evolved over time. Internally it can serve images in 3 different ways. It could either "proxy" th

Custom search engine to search your OPML and Delicious bookmarks

Image
Zoppr is a Custom Search engine which allows you to create custom Google search engine on the fly, by appending your bookmark page, wikipage, or any other kind of page with lots of interesting bookmarks/links on it. Once setup, google will search only across your bookmarks/links. For example this URL will help you search across an OPML file published somewhere on the internet http://www.zoppr.com/cse/http://share.opml.org/

Scalable web architectures

If you haven't noticed already there is a second blog which I maintain which is currently more busy than this particular blog. " Scalable web architectures " is a collection of posts about how web architectures which scale and technologies which make it happen. Here are some of the posts on that blog Scaling Powerset using Amazon’s EC2 and S3 Powerset could have gone the way most dot-com companies have gone, but instead they decided to try out Amazon’s EC2 (Elastic Cloud Computing) and S3(Simple Storage Service) to augment their computational needs. P2P network scalability Youtube is said to be pushing about 25 petabytes per month which is about 77 Gbps sustained data rate on an average. The bandwidth usage at the peaks would be even higher. Thanks to Limelight networks , Youtube doesn’t really need to scale or provision for that kind of bandwidth and based on the some reports from 2006 it had cost them close to 4 million a month back then. Youtube

Scalability Stories (15th Sept) Mysql Proxy, Cluster Fire System, Facebook apps and Twitter

There have been a lot of interesting stories from last week for me to share. If you have interesting links you want to add to this post please forward them to me or post a comment to this post. Sun is planning to acquire majority stake of " Cluster File Systems, Inc ". [ Talk on Lustre File System ] Sun intends to add support for Solaris Operating System (Solaris OS) on Lustre and plans to continue enhancing Lustre on Linux and Solaris OS across multi vendor hardware platforms. As previously announced in July 2007, Sun also plans to deliver Lustre servers on top of Sun's industry-leading open source Solaris ZFS solution, which is one of the fastest growing storage virtualization technology in the marketplace. Making Facebook Apps scale on cheap : An interesting writeup By Surj Patel about Scalability issues Facebook itself and the 3rd Party apps on it have. Also discusses EC2 and S3 as an alternative solution to scale in a cost effective way. Welcome to Amazon

Scaling Powerset using Amazon's EC2 and S3

Image
The first thing most doc-com companies do before going public is setup an infrastructure to provide the service. And though it might sound straight forward to most of you, it can be a very expensive affair. To come up with the right kind of infrastructure for any new service a few key architectural decisions have to be made. Design the infrastructure and architecture to handle the traffic peaks (not average) Design with long term scaling in mind Design power and cooling infrastructure to support the servers Hire Hardware+Systems+network support staff ( more if 24/7 operations are required) Add buffers to support failures and short term grow requirements Take into account lead times of ordering and procuring hardware (which can be weeks if not months).. And a few others.. which I won't bore you with here. The point is that the initial capital investment can be in millions even before the first customer starts using the service. And once the capital investment is made,

P2P network scalability

Image
Youtube is said to be pushing about 25 petabytes per month which is about 77 Gbps sustained data rate on an average. The bandwidth usage at the peaks would be even higher. Thanks to Limelight networks , Youtube doesn't really need to scale or provision for that kind of bandwidth and based on the some reports from 2006 it had cost them close to 4 million a month back then. Youtube and services like that have to invest a lot in their infrastructure before they can really launch their service and though using shared Content delivery networks is not ideal, its probably not a bad deal. In Youtube's case, it helped them survive until Google bought it out. Newer Internet television service providers, however need not build their services around the traditional CDN model. Joost Network architecture presentation from Colm MacCarthaigh is an interesting example to discuss to prove my point. Joost was founded by the same guys who founded Kazaa and Skype . Kazaa was one of notorious

Sharding: Different from Partitioning and Federation ?

Ive been hearing this word "sharding" more and more often, and its spreading like fire. Theo Schlossnagle, the author of " Scalable internet architecutres " argues that federation is form of partitioning, and that sharding is nothing but a form of partitioning and federation. Infact, according to him, Sharding has already been in use use for a long time . I'm not a dba, and I don't pretend to be one in my free time either, so to understand the differences I did some research and found some interesting posts. The first time I heard about "Sharding" was on Been Admininig's blog about Unorthodox approach to database design ( Part I and Part II ). Here is the exact reference... Splitting up the user data so that User A exists on one serverwhile User B exists on another server, each server now holds a s hard ofthe data in this federated model . A couple of months ago Highscalability.com picked it up and made it sound (probably unintentionally)

Adventures of scaling eins.de

Image
Patrick Lenz , founder and lead developer of freshmeat.net was also responsible for the relaunch of another website eins.de which recently moved from php to ruby. Eins.de site serves about 1.2 million dynamic pages a day. He wrote a series of articles describing how they redesigned the site to scale for growth. I found these articles very informative with a extreemly mature discussion of the colorful world of scalability. The adventures of scaling, Stage 1 Questions and answers for Stage 1 The adventures of scaling, Stage 2 The adventures of scaling, Stage 3 The adventures of scaling, Stage 4 Here is the final summary of what they ended up after 4 months of optimizations Systems optimization Use Linux 2.6 instead of 2.4 Use self-compiled Ruby 1.8.4 instead of all else Use MySQL-supplied binaries Use lighttpd 1.4.11 instead of all else Use memcache-client instead of Ruby-MemCache Use a smaller amount of dispatchers Watch your dispatchers Code optimizatio

Scalability stories from Sept 6th 2007

A 55 minute talk by ' Stewart Smith' from MySQL AB, about Mysql Clusters. He talks about NDB storage engine and synchronous replication between storage nodes. Also talks about new features in 5.1 including cluster to cluster replication, disk based data and a bunch of other things. And another Mysql talk on Google about Performance Tuning Best practices for Mysql. An interesting talk with Leah Culver about how Pownce was created . They use LAMP (Python) stack with Perlbal , Memcached , Django , AIR with Amazon S3 as the backend storage. Discussion about  High-Availability Mongrel Packs using Seesaw A blog about " Future of Data Center Computing " mentions Terracotta Sessions. If you read my previous post about "Session, state and scalabililty" and understood the problem, do look at this as a solution as well EC2 and S3 are being used more than before. Unfortunately because storage on EC2 doesn't persist across reboots creative ways of k

Session, state and scalability

In my other life I work with a medium scale web application which has had many different kinds of growing problems over time. One of the most painful one is the issue about "statelessness". If I could only give one recommendation to anyone building a brand new web application, I'd say " go stateless ". But going stateless is not the same as going session-less. One could implement a perfectly stateless web architecture which still uses sessions to authenticate, authorize and track user activity. And to complicate matters further, when I say stateless, I really mean that the server should be stateless, not the client. Basic authentication Most interactive web applications today which allow you to manipulate data in some form require authentication and authorization mechanisms to control access. In the good old days "Basic Authentication " was the most commonly used authentication mechanism. Once authenticated, the browser would send the credentials to

Scalability Stories for Aug 30

Image
I found a very interesting story on how memcached was created . Its an old story titled " Distributed caching with memcached ". I also found an interesting FAQ on memcached which some of you might like. Inside Myspace is another old story which follows Myspace's growth over time. Its a very long and interesting read which shouldn't be ignored. Measuring Scalability tries to put numbers to the problem of scalability. If you have to justify the cost of scalability to anyone in your organization, you should atleast skim through this page I found a wonderful story on the humble architecture of Mailinator and how it grew over time on just one webserver. It receives approx 5 million emails a day and runs the whole operation pretty much in memory with no logs or database to leave traces behind. And here is another page from the creator of Mailinator abouts its stats from Feb . Finally another very interesting presentation/slide on the topic of " Scalable

Thoughts on scalability

Here is an interesting contribution on the topic from Preston Elder   I've worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine). The biggest problem I have seen is people don't know how to properly define their thread's purpose and requirements, and don't know how to decouple tasks that have in-built latency or avoid thread blocking (and locking). For example, often in a high-performance network app, you will have some kind of multiplexor (or more than one) for your connections, so you don't have a thread per connection. But people often make the mistake of doing too much in the multiplexor's thread. The multiplexor should ideally only exist to be able to pull data off the socket, chop it up into packets that make sense, and hand it off to some kind of

Loadbalancer for horizontal web scaling: What questions to ask before implementing one.

Image
A single server, today, can handle an amazing amount of traffic. But sooner or later most organizations figure out that they need more and talk about choosing between horizontal and vertical scaling. If you work for such an organization and also happen to manage networking devices, you might find a couple of loadbalancers on your desk one day along with a yellow sticky note with a deadline on it. Loadbalancers, by definition, are supposed to solve performance bottlenecks by distributing or balancing load between different components its managing. Though you would normally find loadbalancers in front of a webserver, a lot of different individuals have found other interesting ways of using it. For example I know some organizations have so much network traffic that they can't use just one sniffer or firewall to do their job. They ended up using some high end loadbalancers to intelligently loadbalance network traffic through multiple sniffers/firewalls attached to them. These are rar

Feature or a bug ?

Image
Dratz asks : Feature or a bug ?

TypePad architecture: Problems and solutions

Image
TypePad was and probably is one of the first and largest paid blogging service in the world. In a presentation at OSCON 2007 , Lisa Phillips and Garth Webb spoke about TypePad's problems in 2005. Since this is a common problem with any successful company I found it interesting enough to research a little more. TypePad was, like any other service, initially designed in the traditional way with Linux, Postgres, Apache, mod_perl, perl as the front end and NFS storage for images on a filer. At that time they were pushing close to 250mbps (4TB per day) through multiple pipes and with growing user base, activity and data they were growing at the rate of 10 to 20% per month. Just before the planned move to newer better data center, sometime in Oct 2005, TypePad started experiencing all kinds of problems due to its unexpected growth. The unprecedented stress on the system caused multiple failures over the next two months which ranged from hardware, software, storage to networking issues.

Web storage for backups

Image
I'm contemplating using S3 for backups. Paul Stamantiou has a script ready to go. The thing which convinced me was this chart Paul showed. For 10GB of space he paid under 3 dollars per month. Thats really cheap... GMail, Microsoft and yahoo all provide extra storage as well. However none of them have stable company supported APIs to allow users to upload data in this form.

How Skype network handles scalability..

There was a major skype outage last week and though there is an " official explaination " and other discussions about it floating around, I found this comment from one of the GigaOm readers more interesting to think about. Now this particular description may not accurately describe the problem (which might be speculation as well) but it does describe , in a few words, how skype's p2p network scales out. You should also take a look at the detailed discussion of the skype protocol here . Number of Skype Authentication servers: Count == 50; // Clustered Number of potential Skype clients: Count = 220,000,000 // Mostly decentralized Number of SuperNode clients to maintain network connectivity: Count = N / 300 at any one time. • If there are 3.0 million users online then the ratio is 3,000,000 / 300 = 10,000 == Supernodes available • Supernodes are bootstraps into the network for normal first run clients ("and handle routing of children calls"). •