Archive for August, 2007

MySQL Cluster

Saturday, August 11th, 2007

Link

“Introduction to MySQL Cluster: The NDB storage engine (MySQL Cluster) is a high-availability storage engine for MySQL. It provides synchronous replication between storage nodes and many MySQL servers having a consistent view of the database. In 4.1 and 5.0 it’s a main memory database, but in 5.1 non-indexed attributes can be stored on disk. NDB also provides a lot of determinism in system resource usage. I’ll talk a bit about that.”


Popularity: 15%

How To Design A Good API and Why it Matters

Friday, August 10th, 2007

A very interesting Google talk about designing a good API. This may not seem like a scalability issue, but if you really want to host a horizontally scalable system you need to have a good scalable API design to go with it.

Every day around the world, software developers spend much of their time working with a variety of Application Programming Interfaces (APIs). Some are integral to the core platform, some provide access to widely distributed frameworks, and some are written in-house for use by a few developers. Nearly all programmers occasionally function as API designers, whether they know it or not. A well-designed API can be a great asset to the organization that wrote it and to all who use it. Good APIs increase the pleasure and productivity of the developers who use them, the quality of the software they produce, and ultimately, the corporate bottom line. Conversely, poorly written APIs are a constant thorn in the developer’s side, and have been known to harm the bottom line to the point of bankruptcy. Given the importance of good API design, surprisingly little has been written on the subject. In this talk, I’ll attempt to help you recognize good and bad APIs and I’ll offer specific suggestions for writing good ones.

Popularity: 10%

New Talks and Slides links from Aug 5 2007

Sunday, August 5th, 2007

If you haven’t seen these links before, check the “Talks and slides from web architects” page first. If you have already seen that page, here are the updates from last week.

  PDF Case for Shared Nothing
  PDF The Chubby Lock Service for Loosely-Coupled Distributed Systems
    Building Highly Scalable Web Applications
1/1/2006 Slides The Ebay architecture
1/1/2007 Slides PHP & Performance
4/20/2007 Video Brad Fitzpatrick - Behind the Scenes at LiveJournal: Scaling Storytime
5/4/2006   Scalable computing with Hadoop
6/3/2007   Hadoop Map/Reduce
8/3/2007   Introduction to hadoop
6/1/2007 Slides Hadoop distributed file system
    Yahoo experience with hadoop
7/25/2007   Meet Hadoop
8/3/2007 webpage The Hadoop Distributed File System: Architecture and Design
7/25/2007 Blog Yahoo’s Hadoop Support
7/18/2007 Blog Running Hadoop MapReduce on Amazon EC2 and Amazon S3
8/3/2007   Interpreting the Data: Parallel Analysis with Sawzall
10/18/2005 Video BigTable: A Distributed Structured Storage System
1/1/2006 PDF Bigtable: A Distributed Storage System for Structured Data
1/1/2004 PDF MapReduce: Simplified Data Processing on Large Clusters
1/1/2003 PDF Google File System
8/3/2007 PDF ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007 PDF SEDA: An Architecture for well conditioned scalable internet services
8/3/2007 PDF A scalable architecture for Global web service hosting service
6/23/2007 Blog Getting Started with Drupal
6/23/2007 Blog 4 Problems with Drupal

Popularity: 11%

Hadoop and HBase

Saturday, August 4th, 2007

This may not be a surprise to a lot of people, but it was to me. Even though I have been using Lucene and Nutch for some time, I didn’t really know enough about Hadoop and HBase until recently.

Hadoop

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.


Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
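To make the map and reduce steps concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API: the function names (map_phase, shuffle, reduce_phase, run_job) are made up for illustration, and each string in the input list stands in for one block of data that a real cluster would process on the node where it is stored.

```python
from collections import defaultdict


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(blocks):
    """Map each block independently, then shuffle and reduce the results."""
    emitted = []
    for block in blocks:
        # Assumption: on a real cluster each block is mapped in parallel,
        # on a node that already holds a replica of it.
        emitted.extend(map_phase(block))
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


if __name__ == "__main__":
    blocks = ["hadoop stores data", "hadoop processes data"]
    print(run_job(blocks))
```

Because each call to map_phase only looks at its own block, the map work can be spread across as many machines as there are blocks. Only the shuffle step needs to move data between them.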

HBase

Google’s Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).

Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.
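The data model described above can be sketched with a small in-memory table in Python. This is a toy, not the HBase API: the ToyTable class and its put, get, versions and scan methods are hypothetical names chosen for illustration. The integer version argument plays the role of the secondary key; in Bigtable that key is a timestamp.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a Bigtable-style table (hypothetical, not HBase)."""

    def __init__(self):
        self._row_keys = []  # row keys kept sorted so range scans are cheap
        self._rows = {}      # row key -> {column -> {version -> value}}

    def put(self, row, column, value, version):
        """Store one value under (row, column, version)."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[version] = value

    def get(self, row, column, version=None):
        """Return one value: the given version, or the highest one if omitted."""
        cells = self._rows[row][column]
        if version is None:
            version = max(cells)
        return cells[version]

    def versions(self, row, column):
        """Iterate over (version, value) pairs for one column, newest first."""
        cells = self._rows[row][column]
        for version in sorted(cells, reverse=True):
            yield version, cells[version]

    def scan(self, start_row, stop_row):
        """Iterate over rows with start_row <= key < stop_row, in key order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        for key in self._row_keys[lo:hi]:
            yield key, self._rows[key]


if __name__ == "__main__":
    table = ToyTable()
    table.put("com.example/a", "contents", "first crawl", version=1)
    table.put("com.example/a", "contents", "second crawl", version=2)
    table.put("org.example/b", "contents", "other site", version=1)
    print(table.get("com.example/a", "contents"))
    print(list(table.versions("com.example/a", "contents")))
    print([key for key, _ in table.scan("com.", "com/")])
```

There is no query language here, only lookups by key and iteration over a sorted key range, which is the trade-off the post describes.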

Popularity: 19%

Talks and slides from various web architects

Friday, August 3rd, 2007

For the latest set of links, go here.

This is a collection of slides, PDFs and videos about designing scalable websites that I have collected over time. If you have something interesting that might belong here, please let me know.

Date Type Title
6/23/2007 Blog Getting Started with Drupal
6/23/2007 Blog 4 Problems with Drupal
6/23/2007 Video Seattle Conference on Scalability: MapReduce Used on Large Data Sets
6/23/2007 Video Seattle Conference on Scalability: Scaling Google for Every User
6/23/2007 Video Seattle Conference on Scalability: VeriSign’s Global DNS Infrastructure
6/23/2007 Video Seattle Conference on Scalability: YouTube Scalability
6/23/2007 Video Seattle Conference on Scalability: Abstractions for Handling Large Datasets
6/23/2007 Video Seattle Conference on Scalability: Building a Scalable Resource Management
6/23/2007 Video Seattle Conference on Scalability: SCTP’s Reliability and Fault Tolerance
6/23/2007 Video Seattle Conference on Scalability: Lessons In Building Scalable Systems
6/23/2007 Video Seattle Conference on Scalability: Scalable Test Selection Using Source Code
6/23/2007 Video Seattle Conference on Scalability: Lustre File System
6/9/2007 Slides Technology at Digg.com
6/9/2007 Blog Extreme Makeover: Database or MySQL@YouTube
4/26/2007 Blog Mysql at Google
4/1/2007 Slides Scaling Twitter
4/1/2007 Slides How we build Vox
4/1/2007 Slides High Performance websites
4/1/2007 Slides Beyond the file system design
4/1/2007 Slides Scalable web architectures
3/1/2007 Slides Scalability set Amazon’s servers on fire not yours
3/1/2007 Slides Hardware layouts for LAMP installations
3/1/2007 Video Mysql scaling and high availability architectures
3/1/2007 Audio Lessons from Building world’s largest social music platform
3/1/2007 PDF Lessons from Building world’s largest social music platform
3/1/2007 Slides Lessons from Building world’s largest social music platform
11/1/2006 PDF Livejournal’s backend: history of scaling
11/1/2006 Slides Livejournal’s backend: history of scaling
11/1/2006 Slides Scalable Web Architectures (w/ Ruby and Amazon S3)
10/26/2006 Blog Yahoo! bookmarks uses symfony
7/26/2006 Slides Getting Rich with PHP 5
7/26/2006 Audio Getting Rich with PHP 5
3/7/2006 Blog Scaling Fast and Cheap - How We Built Flickr
3/1/2005 News Open source helps Flickr share photos
  Slides Flickr and PHP
  Slides Wikipedia: Cheap and explosive scaling with LAMP
  Blog YouTube Scalability Talk
    High Order Bit: Architecture for Humanity
  PDF Mysql and Web2.0 companies
8/3/2007   Building Highly Scalable Web Applications
8/3/2007   Introduction to hadoop
8/3/2007 webpage The Hadoop Distributed File System: Architecture and Design
8/3/2007   Interpreting the Data: Parallel Analysis with Sawzall
8/3/2007 PDF ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007 PDF SEDA: An Architecture for well conditioned scalable internet services
8/3/2007 PDF A scalable architecture for Global web service hosting service
7/25/2007   Meet Hadoop
7/25/2007 Blog Yahoo’s Hadoop Support
7/18/2007 Blog Running Hadoop MapReduce on Amazon EC2 and Amazon S3
6/22/2007   LH*RSP2P : A Scalable Distributed Data Structure for P2P Environment
6/12/2007   Scaling the Internet routing table with Locator/ID Separation Protocol (LISP)
6/3/2007   Hadoop Map/Reduce
6/1/2007 Slides Hadoop distributed file system
4/20/2007 Video Brad Fitzpatrick - Behind the Scenes at LiveJournal: Scaling Storytime
2/1/2007 Slides Inside LiveJournal’s Backend (April 2004)
2/1/2007 Slides How to scale
1/23/2007   Testing Oracle 10g RAC Scalability
1/1/2007 Slides PHP & Performance
12/22/2006   SQL Performance Optimization
10/13/2006   Building_a_Scalable_Software_Security_Practice
5/31/2006   Building Large Systems at Google
5/4/2006   Scalable computing with Hadoop
1/1/2006 Slides The Ebay architecture
1/1/2006 PDF Bigtable: A Distributed Storage System for Structured Data
1/1/2006 PDF Fault-Tolerant and scalable TCP splice and web server architecture
10/18/2005 Video BigTable: A Distributed Structured Storage System
1/1/2004 PDF MapReduce: Simplified Data Processing on Large Clusters
8/3/2003 PDF Google Cluster architecture
1/1/2003 PDF Google File System
11/1/2002 Doc Implementing a Scalable Architecture
10/30/2001 News How linux saved Millions for Amazon
    Yahoo experience with hadoop
  Slides Scalable web application using Mysql and Java
  Slides Friendster: scaling for 1 Billion Queries per day
  Blog Lightweight web servers
  PDF Mysql Scale out by application partitioning
  PDF Replication under scalable hashing: A family of algorithms for Scalable decentralized data distribution
  Product Clustered storage revolution
  Blog Early Amazon Series
  Web Wikimedia Server info
  Slides Wikimedia Architecture
  Slides MySpace presentation
  PDF A scalable and fault-tolerant architecture for distributed web resource discovery
8/4/2007 PDF The Chubby Lock Service for Loosely-Coupled Distributed Systems
8/5/2007 Slides Real world Mysql tuning
8/5/2007 Slides Real world Mysql performance tuning
8/5/2007 Slides Learning MogileFS: Building scalable storage system
8/5/2007 Slides Reverse Proxy and Webserver
8/5/2007 PDF Case for Shared Nothing
7/1/2007 Slides A scalable stateless proxy for DBI
1/1/2006 Slides Real world scalability web builder 2006
8/5/2005 Slides Real world web scalability

Popularity: 27%

Youtube scalability

Thursday, August 2nd, 2007

Popularity: 11%

Scalable Internet Architectures

Thursday, August 2nd, 2007

By Theo Schlossnagle

 

As a developer, you are aware of the increasing concern amongst developers and site architects that websites be able to handle the vast number of visitors that flood the Internet on a daily basis. Scalable Internet Architectures addresses these concerns by teaching you both good and bad design methodologies for building new sites and how to scale existing websites to robust, high-availability websites. Primarily example-based, the book discusses major topics in web architectural design, presenting existing solutions and how they work. Technology budget tight? This book will work for you, too, as it introduces new and innovative concepts to solving traditionally expensive problems without a large technology budget. Using open source and proprietary examples, you will be engaged in best practice design methodologies for building new sites, as well as appropriately scaling both growing and shrinking sites. Website development help has arrived in the form of Scalable Internet Architectures.

Amazon Link

 


Popularity: 16%

Book: Building Scalable Web Sites

Wednesday, August 1st, 2007

Building, scaling, and optimizing the next generation of web applications

By Cal Henderson

Learn the tricks of the trade so you can build and architect applications that scale quickly–without all the high-priced headaches and service-level agreements associated with enterprise app servers and proprietary programming and database products. Culled from the experience of the Flickr.com lead developer, Building Scalable Web Sites offers techniques for creating fast sites that your visitors will find a pleasure to use.
Creating popular sites requires much more than fast hardware with lots of memory and hard drive space. It requires thinking about how to grow over time, how to make the same resources accessible to audiences with different expectations, and how to have a team of developers work on a site without creating new problems for visitors and for each other.
* Presenting information to visitors from all over the world
* Integrating email with your web applications
* Planning hardware purchases and hosting options to have as much as you need without breaking your wallet
* Partitioning and distributing databases to support large datasets and simultaneous transactions
* Monitoring your applications to find and clear bottlenecks
* Providing services APIs and using services from other providers to increase your site’s reach and capabilities
Whether you’re starting a small web site with hopes of growing big or you already have a large system that needs maintenance, you’ll find Building Scalable Web Sites to be a library of ideas for making things work.

 

Buy Now

Product Details

  • Amazon Sales Rank: #6159 in Books
  • Brand: O’Reilly Media
  • Published on: 2006-05-16
  • Format: Illustrated
  • Number of items: 1
  • Binding: Paperback
  • 352 pages

Popularity: 10%