Posts

Showing posts from June, 2010

All Velocity conference 2010 Slides/Notes

Here are all the slides and PDFs I’ve come across from the first two days at Velocity. Please let me know if I missed any.
Links, PDFs and docs
- Apache Traffic Server – HTTP proxy server on the edge
- Don't let third parties slow you down
- Keeping track of your performance using Show Slow
- Mobile web high performance
- Progressive enhancements: Tools and techniques
- Removing the human SPOF
- Scalable internet architectures
- Data center infrastructure innovation
- Closure Compiler: Speeding web applications by compiling JavaScript
- Ignite: Apache libcloud
Videos - http://www.youtube.com/user/OreillyMedia
Slides
- Velocity2010 (View more presentations from Tim O’Reilly.)
- Common Sense Performance Indicators in the Cloud (View more presentations from Nick Gerner.)
- Mobile Web High Performance (View more presentations from Maximiliano Firtma

Speeding up 3rd party widgets using ASWIFT

This is a summary of the talk by Arvind Jain and Michael Kleber from Google at Velocity about how to write widgets using a same-domain iframe and document.write. The change sped up widget loading by over 90%.
- Web is slow
  - Avg page load time 4.9s
  - 44 resources, 7 DNS requests, 320 KB
  - Lots of 3rd party widgets (Digg, Facebook, etc.)
- Measurements of 3rd party widgets
  - Digg widget: 9 HTTP requests, 52 KB
  - Scripts block the main page from downloading
  - Stylesheets block the main page from rendering in IE
  - AdSense takes up 12.8% of page load time
  - Analytics takes up < 5% (move to async widget)
  - DoubleClick takes up 11%
- How to make Google AdSense “fast by default”
  - Goals / Challenges
    - Minimize blocking the publisher's page
    - Show the ad right where the code is inserted
    - Must run in the publisher's domain
  - Solution (ASWIFT) -
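To make the same-domain iframe idea concrete, here is a minimal sketch of the general pattern (sometimes called a friendly iframe). It is not Google's ASWIFT code: the function names, the size parameters and the example URL are all hypothetical. The idea is to create an iframe with no src, which stays in the publisher's domain, and then document.write the widget markup into it, so the widget's script loads inside the iframe rather than in the host page's own parser.

```javascript
// Hypothetical sketch of the same-domain ("friendly") iframe pattern.
// None of these names come from the talk or from AdSense itself.

// Build the markup that will be written into the iframe.
function buildFrameHtml(widgetScriptUrl) {
  // "<\/script>" keeps an enclosing inline <script> block from ending early.
  return '<!DOCTYPE html><html><body style="margin:0">' +
         '<script src="' + widgetScriptUrl + '"><\/script>' +
         '</body></html>';
}

// Insert an iframe into `container` and write the widget markup into it.
// `doc` is the host page's document; it is a parameter so the function
// can also be exercised with a stub outside a browser.
function insertWidgetFrame(doc, container, widgetScriptUrl, width, height) {
  const frame = doc.createElement("iframe");
  frame.width = String(width);
  frame.height = String(height);
  frame.frameBorder = "0";
  frame.scrolling = "no";

  // No src attribute: the frame starts as about:blank and inherits the
  // host page's origin, so the host page is allowed to write into it.
  container.appendChild(frame);

  const frameDoc = frame.contentWindow.document;
  frameDoc.open();
  frameDoc.write(buildFrameHtml(widgetScriptUrl));
  frameDoc.close();
  return frame;
}
```

In a page this could be called as insertWidgetFrame(document, slotElement, "https://example.com/widget.js", 300, 250), where slotElement sits at the spot where the widget should appear.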

Urs Holzle from Google on “Speed Matters”

From Urs’ talk at the Velocity 2010 conference [More info: Google, datacenterknowledge]
- Average web page: 320 KB, 44 resources, 7 DNS lookups, doesn’t compress a third of its content
- Aiming for 100 ms page load times for Chrome
- Chrome: HTML5, V8 JS engine, DNS prefetching, VP8 codec, open source, spurs competition
- TCP improvements
  - Fast start (higher initial congestion window)
  - Quick loss recovery (lower retransmit timeouts)
  - Makes Google products 12% faster
  - No handshake delay (app payload in SYN packets) [Didn’t know this was possible!]
- DNS improvements
  - Propagate client IP in DNS requests (to allow servers to better map users to the closest servers)
- SSL improvements
  - False start (reduce 1 round trip from handshake)
  - 10% faster (for Android implementation)
  - Snap start (zero round trip handshakes, resumes)
  - OCSP stapling (avoid i

James Hamilton: Data center infrastructure innovation

Summary from James’ keynote talk at Velocity 2010
- Pace of innovation – The pace of datacenter innovation is increasing. The strong focus on infrastructure innovation is increasing reliability and reducing resource consumption, both of which ultimately drive down cost.
- Where does the money go?
  - 54% on servers, 8% on networking, 21% on power distribution, 13% on power, 5% on other infrastructure requirements
  - 34% of costs are related to power
  - Cost of power is trending up
- Cloud efficiency – server utilization in our industry is in the 10 to 15% range
  - Avoid holes in infrastructure use
  - Break jobs into smaller chunks, queue them wherever possible
- Power distribution – 11 to 12% lost in distribution
  - Rules to minimize power distribution losses
  - Oversell power – set up more servers than the available power could run at full load. A regular datacenter never needs 100% of its servers at once.
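The 34% figure appears to be the sum of the two power-related shares. A quick check, using only the percentages quoted from the talk (the variable names are mine):

```javascript
// Cost shares as quoted in the summary, in percent of total cost.
const costShare = {
  servers: 54,
  networking: 8,
  powerDistribution: 21,
  power: 13,
  other: 5
};

// Power-related cost = power distribution + power itself.
const powerRelated = costShare.powerDistribution + costShare.power;

// Sum of all quoted shares, to see how the rounded figures add up.
const total = Object.values(costShare).reduce((sum, share) => sum + share, 0);

console.log("power-related share:", powerRelated + "%");
console.log("sum of all shares:", total + "%");
```

The quoted shares add up to 101 rather than 100, presumably because each figure was rounded.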

Web performance Metrics 101

This talk by Sean and Alistair is one of the talks I couldn’t attend today due to conflicts, but I’m glad the slides are already up. Performance measurement is often the starting point for most web applications and that can’t be done without understanding what goes on between the browser and the server.
Metrics 101 (View more presentations from Alistair Croll.)

Thoughts on scalable web operations

Interesting observations and thoughts on web operations collected from a few sessions at Velocity 2010 [most are from a talk by Theo Schlossnagle, author of “Scalable Internet Architectures”]
- Optimization
  - Don’t over-optimize. It could take precious resources away from critical functions.
  - Don’t scale early. Planning for more than 10 times the load you currently have or are planning to support might be counter-productive in most cases.
  - An RDBMS is fine until you really need something which can’t fit on 2 or 3 servers.
  - Optimize performance on a single node before you optimize and re-architect a solution for horizontal scalability.
- Tools
  - Tools are what a master craftsman makes… tools don’t make a craftsman a master.
  - Tools can never solve a problem; their correct use does.
  - Master the tools which need to be (or could be) used in production at short notice. Looking for man page for these tools durin

Pingdom: Software behind Facebook

Pingdom has an interesting post which lists the various components that run Facebook: “Exploring the software behind Facebook, the world’s largest site”. A few interesting statistics it lists:
- Facebook serves 570 billion page views per month (according to Google Ad Planner).
- There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
- More than 3 billion photos are uploaded every month.
- Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
- More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
- Facebook has more than 30,000 servers (and this number is from last year!)
I’m not sure Facebook is really the “largest site” based on servers alone, but it’s definitely the largest based on unique users in the US.

Slides from a Cassandra talk at Mountain View

Introduction to Cassandra (June 2010) (View more presentations from gdusbabek.)
What’s not mentioned in the slides is Gary’s reference to the number of key changes in the 0.7 version of Cassandra. He thinks the beta will be out in a month and that it will address a lot of the issues which are currently keeping many users away from Cassandra. A few interesting points:
- 0.5 and 0.6 use the same version of SSTABLE (to store data on disk), but 0.7 changes that. This will require some kind of migration if 0.7 doesn’t support reading old versions of SSTABLE.
- Until now, one needs 50% of disk space free to do a compaction operation. This might improve with 0.7.
- 0.7 would probably have more support for Avro (instead of Thrift). He wonders why Thrift hasn’t caught on.
- Vector clocks are coming.
- Altering keyspaces and column families is not possible on a live system today; this might change with a future version.
- Compression is being thought about.
- He strongly urged users to use

How to extract the biggest text block from an HTML page?

One of the interesting problems in handling HTML content is auto-detecting the biggest HTML block in the center of the page. This can be very useful for on-the-fly content analysis done in the browser. Here is an example of how it could be done by parsing the DOM after the page is rendered.
    // Royans K Tharakan (2010 June)
    // http://www.royans.net/
    // You are free in any form to use as long as you give credit where it's due
    // Would appreciate if you submit your changes/improvement back to me or to some other public forum.
    // Requires jquery
    var largestId = 0;
    var largestDiv = null;
    var largestSize = -1;
    function getLargestDiv() {
        var size = getSize(document.getElementsByTagName("body")[0], 0);
        if (window.location.href.indexOf("wikipedia.org")>0){
            return "#bodyContent";
        }
        return "[d_id='tmp_" + largestId+"']";
    }
    function getSize(currentElement, depth) {
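The excerpt stops before the body of getSize, so what follows is not the original implementation. It is a minimal sketch of one way such a scoring pass could work, under my own assumptions: each element is scored by the length of the text in its direct text-node children, and the highest-scoring element wins. The names ownTextLength and findLargestTextBlock are hypothetical, and this version does not need jQuery.

```javascript
// Hypothetical sketch, not the author's getSize().
// Node type constants with the same values as Node.ELEMENT_NODE
// and Node.TEXT_NODE in the DOM.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Length of the text held directly by this element, ignoring text
// that belongs to nested elements and ignoring surrounding whitespace.
function ownTextLength(element) {
  let total = 0;
  for (const child of Array.from(element.childNodes || [])) {
    if (child.nodeType === TEXT_NODE) {
      total += (child.nodeValue || "").trim().length;
    }
  }
  return total;
}

// Walk the tree depth-first and remember the element with the most
// direct text. Returns { element, size }; element is null if the
// root is not an element node.
function findLargestTextBlock(root) {
  let best = { element: null, size: -1 };
  (function walk(node) {
    if (!node || node.nodeType !== ELEMENT_NODE) return;
    const size = ownTextLength(node);
    if (size > best.size) best = { element: node, size: size };
    for (const child of Array.from(node.childNodes || [])) walk(child);
  })(root);
  return best;
}
```

In a browser, findLargestTextBlock(document.body).element would be the candidate block. Scoring only direct text keeps body or a wrapper div from winning simply because it contains everything, but a real page usually splits an article across many p elements, so a more useful score would probably also credit a parent for its paragraphs.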

Distributed systems and Unique IDs: Snowflake

Most of us who deal with traditional databases take auto-increments for granted. While auto-increments are simple on consistent clusters, they can become a challenge in a cluster of independent nodes which don’t use the same source for the unique IDs. An even bigger challenge is to generate them in such a way that they are roughly in sequence. While this may be an old problem, I realized the importance of such a sequence only after using Cassandra in my own environment. Twitter, which has been using Cassandra in many interesting ways, has proposed a solution for it which they are releasing as open source today. Here are some interesting sections from their post announcing “Snowflake”.
The Problem
We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’
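To illustrate the idea of IDs that are unique across independent nodes and still roughly ordered by time, here is a minimal sketch. It assumes the bit layout commonly described for Snowflake (a 41-bit millisecond timestamp, a 10-bit worker number and a 12-bit per-millisecond sequence). The custom epoch, the function name and the injectable clock are my own choices, and this is not Twitter's code.

```javascript
// Hypothetical sketch of a Snowflake-style ID generator using BigInt.
// Assumed layout: | 41-bit ms timestamp | 10-bit worker | 12-bit sequence |
const EPOCH = 1262304000000n;       // 2010-01-01T00:00:00Z, an arbitrary choice
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_WORKER = (1n << WORKER_BITS) - 1n;      // 1023
const SEQUENCE_MASK = (1n << SEQUENCE_BITS) - 1n; // 4095

// workerId must be unique per node; `now` returns milliseconds since the
// Unix epoch and can be replaced with a fake clock for testing.
function createIdGenerator(workerId, now = () => Date.now()) {
  const worker = BigInt(workerId);
  if (worker < 0n || worker > MAX_WORKER) {
    throw new RangeError("workerId must be between 0 and 1023");
  }
  let lastMs = -1n;
  let sequence = 0n;

  return function nextId() {
    let ms = BigInt(now());
    if (ms < lastMs) {
      throw new Error("clock moved backwards; refusing to generate an ID");
    }
    if (ms === lastMs) {
      // Same millisecond: bump the sequence counter.
      sequence = (sequence + 1n) & SEQUENCE_MASK;
      if (sequence === 0n) {
        // Counter wrapped: spin until the clock reaches the next millisecond.
        while (ms <= lastMs) ms = BigInt(now());
      }
    } else {
      sequence = 0n;
    }
    lastMs = ms;
    // Timestamp in the high bits makes IDs sort roughly by creation time.
    return ((ms - EPOCH) << (WORKER_BITS + SEQUENCE_BITS)) |
           (worker << SEQUENCE_BITS) |
           sequence;
  };
}
```

IDs from a single worker are strictly increasing. IDs from different workers never collide because the worker number is part of the ID, but they are only roughly ordered, to within the clock skew between machines.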