June 23, 2010

All Velocity conference 2010 Slides/Notes

Here are all the slides/PDFs I’ve come across from the first two days at Velocity. Please let me know if I missed any.

 


    Speeding up 3rd party widgets using ASWIFT

    This is a summary of the talk by Arvind Jain and Michael Kleber from Google at Velocity about speeding up widgets by writing them into a same-domain iframe using document.write(). They reported speed improvements of over 90% in widget loading with this change.

    • Web is slow
      • Avg page load time 4.9s
      • 44 resources, 7 dns requests, 320kb
      • Lots of 3rd party widgets
        • digg/facebook/etc
    • Measurements of 3rd party widgets
      • Digg widget
        • 9 HTTP requests, 52 kB
        • scripts block the main page from downloading
        • stylesheets block the main page from rendering in IE
      • AdSense takes up 12.8% of page load time
      • Analytics takes up < 5% (move to the async widget)
      • Doubleclick takes up 11%
    • How to make Google AdSense “fast by default”
      • Goals / Challenges
        • Minimize blocking the publisher’s page
        • Show the ad right where the code is inserted
        • Must run in the publisher’s domain
      • Solution (ASWIFT) - Asynchronous Script Written into IFrame Tag
        • Make show_ads.js a tiny loader script
        • Loader creates a same-domain iframe (using document.write)
        • Loads the rest of show_ads into the iframe by document.write()-ing a <script> tag
        • This loading of the iframe is asynchronous (a minimal sketch of the pattern follows this list)
      • Browser specific surprises
        • Problems with parallel downloads of the same script in IE
        • Iframe creation inside <head> is problematic in Firefox
        • Requesting headers in Chrome was buggy
        • Forward-Back-Reload behavior is buggy (refetching instead of using cache)
        • document.domain vs friendly iframes
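
    As a rough illustration of the ASWIFT pattern above, here is a minimal sketch. All names (the frame id, widget_main.js) are my own placeholders, not Google’s actual code, and the real loader handles far more edge cases (see the browser surprises above).

    // Sketch of an ASWIFT-style async widget loader (illustrative only).
    (function () {
        var id = "widget_frame_" + Math.floor(Math.random() * 1e6);

        // The tiny loader runs inline, so document.write() places the
        // iframe exactly where the publisher pasted the snippet.
        document.write('<iframe id="' + id + '" style="border:0" ' +
                'width="300" height="250"></iframe>');

        // Runs after the parser has created the iframe (see below).
        window.fillWidgetFrame = function () {
            // No src attribute, so the iframe is same-domain and we can
            // script into it freely.
            var doc = document.getElementById(id).contentWindow.document;
            doc.open();
            // The heavy widget script loads inside the iframe, so it no
            // longer blocks the publisher's page from parsing/rendering.
            doc.write('<html><body><script src="widget_main.js"><\/script>' +
                    '</body></html>');
            doc.close();
        };

        // Both written fragments are parsed in order once this inline
        // script finishes, so the iframe exists by the time this runs.
        document.write('<script>fillWidgetFrame();<\/script>');
    })();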

    Urs Hölzle from Google on “Speed Matters”

    From Urs’ talk at the Velocity 2010 conference [ more info: Google, datacenterknowledge ]

    • Average web page - 320 kB, 44 resources, 7 DNS lookups; a third of its content isn’t compressed
    • Aiming for 100 ms page load times for Chrome
    • Chrome: HTML5, V8 JS engine, DNS prefetching, VP8 codec, open source, spurs competition
    • TCP improvements
      • Fast start (higher initial congestion window)
      • Quick loss recovery (lower retransmit timeouts)
      • Makes Google products 12% faster
      • No handshake delay (app payload in SYN packets)  [ Didn’t know this was possible !!! ]
    • DNS improvements
      • Propagate client IP in DNS requests (to allow servers to better map users to the closest servers)
    • SSL improvements
      • False start (reduce 1 round trip from handshake)
        • 10% faster (for Android implementation)
      • Snap start (zero round trip handshakes, resumes)
      • OCSP stapling (avoid inline roundtrips)
    • HTTP improvements (SPDY):
      • Header compression
      • Stream multiplexing and prioritization
      • Server push/hints
      • 25% faster
    • Test done
      • Downloaded the same “top 25” pages via HTTP and SPDY over a simulated 2 Mbps DSL link with 0% packet loss - the number of packets sent dropped by 40%
      • On low bandwidth links, headers are surprisingly costly. Can add 1 second of latency.
    • Public DNS:
      • reduces recursive resolve time by continuously refreshing cache
      • Increases availability through adequate provisioning
    • Broadband pilot testing going on
      • Fix the “last mile” complaint
      • A huge (100x) increase in speed
    • More developer tools by Google
      • Page Speed, Speed Tracer, Closure Compiler, auto-spriter
    • More awareness about performance

    James Hamilton: Data center infrastructure innovation

    Summary from James Hamilton’s keynote talk at Velocity 2010

    • Pace of innovation – The datacenter pace of innovation is increasing. The strong focus on infrastructure innovation is driving down costs, increasing reliability, and reducing resource consumption.
    • Where does the money go?
      • 54% on servers, 8% on networking, 21% on power distribution, 13% on power, 5% on other infrastructure requirements
      • 34% costs related to power
      • Cost of power is trending up
    • Cloud efficiency – server utilization in our industry is in the 10 to 15% range
      • Avoid holes in the infrastructure use
      • Break jobs into smaller chunks and queue them wherever possible
    • Power distribution – 11 to 12% lost in distribution
      • Rules to minimize power distribution losses
        • Oversell power – set up more servers than the provisioned power could support at full load, since 100% of the servers are never needed at once in a regular datacenter.
        • Avoid voltage conversions
        • Increase efficiency of conversions
        • High voltage as close to load as possible
        • Size voltage regulators to load and use efficient parts
        • High-voltage direct current offers a small potential gain
    • Mechanical systems – One of the biggest savings is in cooling
      • What parts are involved? Cooling towers, heat exchangers, pumps, evaporators, compressors, condensers… and so on.
      • The efficiency of these systems, and the power required to run them, depends on the difference between the desired temperature and the current room temperature
      • Separate hot and cold aisles… insulate them (don’t break the fire codes)
      • Increase the operating temperature of servers
        • Most are between 61°F and 84°F
        • Telco standard is 104F (Game consoles are even higher)
    • Temperature
      • Limiting factors to high temp operation
        • Higher fan power trade-off
        • More semiconductor leakage current
        • Possible negative failure rate impact
      • Avoid direct expansion cooling entirely
        • Air side economization 
        • Higher data center temperature
        • Evaporative cooling
      • Requires filtration
        • Particulate and chemical pollution
    • Networking gear
      • Current networks are over-subscribed
        • Forces workload placement restrictions
        • Goal: all points in datacenter equidistant.
      • Mainframe model goes commodity
        • Competition at each layer rather than vertical integration
      • OpenFlow: an open software platform
        • Moves from a distributed control plane to central control

    June 22, 2010

    Web performance Metrics 101

    This talk by Sean and Alistair is one of the talks I couldn’t attend today due to conflicts, but I’m glad the slides are already up.

    Performance measurement is often the starting point for most web applications, and it can’t be done without understanding what goes on between the browser and the server.

    Thoughts on scalable web operations

    Interesting observations/thoughts on web operations collected from a few sessions at Velocity conference 2010 [ most are from a talk by Theo Schlossnagle, author of “Scalable Internet Architectures” ]

    • Optimization
      • Don’t over-optimize; it could take precious resources away from critical functions.
      • Don’t scale early. Planning for more than 10 times the load you currently have (or plan to support) is counter-productive in most cases. An RDBMS is fine until you really need something that can’t fit on 2 or 3 servers.
      • Optimize performance on single node before you optimize and re-architect a solution for horizontal scalability.
    • Tools
      • Tools are what a master craftsman makes… tools don’t make a craftsman a master.
      • A tool never solves a problem by itself; its correct use does.
      • Master the tools which need to be (or could be) used in production at short notice. Looking up man pages for these tools during an outage isn’t ideal.
    • Cookies
      • Use cookies to store data wherever possible.
      • Sign them if you are concerned about tampering (a signing sketch follows this list)
      • Encrypt them if you are concerned about users having visibility into them
      • It’s cheaper to use the user’s browser as a datastore replication node than to build redundant servers
    • Datastores
      • NoSQL is not the solution for everything [ example: so long MongoDB ]
      • Ditto RDBMS
      • Ditto everything else
      • Get the requirements, understand the problem, and then pick the solution, instead of the other way around.
    • Automation
      • When you find yourself doing something more than twice, write scripts to automate it
      • When users report failures before monitoring systems do, write better monitoring tools.
    • Revision control
      • Revision control as much as possible.
      • Provides an audit trail to help understand what happened before; one can’t remember everything. It’s an excellent place to search during hard-to-solve production problems.
    • Networking
      • Think in packets and not bytes to save load time.
      • There is little point in compressing a CSS file that is 400 bytes: it already fits within a single packet (a typical MTU is around 1500 bytes), so compressing it saves no packets.
      • In fact, compression and decompression take away precious CPU resources on both the server and the client.
      • Instead, think of embedding short CSS files in the HTML to save a few extra packets.
    • Caching
      • Static objects
        • Cache all static objects forever
        • Add version numbers/strings to object names to force a reload when they change.
          • For example, instead of requesting “/images/myphoto.jpg”, request “/images/myphoto.123245.jpg”
          • Remove the version ID on the server using something like an .htaccess rewrite rule (see the sketch after this list)
        • Use CDNs wherever possible, but make sure you understand all the objects that are part of your page before you shove the problem off to a CDN; pointless redirects can steal away precious loading time.
    • People
      • When you hire someone for an operations team, never hire someone who can’t remember a single production issue he/she caused. People learn the most from mistakes, so value those who have been on the hot seat and fixed their own mistakes.
      • Allow people to take risks in production and watch how they recover. Taking risks is part of adapting to new ideas, and letting people fail helps them understand how to improve.
    • Systems
      • Know your system’s baseline. An instant snapshot of a system’s current statistics is never sufficient to fully classify its state (for example, is a load average of 10 abnormal on server XYZ?)
      • Use tools which periodically poll and archive data to give you this information
    • Moderation
      • Moderate the tools and process you use
      • Moderate the moderation
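
    As a footnote to the cookie-signing advice above, here is a minimal Node.js sketch of tamper-evident cookies using an HMAC (the secret and cookie format are made up for illustration):

    // Sign a cookie value so tampering can be detected server-side.
    var crypto = require("crypto");
    var SECRET = "server-side-secret";  // never shipped to the client

    function sign(value) {
        var mac = crypto.createHmac("sha256", SECRET)
                .update(value).digest("hex");
        return value + "." + mac;       // cookie stores "value.signature"
    }

    function verify(cookie) {
        var i = cookie.lastIndexOf(".");
        if (i < 0) return null;
        var value = cookie.slice(0, i);
        // Recompute the HMAC; a mismatch means the value was altered.
        return sign(value) === cookie ? value : null;
    }

    console.log(verify(sign("userid=42")));        // "userid=42"
    console.log(verify(sign("userid=42") + "x"));  // null (tampered)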
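
    And for the caching notes above, a sketch of the version-stamped URL trick (the build id, helper, and rewrite rule here are illustrative):

    // Stamp asset URLs with a build id so they can be cached forever
    // yet still refetched after every deploy.
    var BUILD_ID = "123245";  // e.g. a deploy timestamp or VCS revision

    function versioned(url) {
        // "/images/myphoto.jpg" -> "/images/myphoto.123245.jpg"
        return url.replace(/\.(\w+)$/, "." + BUILD_ID + ".$1");
    }

    // On the server, a rewrite rule (Apache-style sketch, untested)
    // maps the stamped name back to the real file:
    //   RewriteRule ^(.*)\.\d+\.(jpg|gif|css|js)$ $1.$2 [L]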

    What did I miss? :) Let me know and I’ll add it here…

    June 19, 2010

    Pingdom: Software behind Facebook

    Pingdom has an interesting post which lists the various components that run Facebook: “Exploring the software behind Facebook, the world’s largest site”.

    A few of the interesting statistics listed:

      • Facebook serves 570 billion page views per month (according to Google Ad Planner).
      • There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
      • More than 3 billion photos are uploaded every month.
      • Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
      • More than 25 billion pieces of content (status updates, comments, etc) are shared every month.
      • Facebook has more than 30,000 servers (and this number is from last year!)

    I’m not sure Facebook is really the “largest site” based on servers alone, but it’s definitely the largest based on unique users in the US.

    Slides from a Cassandra talk at Mountain View

    What’s not mentioned in the slides is Gary’s reference to the number of key changes in the 0.7 version of Cassandra. He thinks a beta will be out in a month and that it will address a lot of the issues currently keeping many Cassandra users away. A few interesting points:

    • 0.5 and 0.6 use the same version of SSTable (the on-disk data format), but 0.7 changes that. This will require some kind of migration if 0.7 doesn’t support reading old SSTable versions.
    • Until now, one needed 50% of disk space available (free) to do a compaction operation. This might improve with 0.7.
    • 0.7 will probably have more support for Avro (instead of Thrift). He wonders why Thrift hasn’t caught on.
    • Vector clocks are coming…
    • Altering keyspaces and column families is not possible on a live system today… this might change in a future version.
    • Compression is being thought about…

    He strongly urged users to use client libraries which abstract away Cassandra’s internal workings. It was convincing enough for me to investigate a move from Cassandra’s Java lib to “hector” for my Java application.

    June 02, 2010

    How to extract the biggest text block from an HTML page?

    One of the interesting problems in handling HTML content is auto-detecting the biggest HTML block near the center of the page. This can be very useful for on-the-fly content analysis done in the browser. Here is an example of how it could be done by parsing the DOM after the page is rendered.

     

    // Royans K Tharakan (2010 June)
    // http://www.royans.net/
    // You are free to use this in any form as long as you give credit where it's due.
    // Would appreciate it if you submit your changes/improvements back to me or to some other public forum.
    // Requires jQuery (callers use the returned selector with $())

    var largestId = 0;
    var largestDiv = null;
    var largestSize = -1;

    function getLargestDiv() {
        // Special-case Wikipedia, where the main content node is known
        // (and skip the DOM walk entirely).
        if (window.location.href.indexOf("wikipedia.org") > 0) {
            return "#bodyContent";
        }
        // Walk the whole document, tagging the largest block as we go.
        getSize(document.getElementsByTagName("body")[0], 0);
        // Return a selector for the element tagged with the last d_id.
        return "[d_id='tmp_" + largestId + "']";
    }

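    // Recursively estimates how much markup each element contributes by
    // itself (its innerHTML minus its children's share) and tags the
    // biggest content-like element seen so far with a d_id attribute.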
    function getSize(currentElement, depth) {

        var basesize = 0;
        var actualsize = 0;

        if (currentElement.innerHTML) {
            basesize = currentElement.innerHTML.length;
        }

        if (currentElement.tagName) {
            actualsize = basesize + currentElement.tagName.length * 2 + 5;
        } else {
            actualsize = basesize;
        }

        var attributes = currentElement.attributes;
        if (attributes != null) {
            for ( var j = 0; j < attributes.length; j++) {
                actualsize = actualsize + (attributes[j].name.length);
                actualsize = actualsize + (attributes[j].value.length);
                actualsize = actualsize + 4;
            }
        }

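        // Walk the children, subtracting their share from basesize so it
        // approximates only the content this element contributes directly.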
        if (currentElement.childNodes) {
            var i = 0;
            var currentElementChild = currentElement.childNodes[i++];
            while (currentElementChild) {
                var innersize = getSize(currentElementChild, depth + 1);
                if (currentElementChild.innerHTML) {
                    basesize = basesize - innersize
                            - currentElementChild.tagName.length * 2 - 5;
                } else {
                    basesize = basesize - innersize;
                }
                currentElementChild = currentElement.childNodes[i++];
            }
        }

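        // Remember the element with the most direct content so far; only
        // content-like tags qualify, and each candidate is stamped with a
        // d_id attribute so it can be selected again later.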
        if ((largestDiv == null) || (basesize > largestSize)) {
            if ((currentElement.tagName == 'DIV')
                    || (currentElement.tagName == 'SPAN')
                    || (currentElement.tagName == 'OL')
                    || (currentElement.tagName == 'LI')
                    || (currentElement.tagName == 'P')
                    || (currentElement.tagName == 'A')) {
                largestDiv = currentElement;
                largestSize = basesize;
                largestId++;
                currentElement.setAttribute("d_id", "tmp_" + largestId);
            }
        }
        if ((currentElement.tagName == 'SPAN')
                || (currentElement.tagName == 'OL')
                || (currentElement.tagName == 'LI')
                || (currentElement.tagName == 'P')
                || (currentElement.tagName == 'A')
                || ((currentElement.tagName == 'DIV') && (currentElement.childNodes.length == 0))) {
            return (actualsize - basesize);
        }

        return actualsize;
    }
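
    For example, once the page has rendered, the detected block could be grabbed with jQuery (which is why the script lists it as a requirement). This usage line is my own, not part of the original snippet:

    // Hypothetical usage: walk the DOM, then select the tagged element.
    var biggestText = $(getLargestDiv()).text();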

    June 01, 2010

    Distributed systems and Unique IDs: Snowflake

    Most of us who deal with traditional databases take auto-increments for granted. While auto-increments are simple on consistent clusters, they can become a challenge in a cluster of independent nodes which don’t share the same source for unique ids. An even bigger challenge is generating them in such a way that they are roughly in sequence.

    While this may be an old problem, I realized the importance of such a sequence only after using Cassandra in my own environment. Twitter, which has been using Cassandra in many interesting ways, has proposed a solution, which they are releasing as open source today.

    Here are some interesting sections from their post announcing “Snowflake”.

    The Problem

    We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).

    Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL.

    Our requirements for this system were pretty simple, yet demanding:

    We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.

    These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.[1]

    Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.

     

    Solution

    To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number.

    Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file).

    We encourage you to peruse and play with the code: you’ll find it on github. Please remember, however, that it is currently alpha-quality software that we aren’t yet running in production and is very likely to change.
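
    To make the composition concrete, here is a minimal JavaScript sketch of a Snowflake-style id. The bit widths and epoch are my own illustrative choices (see Twitter’s code on github for the real layout), and it simplifies the per-thread sequence to a single counter and skips details like waiting out a sequence overflow within one millisecond:

    // Snowflake-style 64-bit id: timestamp | worker | sequence.
    // BigInt is needed because these ids exceed 2^53.
    const EPOCH = BigInt(Date.UTC(2010, 0, 1));  // arbitrary custom epoch
    const workerId = 1n;   // would come from zookeeper/config in practice
    let sequence = 0n;

    function nextId() {
        const ts = BigInt(Date.now()) - EPOCH;   // ms since custom epoch
        sequence = (sequence + 1n) & 0xFFFn;     // 12-bit per-worker counter
        return (ts << 22n) | (workerId << 12n) | sequence;
    }

    console.log(nextId().toString());  // ids sort roughly by creation time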