Latest Publications

“Chrome instant” feature could break your webapp

The “Google instant” wasn’t a groundbreaking idea by itself. We have all been using various forms of auto-complete for a while now. What makes it stand out is that, unlike all previous kinds of auto-complete, it searches the entire web archive at an amazing speed and still serves personalized, hyper-local results. You can get more information about its backend here and here.

It wasn’t surprising that Google also put this feature inside Chrome itself. Take a look at this demo from Lifehacker. This is where it gets interesting…

 

At first this looked very exciting. I was pleasantly surprised when Chrome brought up websites as I typed, in addition to auto-completing URLs. The impact on the servers didn’t sink in until I was debugging an issue in my own application that required me to look at the Apache logs. Look at the following log snippet: instead of just 1 call, I found 18 calls made to my web application while I was typing the URL. All of this happened in 6 seconds, which is about the time it took me to type the URL.

[29/Sep/2010:02:39:04 -0700] "GET /cfmap/create.jsp?p HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:04 -0700] "GET /cfmap/create.jsp?po HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:04 -0700] "GET /cfmap/create.jsp?por HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:05 -0700] "GET /cfmap/create.jsp?port HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:05 -0700] "GET /cfmap/create.jsp?port= HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:05 -0700] "GET /cfmap/create.jsp?port=1 HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:05 -0700] "GET /cfmap/create.jsp?port=1 HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:08 -0700] "GET /cfmap/create.jsp?port=1& HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:08 -0700] "GET /cfmap/create.jsp?port=1&a HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:08 -0700] "GET /cfmap/create.jsp?port=1&ap HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:08 -0700] "GET /cfmap/create.jsp?port=1&app HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:09 -0700] "GET /cfmap/create.jsp?port=1&appn HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:09 -0700] "GET /cfmap/create.jsp?port=1&appna HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:09 -0700] "GET /cfmap/create.jsp?port=1&appnam HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:09 -0700] "GET /cfmap/create.jsp?port=1&appname HTTP/1.1" 200 60 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:09 -0700] "GET /cfmap/create.jsp?port=1&appname= HTTP/1.1" 200 60 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:10 -0700] "GET /cfmap/create.jsp?port=1&appname=34 HTTP/1.1" 200 60 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
[29/Sep/2010:02:39:10 -0700] "GET /cfmap/create.jsp?port=1&appname=34 HTTP/1.1" 200 60 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.17 Safari/534.7" ::  847 0
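The pattern in this log is easy to spot by eye: within a few seconds, each URL from the same client extends the previous one by a keystroke or two. Below is a minimal TypeScript sketch of how an application might fold such bursts back into a single request before counting traffic or mining its logs. It assumes the log lines have already been parsed into records; the LogEntry shape, the collapsePrefixChains name and the 10-second gap are hypothetical choices, not anything Chrome or Apache defines.

```typescript
// Hypothetical parsed form of one access-log line.
interface LogEntry {
  client: string; // e.g. IP address plus user agent
  time: number;   // seconds since the epoch
  url: string;    // request path including the query string
}

// Collapse "instant" bursts: consecutive requests from the same client where
// each URL starts with the previous one (a repeat counts too) and arrives
// within maxGapSec of it. Only the last, longest URL of each burst is kept,
// on the assumption that it is the request the user actually meant.
function collapsePrefixChains(entries: LogEntry[], maxGapSec = 10): LogEntry[] {
  // Stable sort, so same-second requests keep their log order.
  const sorted = [...entries].sort((a, b) => a.time - b.time);
  const result: LogEntry[] = [];
  // Per client: the latest entry seen and where its burst sits in result.
  const open = new Map<string, { entry: LogEntry; index: number }>();

  for (const e of sorted) {
    const prev = open.get(e.client);
    if (prev && e.time - prev.entry.time <= maxGapSec && e.url.startsWith(prev.entry.url)) {
      result[prev.index] = e; // extend the current burst
      open.set(e.client, { entry: e, index: prev.index });
    } else {
      result.push(e); // start a new burst
      open.set(e.client, { entry: e, index: result.length - 1 });
    }
  }
  return result;
}
```

The trade-off is that a genuine follow-up request whose URL happens to extend the previous one within the window is merged too, so this suits analytics better than anything that decides what actually gets executed.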

There are two issues here that made me very concerned:

  1. Volume of requests: This is a no-brainer. The example above is not a normal use case, since we don’t expect users to type URLs every time they use a web application. But if the app has an easy-to-use API that users can call this way, the impact of the small percentage of users who do will be magnified many-fold very quickly. It may become very important to figure out how to queue requests, and how to distinguish a user who is spamming the website with 10 requests per second from a user who makes 1 request. The problem also goes away if your app can already handle 5 to 20 times more traffic, which is probably the best solution.
  2. Robust APIs: This is a trickier one that developers need to plan for. Let’s say there was an API like “/api/transfermoney.php?from=account1&to=account2&amount=10000”. How much money will this API transfer if you type this URL in a browser that auto-executes partial URLs?
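Both issues can be handled in the same place, at the front door of the API. The TypeScript sketch below is one hedged way to do it, not a drop-in middleware. The ApiRequest shape, the RequestGuard and TokenBucket classes, a bucket of 5 requests refilled at 1 request per second, and the list of state-changing paths are all assumptions for illustration. The guard refuses to run a state-changing endpoint unless the call is a POST (HTTP defines GET as a “safe” method, and a browser pre-fetching partial URLs issues GETs). It also gives every client its own token bucket, so a client firing 10 requests per second gets throttled while one making a single request goes straight through.

```typescript
// Hypothetical, framework-agnostic view of an incoming request.
interface ApiRequest {
  method: string; // "GET", "POST", ...
  path: string;   // path without the query string
  client: string; // API key, session id or IP address
}

type Verdict = { ok: true } | { ok: false; status: number; reason: string };

// Token bucket: holds up to capacity tokens, refilled at ratePerSec;
// every request spends one token.
class TokenBucket {
  private tokens: number;
  private last: number;
  private capacity: number;
  private ratePerSec: number;

  constructor(capacity: number, ratePerSec: number, nowSec: number) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity;
    this.last = nowSec;
  }

  take(nowSec: number): boolean {
    const elapsed = Math.max(0, nowSec - this.last);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec);
    this.last = nowSec;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Endpoints that change state (hypothetical list).
const STATE_CHANGING_PATHS = new Set<string>(["/api/transfermoney.php"]);

class RequestGuard {
  private buckets = new Map<string, TokenBucket>();
  private capacity: number;
  private ratePerSec: number;

  constructor(capacity = 5, ratePerSec = 1) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
  }

  check(req: ApiRequest, nowSec: number): Verdict {
    // 1. Never change state on a GET: pre-fetching browsers request partial
    //    URLs with GET, and HTTP treats GET as safe (no side effects).
    if (STATE_CHANGING_PATHS.has(req.path) && req.method !== "POST") {
      return { ok: false, status: 405, reason: "state-changing calls must use POST" };
    }
    // 2. Per-client rate limit.
    let bucket = this.buckets.get(req.client);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.ratePerSec, nowSec);
      this.buckets.set(req.client, bucket);
    }
    if (!bucket.take(nowSec)) {
      return { ok: false, status: 429, reason: "too many requests" };
    }
    return { ok: true };
  }
}
```

Throttled calls get a 429 (Too Many Requests) so well-behaved clients can back off; queuing them instead is the other option when dropping requests is not acceptable.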

What broke the camel’s back was that Google’s own search engine often flagged the traffic from this feature as spammy/automated. It got so bad that I had to switch to Firefox to do a simple Google search.

And here is an example of how my Google history is now polluted with things I didn’t really search for. In this example I was looking for “ohdoctah” after I heard about it on TWiT. The key point is that while Google may have thought about how to mine this polluted search data, other web applications might find it impossible to deal with without significant additional resources.


For now I’ve disabled the feature in the browser. I hope there is an easy solution to this problem; otherwise I don’t see this feature making it into the production version of Chrome any time soon.

Operations Dashboards – KaChing

I’ve mentioned this before, but I’d like to do it again because I think these guys are awesome. If you have not listened to the DevOps Cafe podcasts, this might be the right time to take a look. Here is a video of one of their sessions with folks at KaChing, who have been doing amazing stuff around continuous deployment.

Continuous Deployment and Operations Dashboards at kaChing from dev2ops.org on Vimeo.

MongoDB is webscale (humor)

Humor is not what this website is about, but sometimes it doesn’t matter how the message is wrapped, as long as it gets through. I’m a big NoSQL fan, but I also understand where some of the specific implementations are weak. I have nothing against MongoDB, but this is just too funny not to share.

Scalability updates for Aug 27th 2010

My updates have been slow recently due to other things I’m involved in. If you want more updates on what I’m reading, please feel free to follow me on Twitter or Buzz.

Here are some of the big ones I have mentioned on my Twitter/Buzz feeds.

Continuous deployments may not be for everyone: Culture

If you have read this blog before, you know how much I admire those who use continuous deployments in production. Doing that at scale is even more impressive. But the message that sometimes gets lost is that continuous deployments may not be for everyone.

Most continuous integration environments do all of their deployments from trunk, which means every check-in has to be production quality. Digg’s Andrew Bayer gives a good explanation of how they do code reviews and other pre-check-in steps before code is merged into trunk.

Site uptime and reliability depend on a comprehensive QA process that protects against unintentional mistakes. For rapid deployments, one has to abandon manual QA in favor of 100% automated testing, with the goal of getting close to 100% code coverage. That’s hard if the code is not written in a way that can be tested easily.


But unit and integration tests alone cannot guarantee quality. In addition to testing the code that has been implemented in the application, there need to be tests that look for things that shouldn’t be implemented. For example, it would be nice to have tests that look for non-parameterized SQL calls in parts of the code where they shouldn’t exist. If you know there is a wrong way to do something, write a test case for it so that it’s caught as soon as someone does it.
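As an example of a test for something that shouldn’t be there, here is a deliberately crude TypeScript sketch that flags SQL built by string concatenation or template-literal interpolation. It is a line-based regex heuristic, not a real parser. It assumes queries go through methods named query or execute, it misses calls split across several lines, and findUnparameterizedSql and assertNoUnparameterizedSql are hypothetical helper names.

```typescript
// Heuristic: a call to .query( or .execute( whose first argument is either a
// template literal with ${...} interpolation or a quoted string followed by "+".
// The method names are assumptions; adjust them to your data-access layer.
const UNPARAMETERIZED_SQL = /\.(query|execute)\(\s*(`[^`]*\$\{|["'][^"']*["']\s*\+)/;

interface Finding {
  line: number; // 1-based line number
  text: string;
}

// Return every line of source that looks like it builds SQL by hand.
// Line-based, so a call spread over several lines is not detected.
function findUnparameterizedSql(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    if (UNPARAMETERIZED_SQL.test(text)) {
      findings.push({ line: i + 1, text: text.trim() });
    }
  });
  return findings;
}

// In a test: fail as soon as anyone introduces string-built SQL.
function assertNoUnparameterizedSql(fileName: string, source: string): void {
  const findings = findUnparameterizedSql(source);
  if (findings.length > 0) {
    const where = findings.map((f) => `${fileName}:${f.line}: ${f.text}`).join("\n");
    throw new Error(`Non-parameterized SQL found:\n${where}`);
  }
}
```

Run over every source file in the test suite, this breaks the build the moment someone concatenates a value into a query instead of passing it as a bound parameter.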

Some of this is easier to do if you already follow a test-driven development process, where you write tests before you write code.

The biggest difference between an organization that follows continuous deployment and one that doesn’t is in how QA is done. QA becomes a shared responsibility to which everyone has to contribute. No matter how many tools or guidelines one publishes, if the teams using this process don’t believe in it, the quality and availability of the website will suffer. Pascal-Louis Perez (from KaChing) used a diagram like the one here to explain how this “culture” is at the heart of continuous deployment.

“Culture” also explains why most older organizations that follow a more traditional form of deployment are having a hard time understanding and adapting to this process.

Are you using continuous deployments in your environment? What was your biggest hurdle?

TCP and the Lower Bound of web performance

One of the less discussed, but highly informative and very thought-provoking, talks at Velocity 2010 was the one about TCP, latency, window sizes and their relation to web performance. The slides for this talk by John Rauser can be found here. And thanks to Mike Bailey, there is a video recording as well.

Follow the slides as you watch the video to understand the talk.

TCP and the Lower Bound of Web Performance – John Rauser from Goodfordogs on Vimeo.
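As I understand the lower-bound argument, round trips driven by TCP slow start, rather than raw bandwidth, often set the floor on how fast a page can load. Here is a back-of-the-envelope TypeScript sketch of that floor under an idealized model, which is my simplification and not the talk’s exact math: no packet loss, no bandwidth or receiver-window limit, no server think time, and a congestion window that starts at initCwnd segments and doubles every round trip. The defaults (a 1460-byte MSS and an initial window of 3 segments, which is what RFC 3390 allows for that MSS) and the function names are assumptions.

```typescript
// Idealized model (assumptions): no packet loss, no bandwidth or receiver-window
// limit, no server think time; the congestion window starts at initCwnd
// segments and doubles after every round trip.
function slowStartRounds(bytes: number, mss = 1460, initCwnd = 3): number {
  const segmentsNeeded = Math.max(1, Math.ceil(bytes / mss));
  let delivered = 0;
  let cwnd = initCwnd;
  let rounds = 0;
  while (delivered < segmentsNeeded) {
    delivered += cwnd; // one flight of cwnd segments per round trip
    cwnd *= 2;         // slow start doubles the window each round
    rounds += 1;
  }
  return rounds;
}

// Lower bound for fetching one object over a fresh connection:
// 1 RTT for the TCP handshake, then one RTT per flight of response data
// (the request itself rides along with the first of those round trips).
function lowerBoundMs(bytes: number, rttMs: number, mss = 1460, initCwnd = 3): number {
  return (1 + slowStartRounds(bytes, mss, initCwnd)) * rttMs;
}
```

For example, 320,000 bytes need 220 segments. With an initial window of 3 that takes 7 round trips of data, but only 5 with an initial window of 10, so at a 100 ms RTT the floor drops from 800 ms to 600 ms. A real page is many objects over several connections, so treat this as a floor for one object, not a prediction.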

All Velocity conference 2010 Slides/Notes

Here are all the slides/PDFs I’ve come across from the first 2 days at Velocity. Please let me know if I missed any.

 

    • Slides

    Speeding up 3rd party widgets using ASWIFT

    This is a summary of the talk by Arvind Jain and Michael Kleber from Google at Velocity about how to write widgets that load into a same-domain iframe created with document.write. This change gave speed improvements of over 90% in loading widgets.

    • Web is slow
      • Avg page load time 4.9s
      • 44 resources, 7 DNS requests, 320 kB
      • Lots of 3rd party widgets
        • Digg/Facebook/etc.
    • Measurements of 3rd party widgets
      • Digg widget
        • 9 HTTP requests, 52 kB
        • scripts block the main page from downloading
        • stylesheets block the main page from rendering in IE
      • AdSense takes up 12.8% of page load time
      • Analytics takes up < 5% (move to the async widget)
      • DoubleClick takes up 11%
    • How to make Google AdSense “fast by default”
      • Goals / Challenges
        • Minimize blocking the publisher’s page
        • Show the ad right where the code is inserted
        • Must run in the publisher’s domain
      • Solution (ASWIFT) – Asynchronous Script Written into IFrame Tag
        • Make show_ads.js a tiny loader script
        • Loader creates a same-domain iframe (using document.write)
        • Loads the rest of the show_ads into the iframe by document.write() of a <script> tag
        • Loading the iframe this way is asynchronous
      • Browser specific surprises
        • Problems with parallel downloads of same script in IE
        • Iframe creation inside <head> in Firefox has a problem
        • Requesting headers in Chrome was buggy
        • Forward-Back-Reload behavior is buggy (refetching instead of using cache)
        • document.domain vs friendly iframes
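The loader steps above can be sketched roughly as follows. This is my reading of the technique, written in TypeScript, and not Google’s show_ads.js code: aswiftLoad, the helper names, the frame id and the widget URL are hypothetical, and the markup builders are kept as pure functions so they can be tested outside a browser. The key detail is that an iframe with no src attribute loads about:blank, which inherits the parent page’s origin, so the loader can write a script tag straight into it.

```typescript
// Markup for the placeholder iframe. With no src it loads about:blank, which
// inherits the parent page's origin, so the parent can script into it
// (a "same-domain" or "friendly" iframe). Values are assumed to be trusted.
function iframeMarkup(id: string, width: number, height: number): string {
  return `<iframe id="${id}" width="${width}" height="${height}" ` +
    `frameborder="0" scrolling="no"></iframe>`;
}

// Document written into the iframe: it only pulls in the heavy widget script.
// The closing tag is split so an inline copy of this code cannot terminate
// the surrounding <script> element early.
function frameDocumentMarkup(scriptUrl: string): string {
  return `<!DOCTYPE html><html><body><script src="${scriptUrl}"></` +
    `script></body></html>`;
}

// Minimal structural types, so the sketch does not depend on DOM typings.
interface WritableDoc {
  open(): unknown;
  write(markup: string): void;
  close(): void;
}
interface HostDoc {
  write(markup: string): void;
  getElementById(id: string): unknown;
}
type FrameLike = { contentWindow: { document: WritableDoc } | null } | null;

// The tiny loader, meant to run inline where the publisher pasted the snippet.
function aswiftLoad(doc: HostDoc, id: string, scriptUrl: string,
                    width: number, height: number): void {
  // 1. Synchronously write the placeholder, so the ad lands where the code is.
  doc.write(iframeMarkup(id, width, height));
  // 2. Write a script tag into the iframe's own document; the widget script
  //    is then fetched and parsed there instead of in the publisher's page.
  const frame = doc.getElementById(id) as FrameLike;
  const frameDoc = frame && frame.contentWindow ? frame.contentWindow.document : null;
  if (frameDoc === null) return;
  frameDoc.open();
  frameDoc.write(frameDocumentMarkup(scriptUrl));
  frameDoc.close();
}
```

In a page, the inline snippet would call something like aswiftLoad(document, "ad-1", "https://ads.example.com/widget.js", 300, 250). The browser-specific surprises listed above (parallel downloads in IE, iframes inside head in Firefox, document.domain) are exactly the cases this sketch does not handle.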

    Urs Holzle from google on “Speed Matters”

    From Urs’ talk at Velocity 2010 [More info: Google, Data Center Knowledge]

    • Average web page – 320 kB, 44 resources, 7 DNS lookups, doesn’t compress a third of its content
    • Aiming for 100 ms page load times for Chrome
    • Chrome: HTML5, V8 JS engine, DNS prefetching, VP8 codec, open source, spurs competition
    • TCP improvements
      • Fast start (higher initial congestion window)
      • Quick loss recovery (lower retransmit timeouts)
      • Makes Google products 12% faster
      • No handshake delay (app payload in SYN packets) [Didn’t know this was possible!!!]
    • DNS improvements
      • Propagate client IP in DNS requests (to allow servers to better map users to the closest servers)
    • SSL improvements
      • False start (reduce 1 round trip from handshake)
        • 10% faster (for Android implementation)
      • Snap start (zero round trip handshakes, resumes)
      • OCSP stapling (avoid inline round trips)
    • HTTP improvements (SPDY):
      • Header compression
      • Stream multiplexing and prioritization
      • Server push/hints
      • 25% faster
    • Test setup
      • Downloaded the same “top 25” pages via HTTP and SPDY over a simulated 2 Mbps DSL link with 0% packet loss: the number of packets sent fell by 40%
      • On low bandwidth links, headers are surprisingly costly. Can add 1 second of latency.
    • Public DNS:
      • reduces recursive resolve time by continuously refreshing cache
      • Increases availability through adequate provisioning
    • Broadband pilot testing going on
      • Fix the “last mile” complaint
      • Huge increase of 100x in speed
    • More developer tools by Google
      • Page Speed, Speed Tracer, Closure Compiler, Auto Spriter
    • More awareness about performance
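The Public DNS point about continuously refreshing the cache is easy to picture with a small TypeScript sketch: instead of letting a popular record expire and making the next user pay for a full recursive lookup, a periodic job re-resolves entries shortly before their TTL runs out. This is a generic refresh-ahead cache under stated assumptions (a synchronous upstream resolver to keep it short, a hypothetical RefreshingCache class, and refreshing once less than 20% of the TTL remains), not a description of how Google Public DNS is built.

```typescript
interface Answer {
  address: string;
  ttlSec: number;
}
// Hypothetical upstream recursive resolver; synchronous to keep the sketch short.
type Resolver = (name: string) => Answer;

interface CacheEntry {
  answer: Answer;
  expiresAt: number; // seconds
}

class RefreshingCache {
  private entries = new Map<string, CacheEntry>();
  private resolver: Resolver;
  private refreshFraction: number;

  constructor(resolver: Resolver, refreshFraction = 0.2) {
    this.resolver = resolver;
    this.refreshFraction = refreshFraction;
  }

  // Serve from cache while the record is valid; only a miss or an expired
  // record costs the user a recursive lookup.
  lookup(name: string, nowSec: number): { answer: Answer; cached: boolean } {
    const e = this.entries.get(name);
    if (e !== undefined && nowSec < e.expiresAt) {
      return { answer: e.answer, cached: true };
    }
    return { answer: this.resolve(name, nowSec), cached: false };
  }

  // Run periodically (e.g. from a timer): re-resolve every record whose
  // remaining TTL has fallen to refreshFraction of its full TTL or below
  // (already expired records included). Returns how many were refreshed.
  refreshDue(nowSec: number): number {
    const due: string[] = [];
    this.entries.forEach((e, name) => {
      if (e.expiresAt - nowSec <= e.answer.ttlSec * this.refreshFraction) {
        due.push(name);
      }
    });
    due.forEach((name) => this.resolve(name, nowSec));
    return due.length;
  }

  private resolve(name: string, nowSec: number): Answer {
    const answer = this.resolver(name);
    this.entries.set(name, { answer, expiresAt: nowSec + answer.ttlSec });
    return answer;
  }
}
```

With a 100-second TTL, a refreshDue call at second 85 re-resolves the name, so a lookup at second 150 is still served from cache instead of waiting on the upstream resolver. A real resolver would limit refreshing to names that are actually popular.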

    James Hamilton: Data center infrastructure innovation

    Summary of James Hamilton’s keynote talk at Velocity 2010

    • Pace of Innovation – The pace of datacenter innovation is increasing. The strong focus on infrastructure innovation is increasing reliability and reducing resource consumption, which ultimately drives down cost.
    • Where does the money go?
      • 54% on servers, 8% on networking, 21% on power distribution, 13% on power, 5% on other infrastructure requirements
      • 34% of costs are power related (power distribution plus power)
      • Cost of power is trending up
    • Cloud efficiency – server utilization in our industry is in the 10 to 15% range
      • Avoid holes in the infrastructure use
      • Break jobs into smaller chunks and queue them wherever possible
    • Power distribution – 11 to 12% lost in distribution
      • Rules to minimize power distribution losses
        • Oversell power – set up more servers than the available power could run at full load. In a regular datacenter, servers never all need peak power at the same time.
        • Avoid voltage conversions
        • Increase efficiency of conversions
        • High voltage as close to load as possible
        • Size voltage regulators to load and use efficient parts
        • High-voltage direct current: a small potential gain
    • Mechanical Systems – One of the biggest savings is in cooling
      • What parts are involved? Cooling towers, heat exchangers, pumps, evaporators, compressors, condensers and so on.
      • The efficiency of these systems, and the power required to run them, depends on the difference between the desired temperature and the current room temperature
      • Separate hot and cold aisles… insulate them (don’t break the fire codes)
      • Increase the operating temperature of servers
        • Most are between 61°F and 84°F
        • Telco standard is 104°F (game consoles are even higher)
    • Temperature
      • Limiting factors to high temp operation
        • Higher fan power trade-off
        • More semiconductor leakage current
        • Possible negative failure rate impact
      • Avoid direct expansion cooling entirely
        • Air-side economization
        • Higher data center temperature
        • Evaporative cooling
      • Requires filtration
        • Particulate and chemical pollution
    • Networking gear
      • Current networks are over-subscribed
        • Forces workload placement restrictions
        • Goal: all points in datacenter equidistant.
      • Mainframe model goes commodity
        • Competition at each layer rather than vertical integration
      • OpenFlow: open software platform
        • Distributed control plane to central control
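The power distribution rules above (“avoid voltage conversions”, “increase efficiency of conversions”) follow from the fact that losses compound across conversion stages. A worked example with illustrative numbers only; four conversion stages at 97% efficiency each is an assumption, not a figure from the keynote:

```latex
\eta_{\text{total}} = \prod_{i=1}^{n} \eta_i, \qquad \text{loss} = 1 - \eta_{\text{total}}

n = 4,\ \eta_i = 0.97:\quad \eta_{\text{total}} = 0.97^{4} \approx 0.885 \;\Rightarrow\; \text{loss} \approx 11.5\%

n = 3,\ \eta_i = 0.97:\quad \eta_{\text{total}} = 0.97^{3} \approx 0.913 \;\Rightarrow\; \text{loss} \approx 8.7\%

n = 4,\ \eta_i = 0.99:\quad \eta_{\text{total}} = 0.99^{4} \approx 0.961 \;\Rightarrow\; \text{loss} \approx 3.9\%
```

Dropping one conversion and making each conversion more efficient both shrink the same product, which is why the two rules sit side by side.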