Found some very interesting graphs showing how Internet connectivity grew between 1969 and 2007.
Source: National Science Foundation
If you haven’t noticed already, there is a second blog which I maintain, and it is currently busier than this one. “Scalable web architectures” is a collection of posts about web architectures which scale and the technologies which make that happen.
Here are some of the posts on that blog
Powerset could have gone the way most dot-com companies have gone, but instead they decided to try out Amazon’s EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service) to augment their computational capacity.
Youtube is said to be pushing about 25 petabytes per month, which works out to a sustained data rate of about 77 Gbps on average. The bandwidth usage at the peaks would be even higher. Thanks to Limelight Networks, Youtube doesn’t really need to scale or provision for that kind of bandwidth itself, and based on some reports from 2006 it cost them close to 4 million a month back then. Youtube and services like it have to invest a lot in their infrastructure before they can really launch, and though using shared content delivery networks is not ideal, it’s probably not a bad deal. In Youtube’s case, it helped them survive until Google bought it out.
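The 77 Gbps figure is easy to sanity-check. Here is a minimal sketch of the arithmetic, assuming decimal petabytes (10^15 bytes) and a 30-day month; the function name is just for illustration.

```python
def sustained_gbps(petabytes_per_month: float, days_in_month: int = 30) -> float:
    """Average rate in gigabits per second for a monthly transfer volume."""
    bits = petabytes_per_month * 1e15 * 8       # decimal petabytes -> bits
    seconds = days_in_month * 24 * 60 * 60      # seconds in the month
    return bits / seconds / 1e9                 # bits per second -> Gbps


rate = sustained_gbps(25)
print(f"{rate:.1f} Gbps")  # prints "77.2 Gbps"
```

A 31-day month or binary petabytes would shift the number a little, but it stays in the same ballpark.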
Theo Schlossnagle, the author of “Scalable Internet Architectures”, argues that federation is a form of partitioning, and that sharding is nothing but a form of partitioning and federation. In fact, according to him, sharding has already been in use for a long time.
The Eins.de site serves about 1.2 million dynamic pages a day. He wrote a series of articles describing how they redesigned the site to scale for growth. I found these articles very informative, with an extremely mature discussion of the colorful world of scalability.
If I could only give one recommendation to anyone building a brand new web application, I’d say “go stateless”. But going stateless is not the same as going session-less. One could implement a perfectly stateless web architecture which still uses sessions to authenticate, authorize and track user activity. And to complicate matters further, when I say stateless, I really mean that the server should be stateless, not the client.
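One way to picture “sessions without server state” is a signed token that the client carries, so any server holding the same secret can verify it without a lookup. This is only a sketch under assumptions: `SECRET_KEY`, `issue_token` and `verify_token` are hypothetical names, and a real deployment would need proper key management and HTTPS.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical secret, shared by every web server


def _sign(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Build a token holding the user id and expiry, signed with HMAC-SHA256."""
    issued = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "exp": issued + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    return f"{body}.{_sign(body)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and not expired, else None."""
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no separator, so not one of our tokens
    if not hmac.compare_digest(signature, _sign(body)):
        return None  # tampered with, or signed with a different key
    data = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return data["uid"] if data["exp"] >= current else None
```

The server keeps nothing between requests: the client sends the token back (in a cookie, say) and whichever server receives it can check the signature on its own.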
Load balancers, by definition, are supposed to solve performance bottlenecks by distributing or balancing load between the different components they manage. Though you would normally find load balancers in front of web servers, a lot of people have found other interesting ways of using them.
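To make “distributing load” concrete, here is a toy round-robin picker. It is a sketch of the simplest balancing strategy rather than how any particular product works, and the class name and backend addresses are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the backend that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([balancer.pick() for _ in range(4)])
# prints ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Nothing here is specific to web servers; the same rotation could sit in front of databases, caches or mail relays.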
The other day I briefly mentioned the pain point of the web2.0 world and how consolidation, aggregation and summarization will help reduce some of it.
Microsoft today formally announced the availability of Microsoft Live ID as a contender for providing SSO (single sign-on) services in the web 2.0 world. Live ID, in case you didn’t know, is the repackaged version of Microsoft Passport Network, which had failed so badly that it forced Microsoft to pull it out of the market. Here are some examples of how to use other languages like PHP, Perl, Python, Ruby etc. to do authentication using Live ID. Microsoft is not the first one to openly come out with an SSO technology. Liberty Alliance and OpenID are other open source competitors which already have some foothold in this market.
The move to SSO (single sign-on) in the web 2.0 world is bound to happen regardless of how scary some people might find it. If you can trust your online bank with 100,000 dollars and trust 3 companies you don’t really know with your entire credit history, then this shouldn’t be that much of a concern. The real question is whether you trust the technology leaders Microsoft, Google, Yahoo or others like Verisign enough to provide these critical services for you.
In my opinion, OpenID and Liberty Alliance have failed because of fragmentation of standards and lack of leadership. While Microsoft’s commercial venture into authentication services (Microsoft Passport Network) failed, Live ID might actually do well as long as Microsoft doesn’t screw up this time. Not because they have done a great job in the past, but because the pain is now so unbearable that people are willing to give almost anything a try. But the real kicker is that almost everyone has a Microsoft account anyway, so if I had the option to use my Microsoft account to log in to a new web 2.0 product, I’d do that in a heartbeat. Creating yet another account with a new password and doing the email confirmation thing is not an adventure anymore… (or maybe I’m getting old).
I predict that Google or Yahoo will soon jump into this with their own suite of authentication services (probably using OpenID or Liberty Alliance), which will then become the next battleground in the web 2.0 world. I also predict that within a couple of years after that, many web services will move towards supporting these forms of authentication so that users are not forced to create new accounts with new passwords every single time.
And if my predictions don’t really come true… hey, at least I know that I can dream.
I’ve been reading a lot about scalable web architectures lately and have collected enough links to see that this could be interesting to others. Instead of putting all those links here in this blog, I’ve started a separate blog here: http://www.royans.net/arch/. If you have interesting links to share, please send them over to me.
Scoble discusses a relatively new site called Upcoming, which tracks upcoming events.
I’ve picked the best events from my friends and added them to my own profile there. If I can’t make an event, but think it’s a good one for you to consider I say “I’m watching.” You can see which events I’m attending as well. What you can’t see is that when you have a ton of friends that you’ve hand picked, like I have, whenever you sign into Upcoming.org it’ll show you new events that your friends have added that you should consider. Then you can see what those events are, and who is attending them.
The search, Web2.0, blogging, social networking and now online television.
There are two kinds of innovation happening on the internet today. The first kind are the ones which are redefining the internet, and the second kind build on top of the first. Unlike traditional research and innovation, it’s not the idea but the implementation and execution which make or break a product or service in today’s world.
As a hobby for the last few years I’ve played around with quite a few ideas to understand the implementation and execution complications involved in bringing ideas to life. I wrote an internet feed crawler 2 years ago, created a personalized feed reader preference detection engine using a Bayesian algorithm, created an IP/networking debugging tool called huntip.net and a Digg-like social news publication site called zoppr… and my latest experiment was a service called flagthis. In my other life (at my real job) I kicked off a search product based on Lucene with a full-blown Ajax interface using GWT 3.0.
I’ll be on the lookout for the next interesting idea to implement… and experiment with. Let me know if you have something interesting to share.
Last night I went to an SDForum talk by two eBay architects, Randy Shoup and Dan Pritchett, on how they built, scaled and run their operation. The talk didn’t have anything substantially different from what I’ve heard before, but it was still impressive because they were applying some of the common thinking to operations which run over 15,000 servers at any given time. [ Slides ]
Here are a few interesting phrases I took away from the talk.
The same origin policy prevents a document or script loaded from one origin from getting or setting properties of a document from a different origin, and XMLHttpRequest is bound by it too. The policy dates from Netscape Navigator 2.0. This is a very important security restriction which stops rogue third-party JavaScript from getting information from your authenticated banking server session.
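For a rough feel of what “same origin” means, browsers today generally compare the scheme, host and port of two URLs. The sketch below mimics that comparison in Python; it is an approximation for illustration, not a browser’s actual implementation, and the helper names are my own.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fill in the default port when the URL does not name one explicitly.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    """True when both URLs share scheme, host and port."""
    return origin(url_a) == origin(url_b)


print(same_origin("http://www.royans.net/a", "http://www.royans.net:80/b"))  # prints True
print(same_origin("http://www.royans.net/", "http://data.royans.net/"))      # prints False
```

Note that a different subdomain is enough to count as a different origin, which is exactly the www.royans.net versus data.royans.net situation.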
Unfortunately, this also almost completely shuts down any possibility of data sharing between multiple servers. Note the use of the word “almost”, because JSON is the new saviour of the web 2.0 world. JSON, or JavaScript Object Notation, is nothing but a simple data interchange format which can be easily used by JavaScript applications. What’s different here is that unlike XMLHttpRequest, which can send back answers in any format the JavaScript application wants, JSON requires the answers to be in JSON format, which is basically a subset of the JavaScript programming language, or to be more specific, Standard ECMA-262.
For those who are curious how this works and don’t have time to read the complete documentation, the difference is that a JavaScript application can still load scripts from third-party websites. So if you are running an application on www.royans.net and you have some data on data.royans.net, you can load that data into your application as long as you masquerade that information as JavaScript.
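On the server side, “masquerading data as JavaScript” can be as simple as wrapping the JSON in a variable assignment and serving the result as a script. A minimal sketch, where `as_script` and the variable name `siteData` are made up for illustration:

```python
import json


def as_script(variable_name: str, data) -> str:
    """Wrap JSON-serializable data in a JavaScript variable assignment."""
    return f"var {variable_name} = {json.dumps(data)};"


print(as_script("siteData", {"visits": 42}))
# prints: var siteData = {"visits": 42};
```

A page on www.royans.net that pulls this in with a script tag pointing at data.royans.net would then find `siteData` defined once the script has loaded.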
That’s it, there is no rocket science here… but it does feel like it when you first come across it. It surely did for me.
While you are at it, watch out for JSONP (JSON with padding) too. Google is one company which I know has been using such mechanisms for a long time. They recently came out with more vocal support of this new open data interchange standard.
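The “padding” in JSONP is just a function call wrapped around the JSON, with the function name chosen by the requesting page. Here is a rough server-side sketch; `jsonp_response` is a hypothetical helper, and restricting the callback to a plain identifier is my own precaution rather than part of any standard.

```python
import json
import re

# Accept only simple JavaScript identifiers as callback names.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def jsonp_response(callback: str, data) -> str:
    """Return a script body that calls `callback` with the JSON-encoded data."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    return f"{callback}({json.dumps(data)});"


print(jsonp_response("handleData", {"ok": True}))
# prints: handleData({"ok": true});
```

The page defines `handleData` first, then adds a script tag whose URL names that callback; when the script arrives, the browser runs the call and the data lands in the page.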
Oh, and before you go hacking your code, one thing to watch out for is exposing private/privileged information through the JSON mechanism, because it’s open to XSS (cross-site scripting) holes.
There are tons of speed testing tools out there. But here is one you might not have seen before. It’s called speedtest.net. What’s cool about this site is that it allows you to test your bandwidth against multiple servers in the US and Europe instead of just one.