June 29, 2006

Disaster recovery: an insurance policy for IT disasters

In a bizarre twist of reality, a company that was standing one day can be packing up and folding away three days later. Couchsurfing faced what they called a perfect storm, one that could have happened to anyone. My sympathies are with them, and especially with their IT team, who must have gone through a lot before they were all asked to leave. Multiple failures happening at the same time is not as rare as your IT team would have you believe. It has happened before and will happen again. Unfortunately, it's disasters like these that make people realize the importance of backup procedures and disaster recovery plans. It reminds me of September 11, 2001 and Katrina (New Orleans), which in their own weird ways contributed a lot towards improvements in IT disaster recovery processes. IT backup and disaster recovery teams are some of the unsung heroes who never seem to get recognized for how they help the business get back into action after a disaster on this scale. Investing in backup processes is like an insurance policy with a never-ending monthly bill, but one that helps you get back on your feet if disaster strikes.

June 28, 2006

Google Checkout and SSO

Google Checkout is out, and as expected it's so lean and mean that I couldn't figure out at first whether it was actually a new Google component. With Froogle already in place, Google Checkout can cash in on the goodwill people have for the Froogle service. I think this news is a big one for other businesses, but it probably isn't as significant for end users.

Remember Microsoft Passport? Now think Google single sign-on. I noticed a story about it being released and then pulled yesterday for some unknown reason. Personally, I've always supported federated authentication systems, because reducing the number of passwords one needs to remember reduces security problems. However, using a third-party single sign-on over which we have no control is like the government trying to control/monitor our income. That being said, I'm still ready to subject myself to Google's single sign-on if it reduces security risks.

June 25, 2006

OpenLaszlo Legals: Breaking the Flash barrier

In the past, though I loved the idea behind Laszlo, it was hard for me to come up with a reason to force my users to use Flash. That was before Ajax gained popularity. With RIAs (rich internet applications) invading the market, I had been pondering for a few months about re-investigating Laszlo to see where it stands.

Today, however, I got a very pleasant surprise when OpenLaszlo announced the availability of the "OpenLaszlo Legals" extension, which allows OpenLaszlo to generate runtimes for different target browsers using JScript, ActionScript or JavaScript instead of just Flash.



I can see Laszlo getting a lot of positive feedback over the next few days. This is probably the best move they could have made. I wish them all the best.

Notes: WikiMapia, Digg, IPv6, Flock and Google Sync.

WikiMapia

  • This is the first time I've stumbled upon WikiMapia, which looks like a wiki of maps. A very interesting and creative idea: WikiMapia uses the Google Maps API and allows users to mark places and add text to locations around the world.

  • It's like a large world map with people scribbling all over it. Google recently updated its global map database to include some very high-resolution satellite images from around the world, which makes WikiMapia an even more interesting new service to look out for.


Digg

  • Digg has been around for just over a year and has already surpassed Slashdot in traffic volume. The Digg 3.0 release party demoed some really interesting new tools which are set to come out soon after the 3.0 release on Monday. The one tool that already exists is Digg Spy.


IPv6

  • US Government has plans to enable IPv6 on backbone routers by 2008.

  • Comcast is probably the first large organization to have started deploying IPv6. Here are some interesting presentation slides from one of their talks.

  • I looked up ARIN and noticed that Google, Microsoft and Cisco all have /32 blocks assigned to them, which is a significant allotment (an IPv6 /32 spans 2^96 addresses). Even though ARIN policy more or less states that a /32 allotment requires the recipient to act as an ISP and assign at least 200 blocks to smaller ISPs or organizations within 5 years, I don't think this is enforced. Cisco, for example, has had its IPv6 block since 2000 and is well past its 5-year limit.

  • While digging into IPv6, I also found out that although IPv6 is being deployed, multihoming is not yet standardized.


Flock

  • If you like Firefox, you'll like Flock too. Just as the web is slowly moving towards Web 2.0, Flock is kind of an extension of the Firefox experience that adds "Web 2.0" richness.

  • Features like social tagging, blogging and photo sharing are built into the browser. But what I liked best in Flock is its implementation of the RSS news reader.

  • Flock beta 1 was released on June 13th.


Google Sync

  • Google Sync is a Firefox plugin which claims to synchronize your browser settings with your Gmail account, so that you can carry them with you when you switch desktops.

  • Unfortunately, though Flock is based on Firefox, it is not supported, which is a shame because I primarily use Flock. However, there is a hacked version of Google Sync available here which will work with Flock.

  • BTW, I think Google Sync is far from mature, because over the weekend it managed to lock up my Firefox browser on Windows XP, and even a reboot doesn't bring it back.

June 24, 2006

Top Ten ways to speed up your website

Over the last few years as a web admin, I have realized that knowing HTML and JavaScript alone is not enough to build a fast website. To make a site faster, one needs to understand real-world problems like network latency and packet loss, which most web administrators usually ignore. Here are 10 things you should investigate before you call your website perfect. Some of these are minor configuration changes; others might require time and resources to implement.

  1. HTTP Keepalives: If HTTP keepalives are not turned on, you can see 30% to 50% improvement just by turning them on. Keepalives allow multiple HTTP requests to go over the same TCP/IP connection. Since there is a performance penalty for setting up each new TCP/IP connection, using keepalives helps most websites (the first sketch after this list shows the idea).

  2. Compression: Enabling compression can dramatically speed up sites that transfer large web objects. Compression doesn't help much on a site with lots of images, but it can do wonders on most text/html-based websites (the second sketch after this list shows roughly how much). Almost all webservers that do compression automatically detect browser compatibility before compressing data. Most browsers released since 1999 that support HTTP 1.1 also support compression by default. In real life, however, I've noticed that some plugins can create problems. An excellent example is Adobe's PDF plugin, which inconsistently failed to open some PDFs on our website when compression was enabled. In Apache it's easy to define which objects should not be compressed, so setting up workarounds is simple too.

  3. Number of Objects: Reduce the number of objects per page. Most browsers won't download more than 2 objects at a time from a single host (the guideline in RFC 2616). This may not seem like a big deal, but if you are managing a website with an international audience, network latency can dramatically slow down the load time of a page. The other day I checked Google's search page and noticed that they had only one image file in addition to the HTML page. That's an amazingly lean website. In real life not all sites can be like that, but using image maps with JavaScript to simulate buttons can do wonders. Merging HTML, JavaScript and CSS into single files is another common way of reducing objects (there is a tiny merge sketch after this list). Most modern sites avoid images for buttons entirely and stick to buttons made of HTML/CSS/JavaScript.

  4. Multiple Servers: If you can't reduce the number of objects, try to distribute your content over multiple servers. Since the browser's connection limit applies per server, objects spread across different servers can be downloaded in parallel. For example, what happens if an HTML page with 4 JPEG images serves 2 of them from server1.domain.com and 2 from server2.domain.com instead of putting all of them on one server? In most browsers you will notice close to a 2x speed improvement. Firefox and IE can both be modified to increase the connection limit, but you can't ask each of your visitors to do that.

  5. AJAX: Using AJAX won't always speed up your website, but having JavaScript respond to a user's click immediately can make it feel very responsive. More interactive sites are using AJAX technologies today than ever before. In some cases, sites built with Java and Flash have moved to AJAX to do the same work in fewer bytes.

  6. Caching: Setting an expiry HTTP header on objects tells browsers to cache those objects for a predefined duration (the caching sketch after this list shows one way to do it). If your site doesn't change very often, or if a certain set of pages or objects changes less frequently, change the expiry header for that file type to say so. Browsers visiting your site should see speed improvements almost immediately. I've seen sites with more than 50 image objects in a single HTML file perform amazingly well thanks to browser caching.

  7. Static objects on a fast webserver: Web application servers are almost always proxied behind a webserver. While application servers do a good job of producing dynamic content, they are not the best suited to serving static objects. In most cases you will see significant speed improvements if you offload static content to the webserver, which can do the same job more efficiently. Adding more application servers behind a loadbalancer can help too. While on the topic, remember that the language you choose for your application can make or break your business. Prototyping can be done in almost any language, but heavily used websites should weigh the performance, productivity and security gains/losses of moving to other platforms/languages like Java/.Net/C/C++.

  8. TCP/IP initial window size: The default initial TCP/IP window size on most operating systems is conservatively defined and can cause download/upload speed problems. TCP/IP starts with a small window and grows it towards an optimal size over time. Unfortunately, since the initial value is low and HTTP connections don't last very long, many connections never get past slow start; raising the initial value can dramatically speed up transmission to remote, high-latency networks (the slow-start arithmetic after this list shows why).

  9. Global Loadbalancing: If you have already invested in some kind of simple loadbalancing technology and are still having performance problems, start investigating global loadbalancing, which lets you deploy servers around the world and use intelligent loadbalancing devices to route each client to the closest web server. If your organization can't afford to set up multiple sites around the world, investigate global caching services like Akamai.

  10. Webserver Log Analysis: Make it a habit to analyze your webserver logs regularly, looking for errors and bottlenecks. You would be surprised how much you can learn about your own site from its logs. The first things I look for are the objects requested most often and the objects consuming the most bandwidth; compression and expiry headers can help with both. I regularly look for 404s and 500s to catch missing pages and application errors. Understanding which countries your customers come from and what times they like to visit can help you understand latency and packet-loss problems. I use awstats for my log analysis, and the last sketch after this list shows the kind of quick tally a few lines of scripting can give you.
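
A few quick Python sketches to make some of the points above concrete. Treat them as illustrations under stated assumptions, not production code. First, the keepalive idea from item 1: with HTTP/1.1 persistent connections, several requests ride over one TCP connection instead of paying the connection-setup penalty each time (the host name below is just an example):

    import http.client

    # Two GETs over a single persistent HTTP/1.1 connection. Draining the
    # response body is what allows the connection to be reused.
    conn = http.client.HTTPConnection("www.example.com")
    for path in ("/", "/index.html"):
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        print(path, resp.status, resp.getheader("Connection"))
    conn.close()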
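
For item 2, a rough sense of what compression buys on text/html: repetitive markup compresses extremely well, while already-compressed formats like JPEG do not:

    import gzip

    # Compare the size of a repetitive HTML payload before and after gzip.
    html = ("<html><body>" + "<p>hello world</p>" * 1000 + "</body></html>").encode()
    compressed = gzip.compress(html)
    print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes "
          f"({100 * len(compressed) // len(html)}% of original)")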
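
For item 3, merging objects can be as simple as concatenating files so a page fetches one script instead of three (the file names here are hypothetical):

    import pathlib

    # Concatenate several scripts into a single object to cut request count.
    parts = ["menu.js", "forms.js", "tracking.js"]
    merged = "\n".join(pathlib.Path(name).read_text() for name in parts)
    pathlib.Path("site.js").write_text(merged)
    print(f"merged {len(parts)} files into site.js")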
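
For item 6, a minimal sketch of a server that marks everything cacheable for a week via the Expires header, the same idea Apache's mod_expires implements:

    from datetime import datetime, timedelta
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class CachingHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            # Tell browsers they may cache this object for 7 days.
            expires = datetime.utcnow() + timedelta(days=7)
            self.send_header("Expires",
                             expires.strftime("%a, %d %b %Y %H:%M:%S GMT"))
            super().end_headers()

    HTTPServer(("localhost", 8000), CachingHandler).serve_forever()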
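
For item 8, some back-of-the-envelope arithmetic on why the initial window matters: under slow start the congestion window roughly doubles every round trip, so on a high-latency path the first few round trips dominate a short transfer. The numbers (a 60 KB page, 1460-byte segments, 200 ms RTT) are purely illustrative:

    # Round trips needed to deliver total_bytes under idealized slow start.
    def rtts_to_send(total_bytes, mss=1460, init_segments=2):
        window, sent, rounds = init_segments, 0, 0
        while sent * mss < total_bytes:
            sent += window      # send a full window this round trip
            window *= 2         # slow start: window doubles per RTT
            rounds += 1
        return rounds

    for init in (2, 4, 8):
        rounds = rtts_to_send(60_000, init_segments=init)
        print(f"initial window of {init} segments: {rounds} RTTs "
              f"(~{rounds * 200} ms at 200 ms RTT)")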
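
And for item 10, the sort of quick tally I mean: a few lines against an Apache combined-format access log (the "access_log" path is hypothetical) that surface bandwidth hogs and error-prone URLs:

    import re
    from collections import Counter

    # Pull URL, status and byte count out of each combined-format log line.
    LINE = re.compile(r'"\w+ (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)')

    bandwidth, errors = Counter(), Counter()
    with open("access_log") as log:
        for line in log:
            m = LINE.search(line)
            if not m:
                continue
            if m["bytes"] != "-":
                bandwidth[m["url"]] += int(m["bytes"])
            if m["status"] in ("404", "500"):
                errors[m["url"], m["status"]] += 1

    print("top bandwidth consumers:", bandwidth.most_common(5))
    print("most frequent 404/500s:", errors.most_common(5))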


[p.s.: This site, royans.net, is unfortunately not physically maintained by me, so I have limited control over making changes to it.]

June 19, 2006

Why GoogleTalk is not about Instant Messaging.

The two big names in the messaging industry came out with two major upgrades today. Yahoo announced "Yahoo Messenger 8.0" for the Windows platform and MSN released their Windows Live Messenger. While both MSN and Yahoo are offering some form of VoIP support, the big thing for Yahoo was opening up the APIs for its messenger, and the discussion is happening around its Yahoo! Messenger On-the-Road offering, which seems to be some kind of paid service that will grant you access to more than 30,000 wifi hotspots around the world. On the MSN side, the big thing is the announcement that Philips is now making VoIP handsets with Windows Live Messenger embedded in them. This trend of moving VoIP software to handheld devices is not new, but with Microsoft jumping into the market, it's not very surprising that Skype is giving away free minutes.

Which brings this discussion to the third player in this market: Google. While MSN and Yahoo are desperately trying to attach the kitchen sink to their IM clients, Google seems less interested in developing a standalone "Google Talk" client and more interested in generating grass-roots support with the fewest bottlenecks for the end user. For someone coming late to the party, that's not too much to ask for.

However, what we all miss in this picture is that in the IM world, MSN and Yahoo are not very far from what centralized networks like AOL and CompuServe looked like before they hooked up to the internet. Isn't it a shame that you, as a user of MSN, also have to create Yahoo, GoogleTalk, ICQ and AOL accounts just to talk to all of your friends? And while you can sign up with just one ISP to visit every website on the internet, is it really necessary to sign up with 10 different service providers just to exchange instant messages with your friends? After all, how different are instant messages from regular email messages?

When Google decided to use an open protocol called Jabber, which has close to 100 different client implementations, they did two things that were not immediately apparent. First, they bought themselves a huge developer base that had been championing Jabber as an alternative to proprietary protocols. Second, they forced MSN and Yahoo to acknowledge that inter-IM communication is eventually possible.

In fact the Jabber protocol, unlike other instant messaging protocols, was designed from the ground up, like SMTP, to be decentralized, flexible and diverse. It is so much like SMTP that, from a bird's-eye view, Jabber could look like SMTP in the way it works.
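
To make the SMTP analogy concrete, here is a minimal Python sketch of a Jabber message stanza (the JIDs are made up). The addresses look just like email addresses, and the sender's server routes the stanza to the recipient's server the way a mail server routes to a destination domain:

    import xml.etree.ElementTree as ET

    # Build a bare-bones Jabber/XMPP <message> stanza.
    msg = ET.Element("message", {
        "from": "alice@gmail.com",
        "to": "bob@jabber.org",
        "type": "chat",
    })
    ET.SubElement(msg, "body").text = "Routed server-to-server, like mail"
    print(ET.tostring(msg, encoding="unicode"))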

GoogleTalk, in short, is what the Internet was to AOL. The reason Google doesn't care much about the GoogleTalk client is that Jabber traffic, like SMTP, can be routed, archived and searched for targeted advertisements.

//p.s. In the current design GoogleTalk is not routable (s2s)... but that will hopefully be fixed soon.

June 17, 2006

Sun AMD V20z hardware problems

Sun Microsystems was one of the first big companies to come out with 64-bit AMD V20z servers, which quickly replaced our ancient Sparc servers. Compared to the old E220s and E420s, the AMD servers were about 3 to 5 times faster, depending on what we wanted them to do.

The first round of V20z's we deployed saved us a lot of rack space, but the heating and power requirements were a little higher than expected. Though the V20z's did reduce the footprint on the racks, the heat generated forced us to leave room above the servers where the ventilation holes were placed. For all practical purposes, we couldn't use it as a one-U system.

We ordered a second round of V20z's a few months back, and though we were prepared for the extra rack space, we stumbled upon a whole new problem this time. We noticed that some of these servers were rebooting randomly, especially at times of high activity. We were using a mirror image of the Suse distribution we had installed on the first set of servers, which rules out any change on the software/OS side. What's funny is that some of these servers were so predictably faulty that a simple "tar -xvzf filename.tgz" would kill them. Putting the boot drive from a faulty server into a perfectly working server confirmed that it wasn't the OS or the hard disk that was faulty, but the server hardware itself.

These problems have been going on for at least a couple of months, and we have had a case open with Sun for a few weeks now. Among the things we have done to try to fix this: updating the firmware on various V20z components, playing around with the memory modules, adding more space for ventilation, and even checking the voltage regulator to see if it was defective. These servers are brand new, and of the 30 or so we bought, we can consistently reproduce this problem on 6 of them. In fact we had Sun engineers (2 of them) come on site and see it for themselves, and yet it is hard for them to agree that they need to replace the servers.

So the question is: how long does it take for someone to admit a mistake and give us a replacement? Does Sun realize that while they ask us to upgrade firmware on our servers and perform other time-consuming steps, 20% of these servers can't be used at all? Do they understand that if we just wanted to keep them unused, we probably wouldn't have bought them in the first place?

Our company has tried to escalate this problem with Sun so many times, and the guy on the other end just refuses to sign off on the replacements.

Which leads me to the next question: how many other servers out there have this problem? If you have this problem, could you please reply to this blog, or let me know by email? If 20% of the servers sold to us were badly defective, there have to be others out there having the same problem.

We have spent between 300 and 600 man-hours trying to debug this problem and set up workarounds instead of resolving the issue. Posting this blog online is not just an act of desperation on my part; it is also a message to Sun Microsystems that they are not the only server vendor out there.