January 16, 2011

It's logical: IaaS users will move to PaaS

Sysadmins love infrastructure control, and I have to say that there was a time when root access gave me a high. It wasn't until I moved to a web operations team (and gave up my root access) that I realized I was more productive when I wasn't dealing with day-to-day hardware and OS issues. After managing my own EC2/Rackspace instances for my blog for a few years, I came to another realization today: IaaS (Infrastructure as a Service) might be one of those fads that gives way to PaaS (Platform as a Service).

WordPress is an excellent blogging platform, and I manage multiple instances of it for my blogs (and one for my wife's blog). I chose to run my own WordPress instance because I loved the same control I used to have as a sysadmin. I not only wanted to run my own plugins, configure my own features, and play with different kinds of caching, I also wanted to choose my own Linux distribution (Ubuntu, of course) and make it work the way I always wanted my servers to work. But when it came to patching the OS, taking backups, and updating WordPress and its zillion plugins, I found it a little distracting, slightly frustrating and extremely time consuming.

Last week I moved one of my personal blogs to blogger.com, and it's possible that it won't be the last one. What's important here is not that I picked blogger.com over wordpress.com, but the fact that I'm ready to give up control to be more productive. Amazon's AWS started off as the first IaaS provider, but today they provide a whole lot of other managed services, like Elastic MapReduce, Amazon Route 53, Amazon CloudFront and the Amazon Relational Database Service, which are more PaaS than IaaS.

IaaS is a very powerful tool in the hands of a professional systems admin. But I'm willing to bet that over the next few years, fewer organizations will worry about kernel versions and Linux distributions; they'll instead be happy with a simple API to upload ".war" files (if they are running Tomcat, for example) into some kind of cloud-managed Tomcat instance, much like how Hadoop runs in Elastic MapReduce. Google App Engine (Java and Python) and Heroku (Ruby-based, acquired by Salesforce) are two examples of such services today, and I'll be surprised if AWS doesn't launch something (or buy someone out) within the next year to do the same.

Are you ready for IPv6 yet?

If I say the internet is running out of IP addresses, you might respond with "so what's new?". Whether you like it or not, this time it's for real. While the last IPv4 /8 blocks might be gone by the end of this year, that doesn't mean the IPv6 transition needs to happen right away. Fortunately, unlike the Y2K problem, we have a lot of tools and means to make this transition less painful by spreading it over an extended period of time.

Most of the larger organizations have been testing IPv6 for years. And thanks to Apple, Microsoft, Linux developers and other industry leaders, the latest versions of the most popular operating systems come preconfigured to work with IPv6.

What's missing, unfortunately, is the human element of this transition. Training the core network operators on IPv6-related issues isn't enough. Nor is it enough for all the software to support it. Every developer, engineer and user across all 7 layers of the OSI stack has to understand it well enough to troubleshoot real-life problems, just like they do with IPv4. Setting up wifi routers, for example, was a challenge for most end users... asking them to troubleshoot IPv6 issues would, in my opinion, be just as challenging, if not more.
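Before troubleshooting anything, it helps to know whether the machine in front of you can even speak IPv6. Here is a minimal sketch using Python's standard socket module; the function names are mine, and the hostname in the comment is only an example of a host that published an AAAA record at the time.

```python
import socket

def ipv6_capable():
    """Return True if the OS will let us create an IPv6 socket at all."""
    if not socket.has_ipv6:
        return False
    try:
        s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        s.close()
        return True
    except socket.error:
        return False

def resolves_over_ipv6(host):
    """Return True if DNS gives us an IPv6 (AAAA) address for `host`."""
    try:
        return any(info[0] == socket.AF_INET6
                   for info in socket.getaddrinfo(host, 80))
    except socket.gaierror:
        return False

print(ipv6_capable())
# For an end-to-end check, try a v6-enabled hostname, e.g.:
# resolves_over_ipv6("ipv6.google.com")
```

A stack can pass the first check and still fail the second (no v6 route from your ISP), which is exactly the kind of half-working state that confuses end users.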

Similarly, online service operators have to figure out whether they have all the tools available to provide reliable service over this new protocol. Traffic monitoring, routing, dashboarding, security auditing, etc. are some of the roles and responsibilities which may require retooling/retraining, and unless there is a significant amount of traffic, some of the operational issues may never show up. On June 8th, 2011, a few major organizations (including Google, Yahoo, Akamai and Limelight Networks) are organizing a "World IPv6 Day". The goal is for these organizations to run IPv6 versions of their services for a full 24-hour period to observe how well they hold up.

Google is one of the very few companies which has had services running on IPv6 since 2008. Here are some more details from Google's blog:
Google has been supporting IPv6 since early 2008, when we first began offering search over IPv6. Since then we’ve brought IPv6 support to YouTube and have been helping ISPs enable Google over IPv6 by default for their users.

On World IPv6 Day, we’ll be taking the next big step. Together with major web companies such as Facebook and Yahoo!, we will enable IPv6 on our main websites for 24 hours. This is a crucial phase in the transition, because while IPv6 is widely deployed in many networks, it’s never been used at such a large scale before. We hope that by working together with a common focus, we can help the industry prepare for the new protocol, find and resolve any unexpected issues, and pave the way for global deployment.

The good news is that Internet users don’t need to do anything special to prepare for World IPv6 Day. Our current measurements suggest that the vast majority (99.95%) of users will be unaffected. However, in rare cases, users may experience connectivity problems, often due to misconfigured or misbehaving home network devices. Over the coming months we will be working with application developers, operating system vendors and network device manufacturers to further minimize the impact and provide testing tools and advice for users.
Are you ready for IPv6?

Splunk: Fastest way to get a web operations dashboard running

This is a cross-post from my personal blog.

A few weeks ago I asked a question on Quora about log aggregation. I was surprised to find that no opensource solution came close to what I wanted, but I got a lot of suggestions to try out Splunk. So I did.

What I wanted was an aggregation tool which collects, displays and alerts on events logged by the various webservers across the network, which could be in different datacenters. The organization where I set this up was generating about 300 MB of production haproxy logs per day and around 200 MB of non-prod logs. Here is why Splunk fit this organization very well.

1) Log aggregation across multiple servers/datacenters - The organization had already solved this problem by piping haproxy logs through syslog-ng, with a little bit of filtering to discard logs which aren't interesting to Splunk. syslog-ng can be configured to use TCP instead of UDP to make log delivery reliable. Splunk is capable of working as a remote agent as well... but sending raw logs to it might increase the licensing costs.
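For reference, forwarding just the haproxy logs over TCP looks roughly like this in a syslog-ng configuration. This is a sketch: the destination/filter names, hostname and port are placeholders I made up, and `s_local` stands for whatever local log source your config already defines; `tcp()` destinations and the `program()` filter are standard syslog-ng building blocks.

```
# Hypothetical syslog-ng snippet: ship only haproxy logs, over TCP.
# "splunk.example.com" and port 1514 are placeholders.
filter f_haproxy { program("haproxy"); };
destination d_splunk { tcp("splunk.example.com" port(1514)); };
log { source(s_local); filter(f_haproxy); destination(d_splunk); };
```

Filtering at the syslog-ng layer like this is also what keeps the daily indexed volume (and therefore the Splunk license) down.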
2) Realtime dashboard - Splunk is a memory and CPU hog, but for smaller amounts of logs the true realtime dashboard works beautifully. Even with multiple syslog-ng and Splunk servers involved in the log flow, I was able to see realtime graphical dashboards updated within 5 to 10 seconds of the actual requests. That's pretty impressive, though it may not be too useful for high-volume websites. Generating realtime dashboards which don't update automatically is a more realistic use of Splunk's resources, and this again works pretty well as long as too many people aren't trying to use it at the same time.
3) Querying/Filtering/Analyzing - Splunk's query language is very different from SQL, but there are cheatsheets available to help you create queries. This query language is very powerful and is perhaps the toughest part of the learning curve. The results from these queries can be sent to web dashboards or to alerting agents which can trigger emails/pages based on pre-defined conditions.
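To give a flavor of the query language, here is the kind of search that could drive a per-minute status-code panel. This is a sketch, not something from the deployment described above: it assumes a sourcetype named "haproxy" and an extracted `status` field, both of which depend entirely on how your logs are indexed.

```
sourcetype="haproxy" earliest=-15m
| timechart span=1m count by status
```

Saved as a search, the same query can back a dashboard panel or an alert condition (e.g. page someone when the count of 5xx responses crosses a threshold).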
4) It's important to note that Splunk is not just for HTTP logs, so it has to be taught to generate the reports you want. Unlike something like AWStats, you have to write your own queries and dashboards (which are defined in XML). There is extensive documentation available, and the support guys were very helpful when I called. On the other hand, if all you wanted was an AWStats-like dashboard, you could just use Google Analytics.
5) Free/Commercial versions - While the free version can do most of the stuff, there are some key enterprise features for which I'd recommend buying the commercial version. Authentication, LDAP integration, alerting features, federation, etc. are some of the features missing from the free edition. Oh, and phone support.

I'm still not convinced that Splunk is scalable... the biggest issue with Splunk is that the cost of maintaining it goes up with the amount of logs generated per day. Hardware costs and licensing costs will at some point cross the cost of developing/architecting/setting up something like hadoop/flume/hive/opentsdb/etc in your own network. But unless you are a big shop, it might be a good idea to postpone that discussion until you really need to have it.