November 22, 1998

When cache means cash

When Gordon Moore first made the prediction now known as ``Moore's Law'', popularly stated as processor power doubling every eighteen months, only a few believed him. Interestingly, the law still holds in 1998. If I now tell you that the Internet is doubling every four months, you may again disbelieve me. But this is a fact that everyone in the Internet industry has to live with. And if you plan to jump into this wild ISP industry, you had better get the dirty work of planning done before you start.

The nineties have been the decade of HTTP and the WWW. Starting with the first browser at CERN in 1991, the protocol has been particularly instrumental in the success of the Internet around the world. And as this phenomenon grows at over 10 times a year, the only thing that limits it is the infrastructure on which it runs. VSNL itself, which has been on the Indian ISP scene for around three years, has grown dramatically from a few 64 Kbps links to more than 125 Mbps today, and has plans to buy a couple of hundred Mbps more in the coming months.

VSNL's realization that users in India were heavily using the microsoft.com site resulted in its setting up a Microsoft mirror site right here in India. But of course VSNL can't afford to set up a mirror for every popular web-site on the Internet.

A study on the Internet shows that about 30 to 70 percent of the Internet traffic generated by a homogeneous client population could be redundant. For example, all of my friends, and I'm sure most of you too, have a search engine like altavista.digital.com or yahoo.com set as the ``Default Home Page'' in the browser. Every time the browser opens up, the site pops up on the screen. Wouldn't it be great to have an ISP intelligent enough to remember the most frequently downloaded sites on the Internet, so that it needn't retrieve them every time from that server in the US? This is what ``Internet Caching'', or ``Caching Technology'', is all about.
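To get a feel for what "redundant" means here, one can count how many requests in a trace ask for a URL that has already been requested. The sketch below is only an illustration: the function name and the ten-line trace are made up for this example, and a real study would work from proxy or httpd logs.

```python
from collections import Counter


def redundant_fraction(requested_urls):
    """Fraction of requests that repeat a URL already seen in the trace."""
    if not requested_urls:
        return 0.0
    counts = Counter(requested_urls)
    # Every request beyond the first for a given URL could have been a cache hit.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(requested_urls)


# Hypothetical trace from a handful of users (not measured data).
trace = [
    "http://yahoo.com/",
    "http://altavista.digital.com/",
    "http://yahoo.com/",
    "http://microsoft.com/",
    "http://yahoo.com/",
    "http://altavista.digital.com/",
    "http://example.org/page",
    "http://yahoo.com/",
    "http://microsoft.com/",
    "http://example.org/other",
]
print(f"{redundant_fraction(trace):.0%} of requests were repeats")
```

This counts whole requests, not bytes, and it assumes every repeated page is still cacheable, so it gives an upper bound rather than a promise.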

The first form of caching started with the Internet browsers, which began keeping copies of pages in a cache directory on the client itself so that repeatedly accessed pages load faster. This was an interesting achievement in itself, and it improved a bit more with the second generation of caching software.

The Internet ``Proxy'' servers as we know them today were initially designed to act on behalf of a group of clients hiding behind them, so that a single Internet connection could be shared by the whole group for browsing. But slow Internet connectivity forced the creators of these proxy servers to maintain a second-level cache for all the browsers behind them. Unlike the simpler client-end caches, these cache engines were far better at predicting user preferences, and they periodically flush out unused pages to optimize cache performance.
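The "flushing out unused pages" part is easy to picture in code. The sketch below uses a least-recently-used policy, which is my own choice for illustration; I am not claiming any particular proxy product works this way, and the class and method names are invented.

```python
from collections import OrderedDict


class PageCache:
    """Tiny least-recently-used page cache (illustrative only)."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self._pages = OrderedDict()  # url -> body, least recently used first

    def __contains__(self, url):
        return url in self._pages

    def get(self, url):
        if url not in self._pages:
            return None
        # A hit makes the page "recently used" again.
        self._pages.move_to_end(url)
        return self._pages[url]

    def put(self, url, body):
        self._pages[url] = body
        self._pages.move_to_end(url)
        # Flush the pages nobody has asked for in the longest time.
        while len(self._pages) > self.max_pages:
            self._pages.popitem(last=False)


cache = PageCache(max_pages=2)
cache.put("http://yahoo.com/", "yahoo page")
cache.put("http://microsoft.com/", "microsoft page")
cache.get("http://yahoo.com/")              # yahoo is now the most recent
cache.put("http://example.org/", "other")   # pushes out microsoft.com
print("http://microsoft.com/" in cache)
```

A real cache engine would also weigh object size and expiry times, not just recency.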

However, it was programmers and not scientists who designed these proxy cache engines. The work on the ``cache'' module of these products was too early to be called an achievement. But as people around the world started analyzing httpd log files from web servers and routers, adventurous scientists soon took it up as a mathematical challenge to predict traffic from a particular segment of a homogeneous population. Starting with the CERN httpd cache and later with the ``HARVEST project'', the mathematical concepts started taking practical shape. Today there are more than 30 caching products available, many of which are customized versions of SQUID, which was itself derived from the HARVEST project.

While Netscape and Microsoft sell their own software cache engines, companies like Network Appliance, CISCO and Skycache have taken this a step further with hardware products built on the same idea: ready-to-use, out-of-the-box appliances for heavy-duty traffic caching.

However, when we talk about ISPs, we have to separate the men from the boys. Most caching products are designed with small-scale networks, or at most large enterprises, in mind. Rules and mechanisms for optimal use of the cache can be enforced in these environments. But ISPs providing services to consumers may not be in a position to make each individual user adhere to the standards or policies that might be essential for implementing cache services.

Both the Microsoft and Netscape caching software, designed with the enterprise in mind, share one big drawback: they force the user to manually set a proxy server address in the browser. To some this is not a problem, as it is a one-time setting on the client; however, new ISP users in India already have enough to learn. On the other hand, products from CISCO, inktomi and cobaltnet work so transparently that in most cases the end user may never know of their existence at all.

Products from CISCO, Cacheflow, Packetstorm and Cobaltnet are specialized hardware boxes, or appliances, optimized for caching. With a high-speed storage device and heavy-duty networking performance, these appliances can actually be placed in front of the router to filter all traffic passing through it. For even better performance you may prefer CISCO or Cacheflow over the others because they do not depend on a general-purpose "OS" to run on. When an application talks directly to the hardware, the performance increase is dramatic.

If you already have surplus hardware in your organization, and you are not willing to spend big money on hardware you are not sure of, then you may look at squid or at products from inktomi, digital or sun. Though you may not get performance as good as a cache appliance's, one big benefit of a software-based cache engine is that you can easily upgrade your hardware resources without being forced to buy the whole box all over again.

To select a cache product, ISPs need to make some basic technical decisions, which will not only have a long-lasting impact on how they later administer and maintain the network but may also affect their business practices a bit. But the bottom line for every decision is its financial implication.

The default mechanism for implementing a cache box is the ``http proxy'' protocol, the same one used by proxy software. However, as I said before, not all ISPs would like to ask their users to manually configure a proxy server address in the browser. If this option is left to the users, those who do use it may find that network access performance improves dramatically. The other mechanism is ``Transparent caching'', which caches without any configuration on the client side. This of course means that the user does not have the option of switching to a non-caching mode. Most of the hardware-based cache boxes support this, and so do some of the software caches. The cache automatically detects http requests in the traffic, transparently checks for the page in its storage, and passes it back if found; otherwise it retrieves the page for the client. A major drawback of this mechanism is that failure of the box can bring all WWW-related services to a total halt.
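The transparent flow described above (spot the http request, look in storage, fetch only on a miss) can be sketched in a few lines. Everything here is a simplification for illustration: the function and variable names are invented, the "store" is a plain dictionary, and web traffic is assumed to be whatever is addressed to port 80.

```python
HTTP_PORT = 80  # assumption: web traffic is recognised by its destination port


def intercept(dst_port, request, store, forward):
    """Return (response, source) for one client request passing the cache box.

    store   -- dict acting as the cache's page storage (URL -> page)
    forward -- callable that sends the request on to the real server
    """
    if dst_port != HTTP_PORT:
        # Not web traffic: let it pass through untouched.
        return forward(request), "passthrough"
    if request in store:
        # Page already held locally: answer without using the upstream link.
        return store[request], "cache"
    page = forward(request)
    store[request] = page
    return page, "origin"


# Hypothetical stand-in for a real fetch over the international link.
upstream_calls = []


def fake_forward(request):
    upstream_calls.append(request)
    return f"<page for {request}>"


store = {}
for port, req in [(80, "http://yahoo.com/"),
                  (80, "http://yahoo.com/"),
                  (25, "MAIL FROM:<a@b>")]:
    print(port, intercept(port, req, store, fake_forward)[1])
print("requests sent upstream:", len(upstream_calls))
```

Note that every request, cached or not, goes through `intercept`; that is exactly why a dead cache box takes all web browsing down with it.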

For bigger ISPs, the lingering question is how to implement multiple cache engines on multiple gateways working in tandem. It would be a plain waste of resources if all the caches in your organization kept their own copies of the same documents. Depending on the cache engine you use, most of them have protocols that allow multiple caches to interact and share information and cached objects in some form. Going a step further, one could also look for protocols that allow sharing between different makes of cache box, in case one wants to share with cache boxes installed by the upstream ISP. This concept of cache sharing is spreading fast among ISPs on the Internet; one example is IRCACHE (http://ircache.nlanr.net). But the prerequisite for this kind of caching is the ability to speak the popular tongue of the other caches. ICP (Internet Cache Protocol) v2 is the latest and probably the most widely supported protocol around. Most SQUID-based software relies on this protocol to request pages (aka objects) from other caches. CARP, which is now supported by Microsoft Proxy (v2), is another protocol that might catch on soon. CISCO primarily uses WCCP, which is proprietary as of today. However, CISCO can also talk to squid and other caches using ICP. If an ISP is looking for such an inter-ISP agreement, then it is best to check the protocol support in the cache boxes.
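To make "speaking the tongue of the other caches" a little more concrete, here is a sketch of how an ICP v2 query could be assembled. The field layout follows my reading of RFC 2186 (a 20-byte header, then the requester address and a null-terminated URL); treat it as an illustration to check against the RFC, not as a reference implementation. The function name is my own.

```python
import socket
import struct

ICP_OP_QUERY = 1   # opcode for "do you have this object?"
ICP_VERSION = 2

# opcode, version, message length, request number,
# options, option data, sender host address  -> 20 bytes in network order
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, request_number, requester_ip="0.0.0.0"):
    """Build the bytes of an ICP query asking a neighbour cache about url."""
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, length, request_number,
        0,              # options: none requested
        0,              # option data: unused
        b"\x00" * 4,    # sender host address, left zeroed in this sketch
    )
    return header + payload


packet = build_icp_query("http://yahoo.com/", request_number=7)
print(len(packet), "bytes:", packet.hex())
```

A cache would send such a packet over UDP to each sibling (Squid listens for ICP on port 3130 by default) and fetch the page from whichever neighbour answers with a hit, falling back to the origin server if all of them miss.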

Now that I have mentioned CISCO, I would like to impress upon you that though the CISCO Cache Engine is a bit new to the industry, it has one neat feature which others don't have yet. CISCO's WCCP protocol is built right into the 7000 series of CISCO routers, and will soon be available on the 3600s and 4500s. The presence of this protocol in the routers lets them transparently re-route traffic to CISCO Cache Engines, unlike other boxes, which have to sit between the client and the router. Apart from rerouting, the CISCO routers also allow you to transparently shut down the caching mechanism, or reroute traffic to other caches, in case a cache fails. WCCP has recently been released for commercial products, but it might take a little while before it is supported by other cache vendors.
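The benefit of letting the router make the redirection decision is easiest to see as a small decision function. This is not WCCP itself, only a sketch of the behaviour described above, with invented names and a hard-coded health flag standing in for whatever liveness check the router really performs.

```python
def next_hop(dst_port, cache_engines):
    """Pick where the router sends a packet.

    cache_engines -- list of (name, is_alive) pairs in order of preference
    """
    if dst_port != 80:
        return "internet"          # only web traffic is redirected
    for name, is_alive in cache_engines:
        if is_alive:
            return name            # first healthy cache takes the request
    return "internet"              # every cache is down: bypass caching


engines = [("cache-1", False), ("cache-2", True)]
print(next_hop(80, engines))                    # cache-1 is down, cache-2 takes over
print(next_hop(80, [("cache-1", False)]))       # no cache left, go direct
print(next_hop(25, engines))                    # mail is never redirected
```

Compare this with a box that sits in the traffic path: there, a failed cache has no router behind it to notice the failure and step around it.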

The final piece of information you may like to research is the user interface and the algorithm a cache engine uses to detect redundant data, which determines its hit rate. A small, freely available SQUID cache running on Linux on a Pentium may be good enough for a small ISP with 50 to 100 simultaneous users. But if technical manpower is costly and R&D is not something you would like to invest in, you may look at the commercial products I mentioned earlier in the article.

Caching technology also has its own set of problems. The most prominent is the possibility that the data retrieved by the browser might be old. Adding to this problem is the fact that many web-sites don't properly set expiry stamps on their Web pages. Another problem concerns copyright laws, which prohibit keeping copies of information in print or digital form. There have also been cases where organizations have sued ISPs for accidentally blocking them off from the Internet, resulting in loss of revenue. The final and probably most frightening problem is the ability of caching software to keep copies of confidential information sent between clients and servers (including information passing through SSL), which can be easily compromised by people who have access to it.
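The stale-page problem comes down to one check the cache has to make before serving its copy. A minimal sketch, assuming the cache stored the page's HTTP Expires header alongside it; the function name is mine, and treating a missing or unreadable stamp as stale is a cautious choice of my own, not something every cache does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now):
    """True if a cached page may still be served at time `now` (aware UTC)."""
    if not expires_header:
        return False  # no expiry stamp at all: assume the copy may be old
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # unparseable stamp, e.g. "0" or free text
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires


now = datetime(1998, 11, 22, 10, 0, tzinfo=timezone.utc)
print(is_fresh("Sun, 22 Nov 1998 12:00:00 GMT", now))  # expires later today
print(is_fresh("Sat, 21 Nov 1998 12:00:00 GMT", now))  # expired yesterday
print(is_fresh(None, now))                             # site sent no stamp
```

When the check fails, the cache has to go back to the origin server, which is why sites that omit or botch their expiry stamps hurt the hit rate as well as freshness.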

But in the end, all that matters is how financially attractive these caches are compared with plain connections. Looking at the infrastructure in India, I see these products becoming very popular very soon.


August 28, 1998

Voice over IP (VOIP): Dream or Reality?

Voice over IP (VOIP) has been talked about more than SMTP and POP. What drives all this talk is that many comparative studies have shown call charges falling to around 1/50th of today's cost. But there is more to it. This article covers various issues, including the technical, legal and ethical ones, behind the implementation of this much talked about technology on the Internet, and particularly in India.

Technology

Voice over data networks is nothing new; Voice over Frame Relay has been very successful for some time now. What keeps IP a step ahead, however, is that it is a global protocol, globally accessible over the public data network. Moreover, for Intranets too, IP has certainly proved to be a better protocol than all the others for WANs worldwide.

Unlike the Plain Old Telephone System (POTS), which is a circuit-switched network, routing voice over IP can address some network redundancy and efficiency issues which POTS can’t. IP is built over self-learning routing protocols which can actively route traffic over alternative routes to the same destination in the event of a network failure on a particular route. But the more important issue IP solves is data compression, which occurs at the source router. This allows almost 8 to 15 times more voice channels on the same bandwidth for the same quality. Moreover, the error correction built into the routers or VOIP equipment at the end points can easily absorb up to 25% packet loss without much degradation in service.
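The channel arithmetic behind that claim is simple. A rough sketch follows; the 2048 Kbps link is a hypothetical example, the codec rates are the nominal figures for G.711 (64 Kbps), G.729 (8 Kbps) and G.723.1 (5.3 Kbps), and IP/UDP/RTP packet overhead is deliberately ignored, so real gains will be lower.

```python
def voice_channels(link_kbps, codec_kbps):
    """Whole voice channels that fit on a link, ignoring packet overhead."""
    return int(link_kbps // codec_kbps)


LINK_KBPS = 2048  # hypothetical link, roughly one E1

codecs = {
    "G.711 (uncompressed PCM)": 64,
    "G.729": 8,
    "G.723.1": 5.3,
}

baseline = voice_channels(LINK_KBPS, 64)
for name, rate in codecs.items():
    n = voice_channels(LINK_KBPS, rate)
    print(f"{name}: {n} channels ({n / baseline:.1f}x)")
```

On these nominal rates the gain comes out between 8 and about 12 times; pushing toward the upper end of the range quoted above would presumably also need tricks such as not sending packets during silence.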

However, the impression that you need a multimedia system on every computer to use this technology is wrong. There have been successful implementations of VOIP between exchanges, particularly in Singapore and Australia, which allow one to continue using POTS equipment at the end points. Alternatively, you may use your computer’s multimedia system to connect to POTS equipment at the other end.

Still, VOIP has some issues which keep it something of a far-fetched dream for now. The most important one is the QoS (Quality of Service) possible on IP infrastructure. IP has a major drawback in its limited ability to prioritize traffic. Voice, being real-time traffic, needs to get from one terminal to the other within a specific time, after which the data is of no use. This requires the routers in between to recognize it as high-priority traffic and let it through without making it stand in the queue. Also, if this traffic can’t be let through within that time, the routers could be advised to drop the packets altogether, as they are of no use anyway. IPv4 does have a built-in facility for indicating priority, using the type of service and precedence bits in the IPv4 packet header. However, it is the initiating application that sets these bits, and no application has any reason not to claim high precedence for its own traffic, which in other words makes the facility redundant. For this reason most routing technology and applications ignore these bits.
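To see why the IPv4 priority bits are so easy to abuse, it helps to look at how an application sets them. The sketch below builds the type-of-service byte as laid out in RFC 791 (three precedence bits, then the low-delay, high-throughput and high-reliability flags). The helper names are mine, and `open_marked_udp_socket` relies on the `IP_TOS` socket option, which is available on Linux and most Unix systems but not everywhere.

```python
import socket

# Flag bits of the RFC 791 type-of-service byte.
LOW_DELAY = 0x10
HIGH_THROUGHPUT = 0x08
HIGH_RELIABILITY = 0x04


def tos_byte(precedence, flags=0):
    """Combine a precedence level (0-7) and flag bits into one ToS byte."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be between 0 and 7")
    return (precedence << 5) | flags


def open_marked_udp_socket(tos):
    """Open a UDP socket whose outgoing packets carry the given ToS byte."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


voice_tos = tos_byte(5, LOW_DELAY)
print(hex(voice_tos))
```

Nothing in this code asks anyone's permission: any program can call `tos_byte(7, LOW_DELAY)` for its own traffic, which is precisely why routers stopped trusting the field.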

The other way of attacking this problem is to design a mechanism in which only the service providers have the ability to prioritize packets. RSVP (Resource Reservation Protocol) is one such effort by the IETF (Internet Engineering Task Force). The protocol allows high-priority channels (similar to virtual circuits in a packet-switched network) to be created across multiple routers to carry high-priority traffic. RSVP, which is now supposed to be supported by many routers, still faces a lot of performance issues, which is why companies like CISCO and IBM are going ahead with their own variations of QoS or Differentiated Services that can do the same job.
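Whichever mechanism ends up marking the traffic, the behaviour being asked of the router is the same: serve voice ahead of everything else, and throw away voice that is already too late to play. The sketch below shows that behaviour only; it is not RSVP or any vendor's QoS feature, the class and constant names are invented, and the 150 ms budget is just an example figure.

```python
import heapq
import itertools

VOICE, DATA = 0, 1  # lower number is served first


class Scheduler:
    """Strict-priority output queue that drops voice packets past a deadline."""

    def __init__(self, voice_deadline_ms):
        self.voice_deadline_ms = voice_deadline_ms
        self._heap = []
        self._seq = itertools.count()  # keeps arrival order within a class
        self.dropped = 0

    def enqueue(self, traffic_class, arrived_ms, packet):
        heapq.heappush(self._heap,
                       (traffic_class, next(self._seq), arrived_ms, packet))

    def dequeue(self, now_ms):
        """Return the next packet to send, or None if the queue is empty."""
        while self._heap:
            traffic_class, _, arrived_ms, packet = heapq.heappop(self._heap)
            late = now_ms - arrived_ms > self.voice_deadline_ms
            if traffic_class == VOICE and late:
                self.dropped += 1   # too old to be of any use to the listener
                continue
            return packet
        return None


sched = Scheduler(voice_deadline_ms=150)
sched.enqueue(DATA, 0, "mail chunk")
sched.enqueue(VOICE, 0, "voice frame 1")
sched.enqueue(VOICE, 100, "voice frame 2")
first = sched.dequeue(now_ms=200)
second = sched.dequeue(now_ms=200)
print(first, "|", second, "| dropped:", sched.dropped)
```

Data packets are never dropped for lateness here, since a delayed mail is still a useful mail; only the real-time class has a shelf life.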

It is for this very important reason, along with a few more, that most VOIP vendors are concentrating on corporate networks rather than the Internet. A company like Micom, which is heavily into VOIP-related technology, admits wholeheartedly that IP is fine for voice, but the Internet is not. In other words, the Internet is not the only medium that can carry IP traffic. Many corporates which have their own high-speed Intranets find it more economical to switch to high-efficiency private IP leased lines than to make long distance calls through their local telco. Moreover, with the provision to attach PBXs to routers, this is already a reality in many companies around the world. Micom itself has sold over ten thousand pieces of VOIP equipment till now.

A reasonable argument suggests that, taking everything into consideration, well-wishers of the IP industry will soon see voice traffic carried over IP at about 1/10 of the normal cost for the same performance.

VSNL and VOIP

For most people in India, Voice over IP is nothing more than voice transmission over the Internet using their home multimedia system. Various companies provide this service, and the web-sites of most of them have been blocked, thanks to the VSNL ideology. During an interesting debate over this technology at Internet World, Mr. Amitabh Kumar admitted that VSNL has been blocking these sites because the products they provide are illegal to use in India. He further informed us that the products themselves have been blocked technically, by closing the particular port they use on the VSNL network. A workaround exists, but that is another story. Among the more interesting points raised in the debate was whether VSNL has the power or authority to block even Web or Email traffic to these sites, which are among the most important resources for budding engineers in the IP sector. And still more interesting is the fact that though VSNL has been blocking these sites on the strength of an age-old, 19th century telecom regulation, it has not taken even an iota of a step towards blocking the more problematic pornographic sites like playboy, which are also illegal to view under Indian law. To my mind, it is blocking those sites that would show VSNL in a better light, rather than blocking technologically advanced sites like Vocaltec, which are in fact VSNL’s competitors in a way. If money is the only thing VSNL sees then it may be right, but if VSNL is talking about legal issues then it had better complete the job it started, or its intentions will be doubted.

However, VSNL admits that there is no restriction on off-line voice over IP, which means that you are not doing any illegal transmission when you send mail with voice attachments.

Moreover, VSNL’s recent announcement that it has more than 70 Mbps is welcome, but that is far from realistic if one is thinking of running VOIP applications on it. Hopefully we will soon see multiple Internet backbones (including the DOT national Internet backbone), all connected to each other around the country, creating a backbone thick enough to be a more realistic infrastructure for voice calls.

The IT Task Force has been voicing its concern that this age-old law should be struck down and replaced with a more flexible, self-correcting law, to enable this highly dynamic industry to make use of everything it’s got. Allowing VOIP is just another form of liberalization that will happen. If not today, then tomorrow.

There are some people who are thinking of a public interest lawsuit to stop VSNL from blocking all IP traffic to VOIP providers, as VSNL does not have the power to block or limit Web or Email access to them. I hope issues like these don’t reach that extent, as VSNL has been very cooperative in the past and hopefully will be in the future. All I can say right now is that my IP-enabled phone may take a few more years to reach my table top; till then, POTS zindabad (long live POTS).