Showing posts from December 12, 2005

Fun writing a search engine

Introduction

My interesting project for this quarter was writing a search engine to index blog entries. Attempting something like this without knowing anything about the resources required would probably be risky and stupid. But since this was just an educational project to understand search technology and to learn Java, capacity planning was the last thing on my mind. Based on the resources I had, it was pretty clear to me that I couldn't build another Yahoo or Google. Besides, who needs another one of those anyway? Also, when I started working on this project, Google hadn't released their blog search engine. Needless to say, indexing blogs looked pretty interesting. How difficult could it be to build another Technorati anyway? A few servers running a crawler and a few database servers are all one needs, along with a nice front end written in pretty PHP.

Crawling

If a search engine were all just about searching for text in a database, then it would be called a database. To build a search eng