Archive for the ‘solr’ Category

Crawling sucks.

I wrote my first crawler in a few lines of perl code to spider a website recursively about 10 years ago. Two years ago I wrote another crawler in a few thousand lines using java+php and mysql. But this time I wasn’t really interested in competing with google, and instead crawled feeds (rss/atom). Google hadn’t [...]

Read the rest of this entry »