December 04, 2004

Google's secret 301/302 bug

Introduction: I only heard about this today, but it seems to be one of the least publicized bugs Google is being hit with right now. What's interesting is that it has been going on for a while: I saw references to similar problems in posts from 2003.

Problem: If site A points to site B using a meta-refresh or redirect in a certain way, Google interprets it as though site A has the same content as site B. Based on what I saw in different posts across the internet, site A doesn't need to host any replicated content; it just needs a meta-refresh pointing to site B. This by itself is not the problem, since the most popular site will still show up first on the Google search pages. It becomes a problem when the redirect is initiated by a page that has a higher PR (PageRank) within Google. So if site A somehow has a higher PR, it could effectively hijack site B's listing by using this kind of redirect to site B.

Analysis: There are several ways of doing a redirect using an HTTP return status, the most common being 301 (Moved Permanently) and 302 (Found, a temporary redirect).
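To make the status codes concrete, here is a minimal Python sketch of how a client might tell permanent and temporary redirects apart. The table and the function name classify_redirect are my own illustration, not anything Google is known to use; the codes themselves come from the HTTP/1.1 spec.

```python
# Redirect status codes defined by HTTP/1.1, mapped to how a client
# would normally treat them. The labels are illustrative shorthand.
REDIRECT_CODES = {
    301: "permanent",  # Moved Permanently: the new URL replaces the old one
    302: "temporary",  # Found: keep using the original URL
    303: "temporary",  # See Other: fetch the target with GET
    307: "temporary",  # Temporary Redirect: repeat the same request at the target
}


def classify_redirect(status):
    """Return 'permanent', 'temporary', or None if status is not a redirect."""
    return REDIRECT_CODES.get(status)
```

The distinction matters here because, for a temporary redirect, the original URL is the one a client is supposed to keep, which is presumably how site A's URL could end up associated with site B's content.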

Also, it's possible to use "meta-redirects" within pages, which "refresh" to another page. A meta-redirect is roughly the equivalent of a 302 at the HTML layer. If this bug is for real, it must be within the page retrieval engine of the Google robot, the part that "gets" the page for the robot. There are some applications, and probably some Perl modules, which automatically retrieve redirected pages even if the original request didn't specifically ask the module to follow redirects.
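For illustration, this is roughly how a crawler could spot a meta-redirect in a fetched page. It is only a sketch using Python's standard html.parser module; the class and function names (MetaRefreshFinder, find_meta_refresh) and the example URL are made up for this post, and I have no idea how Google's robot actually does it.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Records the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def find_meta_refresh(html):
    """Return the meta-refresh target URL found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target


page = ('<html><head><meta http-equiv="refresh" '
        'content="0; url=http://site-b.example/"></head></html>')
print(find_meta_refresh(page))
```

A retrieval engine that silently fetches the target found this way, but files the result under the URL it started from, would produce the kind of confusion described above.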

References: