Google Gloats: Search Engine Index Hits A Trillion Pages

By Dennis Faas

There are now more than a trillion web pages... and those are just the ones Google knows about. Google's count of web pages has grown from 26 million in 1998 to a billion (one thousand million) in 2000, and now to a trillion (one million million) in 2008.

The search engine firm announced the statistic as a way of publicizing its position as the market leader (ahem, gloating). Writing in a post on the company's blog, software engineers Jesse Alpert and Nissan Hajaj said that Google's systems reprocess the entire web link graph several times a day. Each pass is equivalent to exploring every road intersection in a country 50,000 times the size of the United States. (Source: blogspot.com)

The raw number of individual page addresses is actually even higher, because Google filters out duplicate pages before counting. It also ignores certain types of links, such as those on web calendars pointing to 'tomorrow', which would otherwise lead to an infinite number of pages.

Several commentators have pointed out that these stats, while eye-catching, are meaningless. Google admits it doesn't index all the pages it knows about, partly because many are pure spam and partly because many are simply too obscure to be worth the effort.

Indeed, TechCrunch.com's Michael Arrington cites estimates that Google actually examines the contents of only around 40 billion pages. There's also the argument that it doesn't matter how many pages you index, as long as you can deliver accurate results that answer people's questions.

Google's sudden interest in the size debate may have something to do with Microsoft announcing a new approach to search. Google's success is largely due to PageRank, a formula based on how many sites link to a particular page and how popular those linking sites are. This means a link from CNN is far more valuable than a link from an unknown blogger.
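To make the CNN-versus-unknown-blogger point concrete, here is a minimal textbook PageRank sketch in Python using power iteration. The damping factor of 0.85 is the value suggested in the original PageRank paper; the tiny link graph and its page names are hypothetical, and this is an illustration of the general idea, not Google's production ranking system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank via power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own current score evenly among the pages it
                # links to, so a link from a high-scoring page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "cnn" is linked to by three sites, "unknown_blog" by none.
# Each of them links to exactly one target page.
links = {
    "site1": ["cnn"],
    "site2": ["cnn"],
    "site3": ["cnn"],
    "cnn": ["page_x"],
    "unknown_blog": ["page_y"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:13s} {score:.3f}")
```

Both page_x and page_y receive exactly one inbound link, yet page_x ends up with the higher score because its link comes from the better-linked page.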

In response, Microsoft has now proposed BrowseRank, which builds on the PageRank idea but also factors in how long people spend on a particular page. The idea is that people spend more time on genuinely useful pages. Microsoft gives the example of Adobe's website: millions of pages link to its download page for the Acrobat Reader software, boosting its ranking on Google, but few people then stay on the site, clicking around and exploring. (Source: newsfactor.com)
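A rough way to picture the BrowseRank idea is to take each page's existing score and scale it by how long visitors typically stay, then renormalise. The Python sketch below does just that. For a continuous-time Markov process, the stationary probability of each state is proportional to its embedded-chain probability times its mean staying time, which is the general intuition behind weighting by dwell time. The function name `dwell_weighted_rank`, the page names and all figures are hypothetical, and Microsoft's actual method reportedly builds its graph from users' browsing data rather than plain hyperlinks, which this sketch does not model.

```python
def dwell_weighted_rank(link_scores, mean_dwell_seconds):
    """Simplified, hypothetical BrowseRank-style reweighting.

    Multiplies each page's score by the average time users stay on it,
    then renormalises so the results sum to 1.
    """
    raw = {page: link_scores[page] * mean_dwell_seconds[page] for page in link_scores}
    total = sum(raw.values())
    return {page: value / total for page, value in raw.items()}


# Hypothetical numbers: the download page attracts the most links but short visits.
link_scores = {"reader_download": 0.6, "tutorial": 0.3, "about": 0.1}
mean_dwell_seconds = {"reader_download": 5.0, "tutorial": 90.0, "about": 20.0}

for page, score in dwell_weighted_rank(link_scores, mean_dwell_seconds).items():
    print(f"{page:16s} {score:.3f}")
```

Here the heavily linked download page drops well below the tutorial page once short visits are taken into account, which is exactly the Adobe scenario Microsoft describes.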

It's an interesting point, though that particular example is a bit odd: after all, while the Adobe site may not be a cracking read, it's certainly the best place to look if you do want to download Acrobat Reader.
