I’ve finished filling my word-frequency database using the Google Web API (see my previous entry on building a word-frequency list).
Of 27,693 English words queried for their page-count, here are the top 50:
One caveat should be added:
The data is based on the Google page-count for each word, that is, the estimated number of pages Google reports for that word. The page-count gives no extra weight to multiple occurrences of a word within a page. In other words, a page containing a single “copyright” counts exactly as much as a page containing “the” 100 times. (On the other hand, if “the” occurs 100 times on an average page, it will naturally also occur at least once on most shorter pages, which raises its page-count.)
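To make the difference between a page-count and a raw occurrence count concrete, here is a minimal Python sketch over a hypothetical three-page toy corpus (not Google data). The function names are made up for illustration: `page_counts` counts each word at most once per page, as the page-count does, while `total_occurrences` counts every single occurrence.

```python
from collections import Counter

def page_counts(pages):
    """Number of pages containing each word at least once (like a page-count)."""
    counts = Counter()
    for page in pages:
        # set() collapses repeated words, so each page contributes at most 1 per word
        counts.update(set(page.lower().split()))
    return counts

def total_occurrences(pages):
    """Total number of times each word appears across all pages."""
    counts = Counter()
    for page in pages:
        counts.update(page.lower().split())
    return counts

# Hypothetical toy corpus: page 1 uses "the" 100 times and "copyright" once.
pages = [
    "the " * 100 + "copyright",
    "copyright notice",
    "the end",
]

pc = page_counts(pages)
occ = total_occurrences(pages)
print(pc["the"], pc["copyright"])    # 2 2
print(occ["the"], occ["copyright"])  # 101 2
```

Both words end up with the same page-count, even though “the” occurs fifty times as often overall.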
You can also download the word-hits list as an ASCII text file (in CSV format, for import into Excel, a database, or the like):
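If you would rather rank the list yourself than use a spreadsheet, a minimal Python sketch follows. It assumes each line holds a word and its hit count separated by a comma (for example `the,300`), with plain integers and no thousands separators; the file name in the comment and the sample counts are made up for illustration and are not the real figures.

```python
import csv
import io

def load_word_hits(lines):
    """Parse assumed 'word,hits' rows into (word, hits) pairs, skipping malformed rows."""
    rows = []
    for record in csv.reader(lines):
        if len(record) != 2:
            continue  # skip blank or unexpected lines
        word, hits = record
        try:
            rows.append((word.strip(), int(hits)))
        except ValueError:
            continue  # e.g. a header row such as "word,hits"
    return rows

def top_words(rows, n=50):
    """Return the n words with the highest hit counts, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

# Made-up sample data; for the real file you might use something like
# open("word-hits.csv", newline="", encoding="utf-8") (hypothetical file name).
sample = io.StringIO("word,hits\nthe,300\nand,200\ncopyright,100\n")
print(top_words(load_word_hits(sample), n=2))  # [('the', 300), ('and', 200)]
```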
chriSEO looks like a great site. It mostly covers SEO and Google. Only thing is, I can’t find an RSS/XML feed.
“Internet search leader Google has rejected a takeover bid from Microsoft in favour of selling its shares directly to the public, The New York Times
--DPA, Google rebuffs Microsoft, November 3, 2003