Google Blogoscoped

Monday, November 3, 2003

Most-Used Words Online

I’ve finished filling my word-frequency database using the Google Web API (see my previous entry on building a word-frequency list).
Of 27,693 English words queried for their page-count, here are the top 50:

  1. the
  2. of
  3. and
  4. to
  5. a
  6. in
  7. for
  8. on
  9. home
  10. is
  11. by
  12. all
  13. this
  14. with
  15. about
  16. or
  17. at
  18. from
  19. are
  20. us
  21. site
  22. information
  23. you
  24. contact
  25. an
  26. more
  27. new
  28. search
  29. that
  30. your
  31. it
  32. be
  33. as
  34. page
  35. other
  36. have
  37. web
  38. copyright
  39. not
  40. can
  41. our
  42. use
  43. news
  44. will
  45. privacy
  46. help
  47. one
  48. rights
  49. we
  50. if

The following should be added:
The data gathered is based on the Google page-count for each word. The page-count does not give extra weight to multiple occurrences of a word within a page, which means that a single “copyright” within a text counts just the same as 100 occurrences of “the”. (On the other hand, it’s quite natural that if “the” occurs 100 times on average, it will also occur at least once in most shorter pages, which increases its page-count.)
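
To make the difference between a page-count and a plain occurrence count concrete, here is a minimal Python sketch. The three toy "pages" and both function names are made up for illustration; this is not how Google computes its numbers, only the general idea of counting pages versus counting occurrences.

```python
def page_count(pages, word):
    """Number of pages that contain the word at least once."""
    return sum(1 for page in pages if word in page.lower().split())


def occurrence_count(pages, word):
    """Total number of times the word appears across all pages."""
    return sum(page.lower().split().count(word) for page in pages)


# Hypothetical toy corpus, one string per page.
pages = [
    "the cat sat on the mat and the dog sat on the rug",
    "copyright 2003",
    "the end",
]

for word in ("the", "copyright"):
    print(word, page_count(pages, word), occurrence_count(pages, word))
```

In this toy corpus "the" occurs five times but only on two pages, so its lead over "copyright" (one occurrence on one page) shrinks from 5:1 to 2:1 once only pages are counted.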


You can also download the word-hits list as an ASCII text file (a CSV file, to import into Excel, a database, or the like):


chriSEO Website

chriSEO looks like a great site. It mostly covers SEO and Google. Only thing is, I can’t find an RSS/XML feed.

Google Rebuffs Microsoft

“Internet search leader Google has rejected a takeover bid from Microsoft in favour of selling its shares directly to the public, The New York Times has reported.”
--DPA, Google rebuffs Microsoft, November 3, 2003




This site unofficially covers Google™ and more with some rights reserved.