I'd like to learn how to write a web query that counts how many times a word is used on a website. As an example, a query that counts how many times the word "recession" or "health care" shows up each day on a website (say news.yahoo.com).
Eventually I'd like to be able to query more than just one news site, say perhaps the top 10 in the USA, or the top 25 in the world.
Any help, hints, or references to other learning tools are much appreciated.
*top news sites:
1 news.yahoo.com
2 www.cnn.com
i www.news.bbc.co.uk
3 www.nytimes.com
4 www.msnbc.msn.com
5 news.google.com
6 www.reuters.com
7 www.foxnews.com
i www.guardian.co.uk
8 news.aol.com
i www.timesonline.co.uk
i www.news.com.au
9 www.usatoday.com
10 www.abcnews.com
11 www.topix.com
12 www.iht.com
13 www.voanews.com
14 www.cbsnews.com
i news.sky.com
15 www.csmonitor.com
* According to Alexa (the web information company) on 25 Aug 2008: http://www.alexa.com/browse/general/?&CategoryID=8&mode=general&R=False&Start=1&ListingCount=True&SortBy=Popularity
|
You can use a search-engine query like [site:news.yahoo.com yourword] (replace "yourword" with your term, and leave out the square brackets). However, the result count is only an approximation, and it counts each page once if the word appears on it, no matter how many times it appears there. |
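If you want actual occurrence counts rather than page counts, one option is to fetch a page yourself and count matches in its text. Below is a minimal sketch in Python using only the standard library. The function and class names (`html_to_text`, `count_phrase`, `count_on_page`, `TextExtractor`) are made up for this example, and the browser-like User-Agent header is an assumption, since some sites may refuse requests without one. It only looks at the single page you fetch (for example the front page), not the whole site, and it will miss any text the site loads later with JavaScript.

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of a word or phrase."""
    words = [re.escape(w) for w in phrase.split()]
    # Allow any run of whitespace between the words of a phrase.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_on_page(url, phrases):
    """Fetch one page and return {phrase: count} for its text."""
    # Assumption: a browser-like User-Agent; adjust or drop as needed.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    text = html_to_text(html)
    return {phrase: count_phrase(text, phrase) for phrase in phrases}


# Example (needs network access):
# count_on_page("http://news.yahoo.com", ["recession", "health care"])
```

To get per-day numbers you would run this once a day (for example from a scheduled task) and save the results. To cover several sites, loop over a list of URLs and call `count_on_page` for each one. Note that the whole-word matching means "recession" does not match "recessions"; loosen the pattern if you want those counted too.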