Google Blogoscoped

Wednesday, April 9, 2008

Google REST Search API

Google half-cancelled their SOAP API a while ago, but they now* offer a parametrized URL that returns a JSON data set. Google says this REST approach is useful for "Flash developers, and those developers that have a need to access the AJAX Search API from other Non-Javascript environments." This may be even simpler to use than the SOAP API, though I wonder how long (and how well) it's going to work. Here's an example query:

ajax.googleapis.com/ajax/services/search/web?v=1.0&q=hello%20world

This URL format can also be adjusted to grab results from video search, book search and so on.
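
As a rough sketch of how such URLs could be put together in PHP: the helper name buildSearchUrl is made up for illustration, the http:// scheme is assumed, and any service name other than "web" (such as "video") is an assumption based on the URL pattern above rather than something confirmed here.

```php
<?php
// Hypothetical helper: builds a query URL for the REST interface.
// Only the "web" path appears in the example query; other service
// names such as "video" are assumptions.
function buildSearchUrl($service, $query) {
    // The http:// scheme is assumed; the example query omits it.
    $base = 'http://ajax.googleapis.com/ajax/services/search/';
    // v=1.0 is the version parameter used in the example query.
    $params = array('v' => '1.0', 'q' => $query);
    // PHP_QUERY_RFC3986 encodes spaces as %20, as in the example.
    return $base . rawurlencode($service) . '?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildSearchUrl('web', 'hello world') . "\n";
echo buildSearchUrl('video', 'hello world') . "\n";
```

Keeping the service name as a parameter means switching between search types only changes one path segment, while the query encoding stays in one place.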

While the URL has the word AJAX in the string and this is officially part of the Google AJAX Search API, this has nothing to do with AJAX per se, as the URL can be called from other environments, including the server side. All you need is a JSON library to parse the results (JSON means JavaScript Object Notation, though it also doesn't require JavaScript). The Yahoo Search API already utilizes a similar approach, though it can return XML as well.
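
To illustrate the "all you need is a JSON library" point, here is a minimal PHP sketch that parses a response on the server side. The sample response is hand-written, and the field names (responseData, results, url, titleNoFormatting) are assumptions about the JSON layout; extractResults is a hypothetical helper, not part of any Google library.

```php
<?php
// Sample response, hand-written for illustration. The field names
// (responseData, results, url, titleNoFormatting) are assumptions
// about the JSON layout, not taken from the post.
$json = '{"responseData": {"results": [
    {"url": "http://example.com/a", "titleNoFormatting": "Hello A"},
    {"url": "http://example.com/b", "titleNoFormatting": "Hello B"}
]}, "responseStatus": 200}';

// Hypothetical helper: returns url => title pairs from a response.
function extractResults($json) {
    // Passing true makes json_decode return associative arrays.
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['responseData']['results'])) {
        return array(); // invalid JSON or unexpected layout
    }
    $out = array();
    foreach ($data['responseData']['results'] as $result) {
        $out[$result['url']] = $result['titleNoFormatting'];
    }
    return $out;
}

foreach (extractResults($json) as $url => $title) {
    echo $title . ' - ' . $url . "\n";
}
```

In real use the $json string would presumably come from fetching the query URL (for example with file_get_contents) instead of a hard-coded sample.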

Eugenius, who noted this in the forum, says "I'm wondering if Google opened up this channel so that App Engine developers would have Python access to its search, translation, and feed-cache products?"

On the other hand, it's also fairly easy to just screen-scrape Google results, though it may go against your netiquette, as it may require ignoring Google's robots.txt file (Google's robots guidelines disallow direct bot-spidering of their search results). The bonus is that scraping can work on any kind of Google result as well as on any website, whether the site provides an API or not. Here's a PHP5 sample that grabs the Google top 10 results, for instance. To PHP it doesn't matter whether the Google result page is valid HTML (let alone valid XHTML); it just parses pretty much anything into a neat object model that is searchable via XPath:

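<?php
// Base search URL; the encoded query term is appended below.
$searchUrl = 'http://www.google.com/search?hl=en&q=';
// The XPath selects the result links (class "l" inside an h2),
// matching the result page markup at the time of writing.
$nodes = getHtmlNodes($searchUrl . urlencode('hello world'),
        '//h2/a[@class="l"]');
foreach ($nodes as $node) {
    $resultUrl = $node->getAttribute('href');
    echo htmlentities($resultUrl) . '<br />';
}

// Loads a (possibly invalid) HTML page and returns the nodes
// matching the given XPath expression.
function getHtmlNodes($url, $xpath) {
    $dom = new DOMDocument();
    // @ suppresses the warnings that malformed markup triggers.
    @$dom->loadHTMLFile($url);
    $oXpath = new DOMXPath($dom);
    return $oXpath->query($xpath);
}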

[Thanks Eugenius!]

*I'm not sure how recent this is.
