Google Blogoscoped

Sunday, April 10, 2005

Yahoo Term Extraction in PHP5

If you want to use the Yahoo API's Term Extraction (Content Analysis) service in PHP5, you should send longer texts via a POST request. Here's a handy function, getSignificantTermsArray, which accepts a text and returns an array of terms (note that you should replace YOUR_APP_ID with your Yahoo API developer key):


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Terms Extraction PHP Sample</title>
</head>
<body>

<?php

// The sample text could be much longer
$s = 'Alice was wondering if the White Rabbit was up to something.';

$arrTerms = getSignificantTermsArray($s);

foreach ($arrTerms as $term) {
    // Escape each term so it is safe to print inside the HTML page
    echo htmlspecialchars($term) . '<br />';
}

?>
</body>
</html>
<?php

function getSignificantTermsArray($s)
{
    // cURL must be installed for this to work
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction');
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    curl_setopt($ch, CURLOPT_POSTFIELDS, 'appid=YOUR_APP_ID&context=' . urlencode($s));
    $xml = curl_exec($ch);
    curl_close($ch);

    // curl_exec returns false on failure; return no terms in that case
    if ($xml === false) {
        return array();
    }

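    // Workaround: delete the schema and namespace declarations, as they
    // confuse PHP5 (with the default namespace left in place, the
    // unprefixed XPath query '//Result' would match nothing)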
    $xml = str_replace('xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/ContentAnalysisService/V1/TermExtractionResponse.xsd"', ' ', $xml);
    $xml = str_replace('xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:yahoo:srch"', ' ', $xml);

    $arrTerms = array();

    // The nice native PHP XML functions
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xNodes = $xpath->query('//Result');
    foreach ($xNodes as $xNode) {
        // textContent also copes with an empty Result element
        $arrTerms[] = $xNode->textContent;
    }

    return $arrTerms;
}

?>
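
As an alternative to deleting the namespace declarations with str_replace, the response can probably be queried as-is by registering the namespace with the XPath object. The following is only a sketch under a few assumptions: the function name extractTermsWithNamespace, the prefix "y" and the sample XML (including its ResultSet root element and the two terms) are made up for illustration; only the urn:yahoo:srch namespace and the Result element name are taken from the code above.

```php
<?php

// Hypothetical sample response; the ResultSet wrapper and the terms are
// assumptions, the namespace declarations are copied from the workaround.
$sampleXml = '<ResultSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:yahoo:srch" xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/ContentAnalysisService/V1/TermExtractionResponse.xsd">'
    . '<Result>white rabbit</Result>'
    . '<Result>alice</Result>'
    . '</ResultSet>';

function extractTermsWithNamespace($xml)
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        // Not well-formed XML: return no terms
        return array();
    }

    $xpath = new DOMXPath($dom);
    // Bind the (arbitrary) prefix "y" to the default namespace of the response
    $xpath->registerNamespace('y', 'urn:yahoo:srch');

    $terms = array();
    foreach ($xpath->query('//y:Result') as $node) {
        $terms[] = $node->textContent;
    }

    return $terms;
}

print_r(extractTermsWithNamespace($sampleXml));
```

With this approach the XML string stays untouched, so the parsing does not depend on the exact spelling of the declarations in the response.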

If you don't have the necessary environment to run the PHP snippet above, you can still extract terms manually. To do so, simply create a new HTML file on your desktop and copy and paste the following into it:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Terms Extraction Form</title>
</head>
<body>

<h1>Term Extraction</h1>

<form action="http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction" method="post"><div>
    <textarea name="context" rows="40" cols="70"></textarea>
    <input type="hidden" name="appid" value="YOUR_APP_ID" />
    <input type="submit" value="Submit" />
</div>
</form>

</body>
</html>

This results in a simple text box, which you can fill in and submit.


And if you wonder what you can do with term extraction, take a look at the Auto-Illustrator and the Auto-Linker applications.
