
       

 
     
       
If you’ve been to Yahoo lately, you can see a redesign took place on the front-page. Still somewhat messy, but certainly a fresh look.
       
Here is an example of a screen-scraping meta search engine written in PHP5. It uses new PHP5 features such as easy HTML-to-XML conversion and XPath. The array declared at the top of the "performSearch" function is crucial to how the search is applied: each entry contains the title of the search engine to be screen-scraped; its URL, with the placeholder [keyword]; and the XPath expression intended to grab the first relevant link.
<?
header("Content-type: text/html; charset=utf-8");
$q = ( isset($_GET['q']) ) ? $_GET['q'] : '';
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Streaming Screen-scraped Search Results</title>
    <link rel="stylesheet" href="default.css" type="text/css" media="screen, projection, print" />
</head>
<body>
<h1>Streaming Screen-scraped Search Results</h1>
<form action="./" method="get"><div>
<input type="text" size="20" name="q" value="<?= toAttribute($q) ?>" /> <input type="submit" value="Search" />
</div></form>
<?
if ($q != '')
{
    performSearch($q);
}
?>
</body>
</html><?
function performSearch($keyword)
{
    $arrTitleUrlXpath = array(
            'Google;http://www.google.com/search?q=[keyword];//p[@class="g"]//a[@href]',
            'Yahoo;http://search.yahoo.com/search?p=[keyword];//ol[@start="1"]//a[@href]',
            'MSN;http://search.msn.com/results.aspx?q=[keyword];//li/a[@class="t"]'
            );
    
    $dom = new domdocument;
    
    foreach ($arrTitleUrlXpath as $titleUrlXpath)
    {
        list($title, $url, $sXpath) = explode(';', $titleUrlXpath);
        $url = str_replace( '[keyword]', urlencode($keyword), $url );
        echo '<div class="loading">Loading <a href="' . $url . '">' . $title . '</a> result...</div>';
        @ob_flush(); flush(); // empty PHP's own output buffer first, then the system buffer
    
        @$dom->loadHTMLFile($url);
        
        $xpath = new domxpath($dom);
        $xNodes = $xpath->query($sXpath);
    
        foreach ($xNodes as $xNode)
        {
            $sLinktext = @$xNode->firstChild->data;
            $sLinkurl = $xNode->getAttribute('href');
            if ($sLinktext != '' && $sLinkurl != '')
            {
                echo '<div class="result"><a href="' . toAttribute($sLinkurl) . '">' . toXml($sLinktext) . '</a> ';
                echo '(Top result from ' . $title . ')</div>';
                echo "\r\n";
                @ob_flush(); flush();
            }
            break;
        }
    }
}
function toAttribute($s)
{
    $s = toXml($s);
    $s = str_replace('"', '&quot;', $s);
    return $s;
}
function toXml($s)
{
    $s = str_replace('&', '&amp;', $s);
    $s = str_replace('<', '&lt;', $s);
    $s = str_replace('>', '&gt;', $s);
    return $s;
}
?>

This is the stylesheet. To display a "loading" message while a result is being fetched from the other site, a div block of class "loading" is used. It is then overlaid by a div block of class "result" using relative positioning (the -60px offset matches the loading block's 50px height plus 5px of padding on each side).
body
{
    background-color: white;
    color: black;
    margin: 20px;
    padding: 0;
    font-family: arial, helvetica, sans-serif;
}
a
{
    color: blue;
}
.loading
{
    font-style: italic;
    background-color: #eee;
    height: 50px;
    width: 700px;
    padding: 5px;
    overflow: hidden;
    margin-bottom: 0;
}
.result
{
    background-color: #fff;
    height: 50px;
    width: 700px;
    padding: 5px;
    overflow: hidden;
    position: relative;
    top: -60px;
}

You can also see the above running on my site.
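
To try out the entry format and the XPath step without hitting a live search engine, something like the following minimal sketch could be used. The helper names parseEntry and firstLink, the example.com URLs and the sample markup are all made up for illustration; they are not part of the script above.

```php
<?php
// Hypothetical helper (not in the original script): split one
// "title;url;xpath" entry and fill in the [keyword] placeholder.
function parseEntry($entry, $keyword)
{
    // Limit to 3 parts so a ';' inside the XPath would not be split off.
    list($title, $url, $xpath) = explode(';', $entry, 3);
    $url = str_replace('[keyword]', urlencode($keyword), $url);
    return array('title' => $title, 'url' => $url, 'xpath' => $xpath);
}

// Hypothetical helper: return the text and href of the first node matched
// by $xpathExpr in an HTML string, or null if nothing matches.
function firstLink($html, $xpathExpr)
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid, so parser warnings are suppressed.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($xpathExpr);
    if ($nodes === false || $nodes->length == 0) {
        return null;
    }
    $node = $nodes->item(0);
    return array(
        'text' => trim($node->textContent),
        'href' => $node->getAttribute('href')
    );
}

// Made-up entry and markup standing in for a real results page.
$entry = parseEntry(
    'Example;http://search.example.com/?q=[keyword];//li/a[@class="t"]',
    'php xpath'
);
$html = '<html><body><ul>'
      . '<li><a class="t" href="http://example.com/first">First <b>hit</b></a></li>'
      . '<li><a class="t" href="http://example.com/second">Second hit</a></li>'
      . '</ul></body></html>';
$link = firstLink($html, $entry['xpath']);

echo $entry['url'], "\n";                          // the search URL with the keyword filled in
echo $link['text'], ' -> ', $link['href'], "\n";   // the first matching link
```

Unlike the script above, this sketch reads textContent rather than firstChild->data, so a link whose text begins with a nested tag such as a bold element still yields its full text.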