If you’ve been to Yahoo lately, you will have seen that the front page was redesigned. It is still somewhat messy, but certainly a fresh look.
Here is an example of a screen-scraping meta search engine written in PHP5. It uses new PHP5 features like easy HTML-to-XML conversion and XPath. The array declared at the top of the "performSearch" function is crucial to how the search is applied. Each entry is a single string with three parts separated by semicolons: the title of the search engine to be screen-scraped; its URL, with the placeholder [keyword]; and the XPath expression intended to grab the first relevant link.
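To make the entry format concrete before the full listing, here is a minimal sketch of how one entry could be taken apart and turned into a request URL. The helper name parseEngineEntry is hypothetical and does not appear in the script below, which does the same work inline with explode and str_replace.

```php
<?php
// Hypothetical helper (not part of the original script): split one
// "title;url;xpath" entry and fill in the [keyword] placeholder.
function parseEngineEntry($entry, $keyword)
{
    // Limit the split to 3 parts so a semicolon inside the XPath survives.
    list($title, $url, $xpath) = explode(';', $entry, 3);

    // urlencode() makes the keyword safe to use as a query-string value.
    $url = str_replace('[keyword]', urlencode($keyword), $url);

    return array('title' => $title, 'url' => $url, 'xpath' => $xpath);
}

$engine = parseEngineEntry(
    'Google;http://www.google.com/search?q=[keyword];//p[@class="g"]//a[@href]',
    'php xpath'
);

echo $engine['title'], ': ', $engine['url'], "\n";
```

Note that this format assumes neither the title nor the URL contains a semicolon; an engine whose URL needs one would call for a different separator or a nested array.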
<?php
// Full opening tag: the short form "<?" only works when short_open_tag is enabled.
header("Content-type: text/html; charset=utf-8");
$q = ( isset($_GET['q']) ) ? $_GET['q'] : '';
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Streaming Screen-scraped Search Results</title>
<link rel="stylesheet" href="default.css" type="text/css" media="screen, projection, print" />
</head>
<body>
<h1>Streaming Screen-scraped Search Results</h1>
<form action="./" method="get"><div>
<input type="text" size="20" name="q" value="<?= toAttribute($q) ?>" /> <input type="submit" value="Search" />
</div></form>
<?php
if ($q != '')
{
performSearch($q);
}
?>
</body>
</html><?php
function performSearch($keyword)
{
$arrTitleUrlXpath = array(
'Google;http://www.google.com/search?q=[keyword];//p[@class="g"]//a[@href]',
'Yahoo;http://search.yahoo.com/search?p=[keyword];//ol[@start="1"]//a[@href]',
'MSN;http://search.msn.com/results.aspx?q=[keyword];//li/a[@class="t"]'
);
$dom = new DOMDocument();
foreach ($arrTitleUrlXpath as $titleUrlXpath)
{
list($title, $url, $sXpath) = explode(';', $titleUrlXpath);
$url = str_replace( '[keyword]', urlencode($keyword), $url );
echo '<div class="loading">Loading <a href="' . $url . '">' . $title . '</a> result...</div>';
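// Empty PHP's own output buffer first (if one is active), then the system buffer.
if (ob_get_level() > 0) { ob_flush(); }
flush();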
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
$xNodes = $xpath->query($sXpath);
foreach ($xNodes as $xNode)
{
$sLinktext = @$xNode->firstChild->data;
$sLinkurl = $xNode->getAttribute('href');
if ($sLinktext != '' && $sLinkurl != '')
{
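// Scraped values are decoded by the DOM parser, so escape them again for output.
echo '<div class="result"><a href="' . toAttribute($sLinkurl) . '">' . toXml($sLinktext) . '</a> ';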
echo '(Top result from ' . $title . ')</div>';
echo "\r\n";
if (ob_get_level() > 0) { ob_flush(); }
flush();
}
break;
}
}
}
function toAttribute($s)
{
$s = toXml($s);
$s = str_replace('"', '&quot;', $s);
return $s;
}
function toXml($s)
{
// Escape "&" first so the ampersands of the entities added below are not escaped twice.
$s = str_replace('&', '&amp;', $s);
$s = str_replace('<', '&lt;', $s);
$s = str_replace('>', '&gt;', $s);
return $s;
}
?>
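The XPath step is easier to follow in isolation. The following is a minimal sketch that runs the Google expression from the array against a made-up HTML snippet instead of a live result page; the markup and URLs are assumptions for illustration only, and real result pages will look different. It reads the link text via textContent rather than firstChild->data as in the script above, so nested markup such as bold tags inside the link is included.

```php
<?php
// Made-up markup standing in for a downloaded result page.
$html = '<html><body>'
      . '<p class="ad"><a href="http://ads.example.com/">Sponsored</a></p>'
      . '<p class="g"><a href="http://www.example.com/">Example Domain</a></p>'
      . '<p class="g"><a href="http://www.example.org/">Second hit</a></p>'
      . '</body></html>';

$dom = new DOMDocument();
// loadHTML() tolerates sloppy real-world HTML; @ hides its parser warnings.
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Every link inside a paragraph of class "g", in document order.
$nodes = $xpath->query('//p[@class="g"]//a[@href]');

// Only the first match is of interest, as in the meta search script.
$first = ($nodes->length > 0) ? $nodes->item(0) : null;
if ($first !== null) {
    $href = $first->getAttribute('href');
    $text = $first->textContent;
    // Escape both values before they go into any HTML output.
    echo htmlspecialchars($text), ' (', htmlspecialchars($href), ")\n";
}
```

The paragraph of class "ad" is skipped because it does not match the class test in the expression, which is the whole point of keeping one tailored XPath per search engine.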
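This is the stylesheet. To display a "loading" message for the moment while a result is being fetched from another site, a div block of class "loading" is printed first. Once the result arrives, it is overlaid with a div block of class "result" using relative positioning: the top offset of -60px equals the full height of the loading block (50px plus 5px of padding at the top and at the bottom).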
body
{
background-color: white;
color: black;
margin: 20px;
padding: 0;
font-family: arial, helvetica, sans-serif;
}
a
{
color: blue;
}
.loading
{
font-style: italic;
background-color: #eee;
height: 50px;
width: 700px;
padding: 5px;
overflow: hidden;
margin-bottom: 0;
}
.result
{
background-color: #fff;
height: 50px;
width: 700px;
padding: 5px;
overflow: hidden;
position: relative;
top: -60px;
}
You can also see the above running on my site.