Google Blogoscoped

Monday, October 9, 2006

New: Cover Browser

Over the past week I created Cover Browser, a website for exploring and searching comic book covers. You’ll find everything from Superman to Spider-Man to independent comics, and I plan to add more titles if feedback is good.

Technical background

The project uses PHP5 and MySQL with XHTML 1.0 Strict and CSS2 (no JavaScript at this time), plus Corel PhotoPaint and PaintShop Pro 4 for graphics work. For development I’m running a local Windows XP installation based on WAMP, a handy package that installs PHP5, Apache and MySQL in one go. Some tweaks were necessary to replicate the htaccess and SOAP functionality, and to be able to access http://coverbrowser/ on this machine instead of http://localhost/. On the live site, I’m using htaccess rewrite rules to prettify the URLs, so visitors see readable paths instead of query strings.
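On Apache, this kind of mapping is usually done with mod_rewrite rules in an .htaccess file. The Python snippet below only illustrates the translation such a rule performs, turning a readable path into the query-string URL the script actually receives. The /series/issue URL scheme, the index.php script name and the series/issue parameter names are all hypothetical, not the site's real layout.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical pretty-URL scheme: /<series-slug>[/<issue-number>][/]
PRETTY_PATH = re.compile(r"^/([a-z0-9-]+)(?:/(\d+))?/?$")


def to_query_url(path: str, script: str = "/index.php") -> Optional[str]:
    """Translate a readable path into the query-string URL a script would see.

    Returns None when the path does not match the assumed pattern.
    """
    match = PRETTY_PATH.match(path)
    if match is None:
        return None
    params = {"series": match.group(1)}
    if match.group(2) is not None:
        params["issue"] = match.group(2)
    return f"{script}?{urlencode(params)}"
```

For example, `/amazing-spider-man/1` becomes `/index.php?series=amazing-spider-man&issue=1`, while a path that doesn't fit the pattern is left unmapped.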

To collect the cover images for the site, besides a couple of my own scans, I used the Yahoo Image REST API as well as some eBay screenscraping, searching for e.g. “Amazing Spider-Man 1”, “Amazing Spider-Man 2”, and so on (filtering out very small covers). This yielded up to 20 images per issue, from which I made a manual selection, followed by image conversion using the GD library that comes with PHP. I’m assuming that reproducing the covers is fair use, and each cover also links to the site it originates from (which likewise reproduced it under fair use). Every cover also links to Google and eBay search results so you can get further info or buy the issue.
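The query generation and pre-filtering can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the actual image search and scraping calls are left out, the Candidate record is hypothetical, and the minimum width and height stand in for "very small", which the post doesn't quantify. Only the cap of 20 images per issue comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds: the post only says "very small" covers were filtered.
MIN_WIDTH = 150
MIN_HEIGHT = 200
MAX_PER_ISSUE = 20  # up to 20 candidate images per issue


@dataclass
class Candidate:
    """One image result from an image search or a scraped listing."""
    source_url: str
    width: int
    height: int


def issue_queries(series: str, first: int, last: int) -> List[str]:
    """Build one search string per issue, e.g. "Amazing Spider-Man 1"."""
    return [f"{series} {number}" for number in range(first, last + 1)]


def shortlist(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Drop very small images and cap the list for manual selection."""
    large_enough = [
        c for c in candidates
        if c.width >= MIN_WIDTH and c.height >= MIN_HEIGHT
    ]
    return large_enough[:MAX_PER_ISSUE]
```

The shortlist only narrows things down; picking the final cover remains a manual step, as described above.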

To add a search engine to the site, I implemented a little PHP5-based API hack that yields fuzzy but often good matches. For example, you can search for “john byrne” and roughly 80% of the returned covers are actually pencilled by John Byrne, or search for the superhero “Gambit” to find the X-Men cover featuring Gambit’s first appearance. It works like this:

  1. For every issue title (e.g. “Incredible Hulk 102”), I grab the top 10 snippets via the Google SOAP Search API and concatenate them.
  2. I add more snippets from an amended search along the lines of ["Incredible Hulk 102" (penciler | artist) -ebay -yahoo], to help identify the artists behind the comic book.
  3. I feed this large combined snippet string into the Yahoo Term Extraction API (it’s more or less REST, with parametrized requests returning XML, but it uses a POST submission to handle large strings).
  4. Yahoo returns a handful of terms, which I filter further using a blacklist.
  5. Using the Google API again, I fetch the Google page count for each of Yahoo’s terms combined with the issue title and number, and save it to the MySQL database. (This approximates how often the two are mentioned together, i.e. how relevant the term is to this comic book.)
  6. When you search for a keyword, I return the covers whose associated terms match it, ordered by page count (highest first) and omitting any that fall below a certain page count threshold.
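The six steps can be sketched as follows. This is a minimal Python sketch, not the site's PHP code: the Google and Yahoo services are passed in as hypothetical callables rather than real API clients, the blacklist entries, the exact page-count query and the default threshold of 100 are made up, a "match" is assumed to be a case-insensitive substring match, and a plain dict stands in for the MySQL table.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-ins for the external services in steps 1-5.
SnippetSearch = Callable[[str], List[str]]  # query -> result snippets
TermExtractor = Callable[[str], List[str]]  # long text -> key terms
PageCounter = Callable[[str], int]          # query -> estimated page count

# Hypothetical blacklist entries; the real list isn't published.
BLACKLIST: Set[str] = {"ebay", "yahoo", "comics"}


def collect_snippets(title: str, snippet_search: SnippetSearch) -> str:
    """Steps 1-2: top snippets for the title plus an artist-oriented search."""
    snippets = snippet_search(title)[:10]
    # Assumption: the amended search also contributes up to 10 snippets.
    snippets += snippet_search(f'"{title}" (penciler | artist) -ebay -yahoo')[:10]
    return " ".join(snippets)


def score_terms(
    title: str,
    text: str,
    extract: TermExtractor,
    page_count: PageCounter,
    blacklist: Set[str] = BLACKLIST,
) -> Dict[str, int]:
    """Steps 3-5: extract terms, drop blacklisted ones, store co-occurrence counts."""
    scores: Dict[str, int] = {}
    for term in extract(text):
        key = term.lower()
        if key in blacklist or key in scores:
            continue  # skip unwanted or already-scored terms
        # Assumption: title and term are both quoted in the count query.
        scores[key] = page_count(f'"{title}" "{term}"')
    return scores


def find_covers(
    index: Dict[str, Dict[str, int]], keyword: str, threshold: int = 100
) -> List[Tuple[str, int]]:
    """Step 6: covers with a matching term, highest page count first."""
    needle = keyword.lower()
    hits: List[Tuple[str, int]] = []
    for title, terms in index.items():
        # Assumption: a term matches if it contains the keyword (case-insensitive).
        best = max((count for term, count in terms.items() if needle in term), default=0)
        if best >= threshold:
            hits.append((title, best))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

Injecting the services as callables keeps the scoring and ranking logic testable with fake data, and swapping the dict for a database table doesn't change the ranking rule.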

[Thanks Alek Komarnitsky and Tony Ruscoe for alpha-testing the site!]

This site unofficially covers Google™ and more with some rights reserved.