Google Blogoscoped

Thursday, March 26, 2009

Covers Color Search

Inspired by Google’s recent release of a color matching search for Google Images (see previous post), I’ve added a color search to Cover Browser. Cover Browser has over 250,000 covers of all sorts, and I figured color could be a nice alternative way to explore them.

How does the color matching on the site work? I didn’t want to simply crawl the Google Images color search and copy its data into my own database (though that may be a feasible approach for some projects). So I started out with some less-than-ideal attempts at calculating color differences, or near matches, against a self-made palette that included some shades of blue, green, red, cyan and so on. I also tried working with HSL (hue, saturation, lightness) values. But the results were very mixed: sometimes good, sometimes bad, varying from color to color.

The algorithm I ended up using is simple but works well for the purpose (I tried some not-so-successful Google searches beforehand, but if there’s an “official” way to do this stuff, I’d like to hear about it in the comments...):

  1. For every picture on the site, grab the RGB values for 10,000 random image pixels.
  2. For every pixel, divide red, green, and blue by 60 each (that number could be a little lower or higher, depending on your needs), and round the result. The effect is that you end up with far fewer “color buckets”. One color bucket might be rgb(3,0,0), which translates back to rgb(180,0,0) (3 × 60), a shade of red. (Caveat: I didn’t collect gray buckets, as the results weren’t too interesting for those. There is a black and a white bucket, though.)
  3. Keep a count of each color bucket’s “fill” status in percent, relative to all 10,000 color picks. Say the “purple” bucket (internal number 2-0-2) is filled to 24%: that would be a significant amount for that picture. (Not every image ends up with a bucket above the threshold in the next step, while some images end up with several. I ended up with over 363,000 stored RGB values.)
  4. Save all RGB bucket values whose fill is above a certain threshold (I went with 10%) in a database table, which looks like “id | coverId (foreign key) | red | green | blue | percent”.
  5. Now, when you want to find the color match for, e.g., 255,0,0, you divide by 60 again (255 / 60 = 4.25, which rounds to bucket 4-0-0) and perform an SQL query for that bucket, ordered by percent. As a fallback, if there are no exact matches for that color bucket, you can check for matches above those RGB values and order them by the sum of red, green and blue, ascending.
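The five steps above can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not the code Cover Browser actually runs: it assumes each cover's pixels are already available as a list of (r, g, b) tuples (decoding the image files is left out), and it uses an in-memory SQLite database. The names `to_bucket`, `is_gray`, `bucket_fills`, `index_cover`, `find_covers` and the table name `cover_colors` are made up for illustration. A few details are also assumptions: halves are rounded up, a bucket counts as “gray” when its three channels are equal (black and white excepted), the step 5 fallback is read as “every channel at or above the query bucket”, and percent is used as a tiebreaker there.

```python
import random
import sqlite3
from collections import Counter

DIVISOR = 60       # step 2: "could be a little less or more"
SAMPLES = 10_000   # step 1: random pixels sampled per cover
THRESHOLD = 10.0   # step 4: minimum bucket fill, in percent


def to_bucket(rgb):
    """Step 2: divide each channel by DIVISOR and round half up."""
    return tuple(int(c / DIVISOR + 0.5) for c in rgb)


WHITE = to_bucket((255, 255, 255))[0]  # 4 when DIVISOR is 60


def is_gray(bucket):
    """Assumption: equal channels mean gray; black and white buckets are kept."""
    r, g, b = bucket
    return r == g == b and r not in (0, WHITE)


def bucket_fills(pixels, samples=SAMPLES, rng=random):
    """Steps 1 and 3: sample pixels, return each non-gray bucket's fill in percent."""
    picks = rng.choices(pixels, k=samples)
    counts = Counter(to_bucket(p) for p in picks)
    # Percent is relative to all picks, gray ones included.
    return {b: 100.0 * n / samples for b, n in counts.items() if not is_gray(b)}


def create_table(db):
    # Mirrors "id | coverId (foreign key) | red | green | blue | percent".
    db.execute(
        "CREATE TABLE cover_colors (id INTEGER PRIMARY KEY, coverId INTEGER NOT NULL, "
        "red INTEGER, green INTEGER, blue INTEGER, percent REAL)"
    )


def index_cover(db, cover_id, pixels, rng=random):
    """Step 4: store only buckets filled above THRESHOLD percent."""
    for (r, g, b), pct in bucket_fills(pixels, rng=rng).items():
        if pct > THRESHOLD:
            db.execute(
                "INSERT INTO cover_colors (coverId, red, green, blue, percent) "
                "VALUES (?, ?, ?, ?, ?)",
                (cover_id, r, g, b, pct),
            )


def find_covers(db, rgb):
    """Step 5: exact bucket match ordered by percent, else the fallback."""
    r, g, b = to_bucket(rgb)
    rows = db.execute(
        "SELECT coverId FROM cover_colors WHERE red = ? AND green = ? AND blue = ? "
        "ORDER BY percent DESC",
        (r, g, b),
    ).fetchall()
    if not rows:
        # Assumption: "above" means each channel is >= the query bucket's channel;
        # percent DESC as a tiebreaker is also an added assumption.
        rows = db.execute(
            "SELECT coverId FROM cover_colors WHERE red >= ? AND green >= ? AND blue >= ? "
            "ORDER BY red + green + blue ASC, percent DESC",
            (r, g, b),
        ).fetchall()
    # A cover can match through several buckets; keep its first (best) position.
    return list(dict.fromkeys(row[0] for row in rows))


db = sqlite3.connect(":memory:")
create_table(db)
rng = random.Random(42)
index_cover(db, 1, [(250, 10, 5)] * 7 + [(20, 20, 200)] * 3, rng=rng)  # mostly red
index_cover(db, 2, [(200, 0, 190)], rng=rng)                           # purple
print(find_covers(db, (255, 0, 0)))  # exact bucket 4-0-0 -> [1]
print(find_covers(db, (120, 0, 0)))  # bucket 2-0-0, fallback -> [1, 2]
```

With this toy data, 255,0,0 falls into bucket 4-0-0 and returns cover 1 directly. A query for 120,0,0 (bucket 2-0-0) has no exact match, so the fallback returns cover 1 (bucket sum 4) ahead of cover 2 (bucket 3-0-3, sum 6). Because only a handful of small integers is stored per cover, the exact match is a plain equality lookup that a database index on red, green and blue could serve.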
