Google Blogoscoped

Thursday, October 5, 2006

Google Code Search Live

Google released Code Search, a search engine dedicated to finding pieces of public source code. There are existing code search engines, but this one seems to top them in terms of scope; not only does Google snoop around in ZIP files of different sorts, they also go check different CVS repositories.

For example, I took a random string from one of my older programs that is available online. A search for “With frmPad.txtPad” returns zero results in both of the other code search engines I tried. Google Code Search, however, not only returns all instances of this piece, it also shows all other files of this package in an easy-to-navigate list on the left side*. And in the content area, you’ll be able to see the full source code, formatted by Google to highlight search terms.**

Search syntax

Now, Google Code Search has syntax unlike other Google searches. Normally Google acts fuzzy in that it ignores special characters. Google Code Search is more precise (and naturally, its target group is people who are used to handling advanced, precise syntax). A search for “hello, world” will return only matches with this exact string (including the comma and space). In fact, you have the full power of searching using regular expressions; search for ^hello and you’ll only get instances of “hello” that appear at the beginning of a line.
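To make the difference concrete, here is a small Python sketch of the two kinds of match described above, run against a made-up snippet of source text. It uses Python's own re module and only illustrates exact-string and line-anchored matching; it is not how Google implements Code Search, and Google's regular expression dialect may differ in the details. The names SOURCE, exact_matches and regex_matches are mine, chosen for illustration.

```python
import re

# Made-up sample "source file", for illustration only.
SOURCE = """\
hello = 1
print("hello, world")
# say hello again
hello_count = 2
"""


def exact_matches(text, phrase):
    """Return lines containing the phrase exactly, punctuation and spaces included."""
    return [line for line in text.splitlines() if phrase in line]


def regex_matches(text, pattern):
    """Return lines where the regular expression matches somewhere in the line."""
    compiled = re.compile(pattern)
    # Matching line by line means ^ anchors to the start of each line.
    return [line for line in text.splitlines() if compiled.search(line)]


if __name__ == "__main__":
    print(exact_matches(SOURCE, "hello, world"))
    print(regex_matches(SOURCE, r"^hello"))
```

The exact search only finds the print line, because the comma and the space have to be there; the ^hello pattern skips the comment line, where “hello” appears mid-line.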

For starters, you can decide to search for swear words... which frustrated programmers tend to leave in the comments of their source code. There’s “Sucks” with 166,000 matches (like “DOS I/O *still* sucks”) or “I hate” with 13,400 results (including “Have I mentioned how much I hate DTDs?” or “i hate using globals!”). Search for the F-word, and you’ll get a couple of down-to-earth comments (like “as usual, IE is fucked up”).

Another syntax specific to Google Code Search is the “lang” operator. Search for “lang:c” and you’ll only get snippets written in C. (Unfortunately, this one seems to miss a couple of languages.)

The Google Code Search API

In addition to the regular web interface, Google Code Search also comes with a REST API (simple parametrized GET queries returning XML). This way you can build code search functionality into existing programs (like IDEs). Here’s the result XML for the sample query “hello world”.
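As a rough idea of what a client for such an API could look like, here is a Python sketch that builds a parametrized GET URL and parses an XML response. Everything specific in it is an assumption for illustration: the endpoint address, the “q” parameter name and the XML layout are invented placeholders, not Google's actual API, so check the official documentation before relying on any of it. The sketch makes no network request; it parses a hard-coded sample instead.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, NOT the real API address.
BASE_URL = "https://example.com/codesearch"


def build_query_url(query, lang=None, base_url=BASE_URL):
    """Build a parametrized GET URL; 'q' is an assumed parameter name."""
    if lang:
        # Append the lang: operator to restrict results to one language.
        query = f"{query} lang:{lang}"
    return f"{base_url}?{urlencode({'q': query})}"


# Hypothetical response layout, invented for this example.
SAMPLE_XML = """<results>
  <result><file>hello.c</file><line>3</line></result>
  <result><file>main.c</file><line>12</line></result>
</results>"""


def parse_results(xml_text):
    """Return (file name, line number) pairs from the assumed XML layout."""
    root = ET.fromstring(xml_text)
    return [
        (result.findtext("file"), int(result.findtext("line")))
        for result in root.findall("result")
    ]


if __name__ == "__main__":
    print(build_query_url("hello world", lang="c"))
    print(parse_results(SAMPLE_XML))
```

An IDE plugin would do much the same thing: encode the query, fetch the URL, and turn the returned XML into a list of hits to display.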

Still in Beta...

This service is a “Labs” product for now and, as it’s quite new, we can expect some early quirks. For example, I wasn’t able to go through all (or even half) of the indicated result pages for some search queries. Also, it would have been nice to use some of Google’s common search syntax, like the site: operator to restrict a search to a single domain only. And why not return the relevant syntax definition when I search for e.g. “substr lang:php”? People may also be able to find exploits in software using Google Code Search, or just do some regular crawling for email addresses (as opposed to Google web search, looking for “@” actually returns lots of emails). Other than that, it looks like Google already beat the competition with this one.

*And if you can’t find your own source code in Code Search, you might want to give Google’s code submission form a try.

**Very often, source code that is publicly available also includes a license allowing full republication. Google also looks at the license of a program, but even when they don’t find one, they’ll republish the full source on their servers. It would be nice if Google included a link not only to the original source file as they do now, but also to the page where the file was linked from, giving further credit to the original authors (not unlike what they do with image search).

[Thanks Garett Rogers and others commenting in the previous thread!]

