Hi Matt. Can you tell us a little bit about yourself – what did you do before Google, and what are you focusing on at Google?
Before Google, I was working on my Ph.D. at the University of North Carolina at Chapel Hill. I never finished the Ph.D., but luckily I got a master’s degree along the way as a backup. :) I’m in the quality group at Google, and I usually work on webspam and webmaster issues.
How long have you been working at Google, and which launched projects have you been involved in?
I spent a year in the ads engineering group. That was a lot of fun, including a chance to implement the first test of the AdWords user interface. I worked on SafeSearch back in 2000, but most of my time has been in the quality group.
Could you outline some of the approaches implemented in Google’s SafeSearch?
There are different versions for different languages. I worked on the initial English version. It tries to do some smart things looking at words in the document. The main thing you need to know is the philosophy of SafeSearch. When we looked at competing family searches in 2000, they were all whitelists. I think AltaVista returned 24 results for the query [sex]. That seemed too low, especially when Google had a pretty good copy of the web to work with. So we designed it so that it was a search over the entire web, and then we made sure to veer much more toward the recall side of spotting porn documents than the precision side. SafeSearch doesn’t affect websearch by default, so people who opt-in to filter their search really don’t want to see porn documents. That’s why SafeSearch leans more toward the recall side of spotting porn.
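The recall-versus-precision trade-off described here can be illustrated with a toy example. This is only a sketch: the scores, labels, thresholds, and the `precision_recall` helper are invented for illustration and say nothing about how SafeSearch is actually implemented.

```python
def precision_recall(scores, labels, threshold):
    """Flag every document whose score is at or above the threshold,
    then measure how that flagging compares to the true labels."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))        # correctly flagged
    fp = sum(f and not l for f, l in zip(flagged, labels))    # wrongly flagged
    fn = sum((not f) and l for f, l in zip(flagged, labels))  # missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Hypothetical classifier scores and ground-truth labels (True = should be filtered).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

strict = precision_recall(scores, labels, 0.7)    # high threshold: favors precision
lenient = precision_recall(scores, labels, 0.35)  # low threshold: favors recall

print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

Lowering the threshold catches every document that should be filtered, at the cost of also filtering one that should not be. That is the direction a filter leans when the people using it have opted in.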
Is it true you have a special interest in the world of insects? How come?
Sharp eyes! www.cs.unc.edu/~cutts/ is my grad school home page. I do like spiders, but those links helped test whether the nofollow attribute worked correctly. Those are the very first nofollow links in the world--maybe someday they’ll be in the Smithsonian. ;)
For those interested in search engine news, your blog has become a great resource. To my knowledge, it’s the first in which someone from Google actually talks about Google in-depth. How has the blog worked for you so far?
It’s been wild, and I’ve enjoyed it. It’s a really good way to get the word out about some things and answer questions. I didn’t expect it would attract as many comments on the posts though. I wish I had more time to answer every single question, but I have to eke out posts at night and in my spare time already. Overall, it’s good that it provides another channel of communication with webmasters, and I have a better feel for webmasters’ questions and priorities.
So is your Google gadgets blog the “20%” Google gives you for your own projects?
I haven’t done a 20% project since... hmm. That phonetic “soundslike” search demo was a few years ago. I haven’t been using my 20% for the last couple years, but my regular job is pretty interesting, so that’s fine. The blog has mostly happened in my not-so-copious free time: nights and weekends. I’ve been getting less sleep the last 2-3 months. :)
What rules do you have to follow for posting on your blog? Do you need to have someone review your posts before go-live?
No. As a courtesy, I told the public relations group at Google when I was starting my blog, but no one reviews my posts before they go live. Google has a blogging policy, but it’s pretty much common-sense: don’t post confidential stuff, material financial info, etc. I wanted to do the blog on my own domain so I would have a little more independence to say “this annoys me about Google” or to make stupid jokes without worrying about it.
Daniel Brandt of Google-Watch.org claims you worked for the NSA before, and that you have “a top-secret clearance.” Any comments on him and his statements?
I have mixed feelings about Daniel Brandt. I disagree with a great deal of what he says and how he says it. For example, his theory that Google’s index size was limited by 4 byte docids was just 100% wrong. I also think the little graphics on google-watch.org do more harm than good to the site’s credibility. This one is my favorite (Google as puppeteer pulling the strings on a reporter).
But you’re probably more curious about whether I, Matt Cutts, am a secret connection between Google and the Military-Industrial Complex, the Illuminati, or any other shadow government and their black helicopters. And the answer is no. :) The University of Kentucky offers college students a co-operative education program where you alternate between working for a semester and studying for a semester. Students can work at places from Kodak to Lexmark to NASA to the Department of Defense. I decided to co-op at the DoD because I wanted to get some real-world experience and I wanted a job at an interesting place. For a young college student, co-operative education shows you what to expect after you graduate, and it was useful for me to see the good (and the bad) of a real workplace.
I doubt that I left much of a lasting impact on the DoD, and in turn I don’t believe that they implanted me with any mind-control chips that would allow them near Google’s servers. :) I believe whatever security clearance I had lapsed years ago. Thanks for asking and giving me the chance to clear that up. Daniel Brandt, if you happen to read this, would you mind updating your “Spooks on board at Google” page to note that I do not have a security clearance? Thanks!
Do you separate different categories of webspam?
Sure, usually the technique is the category: hidden text, cloaking, doorway pages, etc. Some categories show a lot more effort and intent and those are usually more serious.
I suppose when Google does testing of different fine-tunings of its organic results, similar to a spam filter, it’s often a balance between letting too much spam in the top results, or throwing out too many relevant sites. Is that true? What are your thoughts on this?
Sure, I could stop all the spam in the world if I didn’t have to return any search results. :) Usually we try for changes that are across the board wins in terms of relevance/topicality and spam. There’s a large class of good sites that are pretty easy to recognize as good sites. But it’s true that there can be a balance between topical sites that are spammy and high-quality sites that are less on-topic.
When you go eat lunch, do you all talk about Google-related stuff?
It’s pretty common to talk geek stuff (TiVo, Netflix, what people are talking about on blogs, or the search industry in general). But it isn’t at all unusual to end up talking about Google or a specific type of spam.
Do you think there will still be a Google 100 years from now?
I hope so, or at least something like Google. My grad school advisor used to joke about Digikey, a company that lets people buy electronic parts by mail. He would say: if Digikey didn’t exist, someone would have to invent it. Likewise, I think if Google didn’t exist, someone would have to invent it. In 100 years, there had better be a company that does its best to get the highest-quality information to users as quickly as possible. And it should have holiday logos. If such a company doesn’t exist in 2105, someone will just have to invent it. :)
Have you been involved in making the “rel=nofollow” initiative come true? Were there any surprises for you after it was revealed?
I think the ideas behind nofollow were popping up in several places, but I did work with folks at Google, Yahoo, and MSN, and I talked to lots of blog software makers.
The biggest surprise to me about nofollow is how many people seemed to think that it was only for blog comment spam. It’s actually a very general mechanism that can be used in lots of places: guestbooks, referrer lists--basically anywhere that you can’t or don’t want to vouch for the quality of a link. If you go back and read the post from January at googleblog.blogspot.com you’ll see that we anticipated lots of reasons that people might want to use nofollow:
“Q: Is this a blog-only change?
A: No. We think any piece of software that allows others to add links to an author’s site (including guestbooks, visitor stats, or referrer lists) can use this attribute. We’re working primarily with blog software makers for now because blogs are such a common target.”
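As a rough illustration of the mechanism described in that answer, any software that prints visitor-submitted links could attach the attribute when it renders them. The sketch below is a minimal example under that assumption; `render_user_link` is a hypothetical helper, not a function from any real blog or guestbook package.

```python
from html import escape

def render_user_link(url: str, text: str) -> str:
    """Render a visitor-submitted link that the site owner does not vouch for."""
    # escape() also escapes quotes by default, so the values are safe
    # to place inside a double-quoted attribute.
    return f'<a href="{escape(url)}" rel="nofollow">{escape(text)}</a>'

# Example: a link left in a guestbook entry.
print(render_user_link("http://example.com/?a=1&b=2", "my site"))
```

The same helper would work for referrer lists or visitor stats, since the attribute is applied per link rather than per page.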
As for the value itself, “nofollow”, my first thought was “you shouldn’t have commands in HTML, only verb-(and adjective)-less descriptions of relationships and structure”. The thinking behind that is always to allow for more flexibility and more precise tools to make meaning of HTML in many different ways (like PageRank), because you can apply a range of different verbs/adjectives later on. I talked to the W3C HTML chair Steven Pemberton about this once and we basically agreed on this. What are your thoughts on this?
The name nofollow had a lot to do with the nofollow meta tag, which operates on a page level instead of a link level. In practice, other words like vote-abstain or (insert your favorite word here) might have been better; however, most computer science-y people are pretty good at abstracting away what something does and just making a mental note of the implications of a given operator.
In more general terms, what do you think is the relationship between Google and the W3C? Do you think it would be important for Google to e.g. be concerned about valid HTML?
I like the W3C a lot; if they didn’t exist, someone would have to invent them. :) People sometimes ask whether Google should boost (or penalize) for valid (or invalid) HTML. There are plenty of clean, perfectly validating sites, but also lots of good information on sloppy, hand-coded pages that don’t validate. Google’s home page doesn’t validate and that’s mostly by design to save precious bytes. Will the world end because Google doesn’t put quotes around color attributes? No, and it makes the page load faster. :) Eric Brewer wrote a page while at Inktomi that claimed 40% of HTML pages had syntax errors. We can’t throw out 40% of the web on the principle that sites should validate; we have to take the web as it is and try to make it useful to searchers, so Google’s index parsing is pretty forgiving.
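To give a feel for what a forgiving parser does with markup that would not validate, here is a small sketch using Python’s standard-library `html.parser`. It is only an illustration of lenient parsing in general and is not related to Google’s own indexing code; the `TagCollector` class and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag and its attributes, however sloppy the markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

# Uppercase tags, an unquoted attribute value, and unclosed paragraphs.
sloppy = "<BODY bgcolor=white><P>unclosed paragraph<P>another one"

collector = TagCollector()
collector.feed(sloppy)
collector.close()
print(collector.tags)
```

The parser does not reject the page. It lowercases the tag names, reads the unquoted color value, and reports both paragraphs even though neither is closed.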
What was the last good product coming out of the Yahoo labs when you thought, “Wow, that’s smart”?
I like a lot of Yahoo’s stuff. I really enjoyed their Web 2.0 product with saving and blocking results. Site Explorer is nice. And I’m glad that they did their Creative Commons search.
How well do you think Google competes with Yahoo in terms of search result quality? And how well with MSN Search?
I think we do quite well, but we’re more focused on our own metrics and how to improve quality than focusing on competitors.
Do you edit Wikipedia articles from time to time?
I have not edited articles at Wikipedia yet. Sometimes I think it would be fun to edit a section on SEO, but I’m sure it would turn into a whitehat/blackhat slagfest pretty quickly. In general, I think the Wikipedia is great.
In 5 words or less, what are your thoughts on ... Google Reader?
For now, I prefer Bloglines.
... Mark Jen?
Avoid disclosing confidential information.
... Googleplex food?
Burritos. Bacon Polenta. Steak. Yum.
... Google Base?
Too early to tell.
... spam blogs?
Grrrr. Definitely not recommended.
Purple mojo at the Y-plex.
Gotta respect 30 years’ success.
Did you see some unlaunched products inside Google where you were thinking “Once that’s released, it will blow away the competition”?
When I saw Google maps, I was blown away. I thought “This is just the way I’ve wanted a map to work for years, and I never realized it.”
In general, do you get to see other products/services you’re not working on specifically?
Things are often launched internally for Googlers to play with before they go live. That’s one of the major perks of being at Google. :)
If you have to take a guess, what’s the next big thing on the web?
Ajax is certainly an interesting technology. But I think the biggest trend will continue to be more services and information becoming available online. For example, a few years ago, it would have been very weird to order a pizza online. Now, in many places that wouldn’t be that unusual. More and more people will turn to the web for maps, tracking packages, checking the weather, and all kinds of other information.