Google Blogoscoped

Wednesday, February 25, 2004

Froogle Wireless, and Wireless Googling

Froogle Wireless is a new creation from Google Labs. It promises to “Search the web for products with Froogle wireless search and your wml-enabled cell phone”. Did anyone tell Google that WML is dead and that modern WAP2 cell phones have no problems with (X)HTML?

By the way: a great browser for the Symbian OS (for example on the Nokia 6600) is NetFront Access. It can make sense of pretty much anything online (including JavaScripted pages) and works much better than, say, the Opera handheld browser. There are many configurable short-cuts, optional full-screen rendering, user stylesheets, and other nice features. And of course it displays Google well.

Google Germany Understands Multipart Words

In my native language German, you can create new words by combining other words to your liking. For example, “Auto” (car), “Ersatz” (replacement) and “Teile” (parts) can be combined into the word “Autoersatzteile” (which means “car spare parts”). On German-language Google.de, a combined word like this is now also understood in its parts. You can see this by checking which words appear in bold in the result list for “Autoersatzteile”: the hyphenated “Auto-Ersatzteile” is highlighted as well. This seems to be a new feature. [Via Klaus Schallhorn.]

Orkut Security Problem

An Orkut user by the name of Tantek has exposed a flaw in Orkut’s security system.

Orkut is the Microsoft ASP.NET and Google-powered Web community. Every user can define a variety of details (like hometown, sexual orientation, activities) and additionally set those details to be viewed by friends only. And you choose who your friends are... well, unless someone is tricking the system.

A simple inline-frame, hidden in the browser by absolute-positioning it with negative values, can trigger the “add as friend” or “join community” command. An anonymous poster in Orkut writes:

“This is a clear example of why it would be very silly to trust Orkut’s permissions system for sharing your information with only your trusted friends.

Web developers who don’t even understand basic cross site scripting precautions shouldn’t be trusted with more than the cookies they give us. Surely most google coders have a little more sense than the ones that wrote this particular app.”

The page in question with the possible Orkut exploit can be found at [I suggest you do not open this while you are logged in to Orkut in the same browser] <http://tantek.com/log/2004/02.html>. After I went there to try it out I automatically became part of the “Training Program” (in other words, I was joined to a community through no action of my own other than going to Tantek’s webpage). Tantek writes:

“This community is another training program designed to teach you one thing.

When you remain logged into Orkut and browse the web, any page you access can automatically change your Orkut membership without you knowing it.

This is due to the fact that Orkut uses HTTP GET URLs to alter your state.

The W3C long ago recognized this general vulnerability.

http://www.w3.org/2001/tag/doc/whenToUseGet.html
– Tantek in Orkut

Tantek further urges webmasters to spread the word about this Orkut vulnerability by pasting the following code on their webpages:

<iframe style="width:1px;height:1px;position:absolute;top:-31337px;left:-31337px" src="http://www.orkut.com/Community.aspx?cmm=19657&amp;cmd=add"></iframe>
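The iframe works because the browser fetches its src URL with a plain GET request and sends the visitor's Orkut cookies along. As a minimal sketch of the usual countermeasure (this is not Orkut's code; the function names and the per-session token are my own assumptions for illustration), a site can refuse to change state on GET and additionally require a secret token that a foreign page cannot know:

```python
import hmac
import secrets
from typing import Optional

# Methods that should only read data, never change it.
SAFE_METHODS = {"GET", "HEAD"}


def new_session_token() -> str:
    """Create a random token that is stored in the user's session (hypothetical helper)."""
    return secrets.token_hex(16)


def may_change_state(method: str, session_token: str,
                     submitted_token: Optional[str]) -> bool:
    """Allow a state change only for non-GET requests carrying the session's token."""
    if method.upper() in SAFE_METHODS:
        return False  # "join community" must never happen on a simple GET
    if submitted_token is None:
        return False  # a hidden iframe or foreign form cannot supply the token
    # Constant-time comparison of the two tokens.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())
```

With a check like this in front of the “join community” command, the hidden iframe request would be rejected, because it arrives as a GET and carries no token.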

GQL Today and Tomorrow

I call it the Google Query Language, or GQL: advanced methods to treat the Web as a giant database and Google as its SQL.
Here are some examples. Consider the following query:

“name is * * I (am | m) (a | an) *”

It doesn’t look like a basic Google search but it is, and can be entered into Google, or submitted as parameter via the Google Web API. What does it do? It looks for patterns like the following:

“My name is John Doe and I’m an artist”

The sequence is fixed but allows alternatives: if you use an asterisk, any word is allowed (“Name is *” would allow “Name is John” and “Name is Mary”). And when you use round brackets to enclose two words (separated by “|”), you allow only a limited, defined set of words. For example, in the above query we allow “I am an”, “I’m a”, and so on.

The trick with queries like these is to stay below 10 words, and even fewer if you want to allow additional user input. However, the asterisk wildcard does not count against the limit and should therefore be used in between words to keep the count as low as possible.

So let’s count the words in the above query; what we have is...

“name [1] is [2] * * I [3] (am [4] | m [5]) (a [6] | an [7]) *”

...seven words that count against the Google query limit. That leaves room for 3 more words.
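The counting above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the limit of 10 is the one described in this post, and the sketch only knows the operators used in the example (asterisk, round brackets, “|” and quotes).

```python
import re

GOOGLE_WORD_LIMIT = 10  # the limit as described in this post


def count_query_words(query: str) -> int:
    """Count the words that count against the limit.

    Asterisks, round brackets, the "|" separator and quote marks are skipped;
    every remaining run of characters is treated as one word.
    """
    words = re.findall(r'[^\s()|*"“”]+', query)
    return len(words)


query = "name is * * I (am | m) (a | an) *"
used = count_query_words(query)
remaining = GOOGLE_WORD_LIMIT - used
print(used, remaining)  # prints "7 3"
```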
We can try to add a variable word, e.g. “hard-working male” or “artist”, and then grab from the Google snippet whatever fills in the asterisks. So if we query:

“name is * * I (am | m) (a | an) artist”

... we could grab “John Doe” (if that appeared in the Google result). This way we could write a search engine that lists all artists online. Note that this is a simplified example. GQL is a soft science and results are not as clearly defined as in a well laid-out database. However, it’s also infinitely more pragmatic and might work much more reliably and much faster than any Semantic Web/RDF/XML approach.
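Grabbing the asterisk values from a snippet could look roughly like the following sketch. It makes several assumptions: each asterisk stands for exactly one word, punctuation in the snippet is treated as a word separator (so “I’m” becomes “I m”), the function names are invented, and the sample snippet is made up rather than taken from a real Google result.

```python
import re


def gql_to_regex(query):
    """Translate a GQL-style phrase into a regular expression.

    Assumption: "*" matches exactly one word, "(a | b)" matches either word.
    """
    parts = []
    # A unit is either a whole bracketed group or a single whitespace-free token.
    for unit in re.findall(r"\([^)]*\)|\S+", query):
        if unit == "*":
            parts.append(r"(\w+)")  # capture whatever fills the asterisk
        elif unit.startswith("("):
            alternatives = [re.escape(a.strip()) for a in unit[1:-1].split("|")]
            parts.append("(?:" + "|".join(alternatives) + ")")
        else:
            parts.append(re.escape(unit))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def grab_wildcards(query, snippet):
    """Return the words that fill the asterisks, or an empty list if nothing matches."""
    plain = re.sub(r"[^\w\s]", " ", snippet)  # punctuation becomes a separator
    match = gql_to_regex(query).search(plain)
    return list(match.groups()) if match else []


sample = "Hi, my name is John Doe, I’m an artist from Berlin."  # made-up snippet
print(grab_wildcards("name is * * I (am | m) (a | an) artist", sample))
# prints ['John', 'Doe']
```

Real snippets are messier than this sample, so anything built on top would need to tolerate misses and noise.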

Here is another example. How do we grab things, like a list of films, via GQL? We can use the following:

(The upper-case word can be written in lower-case and should just emphasize the variable.) You get the picture: what we take from the Google result snippet is whatever fills the asterisk, which would make for a list of films. We could even go as far as to analyze all results, fill our database, and thus get the most beloved movies/books/actors/comics ... of all time.

Another example: we want to find couples. Things that by their nature belong together, or are somehow connected. One GQL variant of this would be:

“i like * and *” -“and i”
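To show what could be done with the results of such a query, here is a small sketch that collects couples from snippet texts and counts how often each one appears. The function name and the sample snippets are invented for illustration, and the -“and i” exclusion is imitated by simply skipping any snippet that contains that phrase.

```python
import re
from collections import Counter

# "i like * and *" with one word per asterisk (an assumption of this sketch).
PAIR = re.compile(r"\bi like (\w+) and (\w+)", re.IGNORECASE)
# Imitates the -"and i" part of the query.
EXCLUDED = re.compile(r"\band i\b", re.IGNORECASE)


def find_couples(snippets):
    """Count (first, second) word pairs found in the given snippet texts."""
    couples = Counter()
    for text in snippets:
        if EXCLUDED.search(text):
            continue  # e.g. "I like tea and I like coffee" is not a couple
        for first, second in PAIR.findall(text):
            couples[(first.lower(), second.lower())] += 1
    return couples


snippets = [  # made-up examples, not real search results
    "I like salt and pepper on everything.",
    "Honestly, i like Salt and Pepper.",
    "I like tea and I like coffee.",
]
print(find_couples(snippets).most_common(1))
# prints [(('salt', 'pepper'), 2)]
```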

If you find other successful GQL samples, let me know! One way to add some GQL spice is to use the synonym operator (a tilde “~”) preceding words.

Google’s index is still growing. For GQL to perform better, it should. The more “natural-language babbling”* we have, the better GQL works for us.

*I call it babbling because the approach allows for nonsensical/ wrong statements being made, as long as the overall amount of statements is large enough to let the right ones shine through. For one person saying “Citizen Kane stinks” we have 10,000 saying “Citizen Kane is great”. For one person saying “1 + 1 = 3”, we have 10,000 saying “1 + 1 = 2” (if the “Web database” is big enough, that is).
Also, “babbling” because the more mundane the statement seems, the more precious it is as an information atom. For example, who would ever write in his blog “a bicycle has two wheels”? And yet it is exactly those sentences which will be written when a lot is written, by a lot of people, every day. Those statements help a machine to construct more complex knowledge, and even to discover its own theories and make statements outside of what is known online.
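The idea of letting the right statements shine through is essentially a majority vote. A minimal sketch, assuming the statements have already been collected as plain strings (the function name and the sample data are made up to mirror the footnote):

```python
from collections import Counter


def most_common_statement(statements):
    """Return the most frequent statement and its share of all statements."""
    if not statements:
        raise ValueError("need at least one statement")
    counts = Counter(s.strip().lower() for s in statements)
    statement, votes = counts.most_common(1)[0]
    return statement, votes / len(statements)


# One person is wrong, 10,000 are right.
babble = ["1 + 1 = 3"] + ["1 + 1 = 2"] * 10000
statement, share = most_common_statement(babble)
print(statement)  # prints "1 + 1 = 2"
```

The smaller the pile of statements, the less this vote can be trusted, which is why the size of the “Web database” matters so much.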

FindForward.com is showing some of the above, but often both the amount of data and Google’s 10-word limit make it hard. We can look for well-known things, but there are not enough people writing for every imaginable topic to be covered. I believe that if the Web grows by a factor of about 1000 over its current size, and if Google can index it all, we will see a lot of interesting tools to discover “world consciousness”, or, to put it more simply, search engines will be able to answer search requests. In several years, a search engine will no longer list result pages; it will talk to the searcher:

“You are interested in this movie? Well, so are other people, like John, who has a personal fan site on it, which a lot of people link to. The official homepage is here, and contains trailers. A nice, though somewhat longwinded review can be found at moviereviews.com. The movie’s director is ... and you can reach him here. Or if you just want to buy the DVD, the cheapest place you can get it is for $12 at ...”
