Google Blogoscoped

Forum

Creating a Googleshare Map With Google Spreadsheets

Bilal [PersonRank 10]

Friday, March 21, 2008
12 years ago, 5,479 views

Very impressive.

Marek Prokop [PersonRank 1]

12 years ago #

Great idea, Philipp. The limit of 50 import functions can be worked around with the copy and paste *values* only commands.

Veky [PersonRank 10]

12 years ago #

> "A2" refers to the country name cell, and the "$" will make sure the variable is changed to the other respective cells when you copy it later.

On the contrary. $ is the absoluteness marker, saying that A (the column) is absolute and _won't_ be changed when you copy the cell. 2 (the row) doesn't have a $ preceding it, so it _will_ be changed. See docs.paperless-school.com/Abso ... for example.
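To make the rule concrete, here is a minimal Python sketch of what happens to a single cell reference when a formula is copied. The function names (`shift_reference`, `col_to_num`, `num_to_col`) are made up for this illustration and are not part of any spreadsheet API; the sketch only handles plain references such as `$A2`, not ranges or sheet names.

```python
import re

# One cell reference: optional $, column letters, optional $, row number.
_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column index back to letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_reference(ref, rows=0, cols=0):
    """Return ref as it would look after copying it `rows` down and `cols` right.

    A part preceded by $ is absolute and stays fixed; the other part moves.
    """
    match = _REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    col_abs, col, row_abs, row = match.groups()
    if not col_abs:
        new_col = col_to_num(col) + cols
        if new_col < 1:
            raise ValueError("reference would move left of column A")
        col = num_to_col(new_col)
    if not row_abs:
        new_row = int(row) + rows
        if new_row < 1:
            raise ValueError("reference would move above row 1")
        row = str(new_row)
    return f"{col_abs}{col}{row_abs}{row}"
```

With this, `shift_reference("$A2", rows=1, cols=1)` keeps column A and only advances the row, while the same call on `"A2"` moves both parts.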

> *Lucky for us Google ignores their own robots.txt when polling this page.

Google always ignores robots.txt when the fetch is the immediate result of an explicit user action. See scholar.google.com/feedfetcher ... for example.
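For reference, this is roughly the check that a crawler honouring robots.txt would run before fetching a URL, sketched with Python's standard `urllib.robotparser`. The helper name `allowed`, the `ExampleBot` user agent, and the robots.txt rules used with it are assumptions for illustration, not a copy of any real site's file.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every agent is barred from paths starting with /search.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /search\n"
```

A fetcher that acts on an explicit user action simply skips this check, which is the behaviour being discussed here.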

Philipp Lenssen [PersonRank 10]

12 years ago #

> when the fetch is the immediate result of an explicit user action.

But Google automatically updates the data (at certain automated intervals), even when you never touch the spreadsheet again but just include it somewhere, thus it is not always an immediate result of the user action.
I think this is what makes these tools really useful, though, whether or not it's conforming to the robots.txt – IIRC, Yahoo Pipes doesn't allow this (I might be wrong, but I remember I didn't manage to grab the Google result with Pipes).

> On the contrary. $ is absoluteness marker

Thanks a lot Veky, I corrected this.

Veky [PersonRank 10]

12 years ago #

> But Google automatically updates the data (at certain automated intervals), even when you never touch the spreadsheet again but just include it somewhere,

Are you sure about this? It shouldn't be so hard to check, if you're willing... make a page A, make a spreadsheet B that includes data from page A, and include B in a page C. Host both pages A and C on your server, and don't link to them from anywhere (except the one link from B to A, in the form of =importXML). Don't load any of the pages (A, B, or C) into a browser for a fixed period of time (say, one day). After that period has passed, check your server logs for any requests for page A during that period. That should give the final answer, I suppose. I'd do it myself, but I don't have a server. :-/
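The log check at the end of that experiment could look something like the following Python sketch. It assumes the server writes its access log in the Common Log Format (the Apache default), with English month abbreviations; the function name `requests_in_window` is made up for this example.

```python
from datetime import datetime


def requests_in_window(log_lines, path, start, end):
    """Return (timestamp, line) pairs for requests to `path` between start and end.

    `start` and `end` must be timezone-aware datetimes. Lines that do not
    look like Common Log Format entries are skipped.
    """
    hits = []
    for line in log_lines:
        try:
            # Timestamp sits between [ and ], the request line between quotes.
            stamp = line.split("[", 1)[1].split("]", 1)[0]
            request = line.split('"')[1]
            when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
            _method, target, _protocol = request.split(" ", 2)
        except (IndexError, ValueError):
            continue
        if target == path and start <= when <= end:
            hits.append((when, line))
    return hits
```

If the list comes back empty for page A over the quiet day, nothing fetched it while nobody was looking at B or C.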

Philipp Lenssen [PersonRank 10]

12 years ago #

I am not sure if they update the data if the file is never requested, all I'm saying is it will automatically update even if you, the user, never touch it again but just upload it somewhere... you could for instance add a 1-pixel iframe pointing to the screenshot on your blog, or subscribe to the spreadsheet's feed in Google Reader, which would cause automated polling to the feed. Also, what you called the user action is actually (also) a developer programming something – even when it's merely a single line when writing the "importXml" function. The developer is in that case using a library not checking the robots.txt (which again, I think is useful). Perhaps the individual developer has the responsibility to check or not check robots.txt in that case, depending on how one interprets the meaning of robots.txt :)

To test if Google also updates the data if you don't check the file, you could create a robots.txt-disallowed folder on your server and in it, drop a script that will log to the database whenever someone accesses the script... and then point the importXml towards that folder. My guess would be Google does not automatically poll the folder if no one is requesting the file but who knows...
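A minimal sketch of such a logging script, written as a Python WSGI app that records every hit in SQLite. Everything specific here is an assumption for illustration: the `hits` table, the `make_logging_app` name, and the small XML body it returns for importXml to read.

```python
import sqlite3
import time


def make_logging_app(conn):
    """Build a WSGI app that logs each request to `conn` and returns a tiny XML page."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (ts REAL, path TEXT, user_agent TEXT)"
    )

    def app(environ, start_response):
        # Record when the request arrived, what was asked for, and by whom.
        conn.execute(
            "INSERT INTO hits VALUES (?, ?, ?)",
            (
                time.time(),
                environ.get("PATH_INFO", ""),
                environ.get("HTTP_USER_AGENT", ""),
            ),
        )
        conn.commit()
        start_response("200 OK", [("Content-Type", "text/xml; charset=utf-8")])
        return [b"<data><value>42</value></data>"]

    return app
```

To try it, the app can be served with the standard library's `wsgiref.simple_server.make_server` under a path that the site's robots.txt disallows, and the `hits` table inspected after the spreadsheet has been left alone for a while.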

Veky [PersonRank 10]

12 years ago #

I meant "user" in a broader sense. Not only the original creator, but anyone requesting page C is also making a "user request". It's practically the same as putting image A into webpage B that is iframed into a page C. Even if A is in a directory that is disallowed by robots.txt, it should still be downloaded when someone requests C.

Philipp Lenssen [PersonRank 10]

12 years ago #

I think this goes to show it's not a clear-cut case. Imagine I create a site called TogetherWeCrawlEverything.com. Whenever a user visits the site, it will trigger – based on that user request – its search bot to index a bunch of robots.txt-disallowed files from the web. Is this against netiquette or not?

(Perhaps it starts with the definition of "what's a bot". E.g. is the Firefox extension Firebug a bot? Is Wget a bot? Is Yahoo Pipes a set of programmable bots? Is an updating, programmed spreadsheet a kind of mini bot? Is robots.txt *only* meant for search engines, as opposed to what e.g. Wikipedia writes right now?)

On the issue of whether Google Spreadsheets update if no user requests them, I've just set up a little experiment, for something else that's hopefully actually useful but it should also shed more light on that issue.

Joining Dots [PersonRank 0]

12 years ago #

Thanks so much for this post, great fun to play with the googleshare. But I'm noticing that the page count value is different in Google docs than in the web browser. For example, a search on Microsoft returns 758 million results in the browser but only 662 million results in Google Docs. Any reason why there is a difference?

Philipp Lenssen [PersonRank 10]

12 years ago #

Joining Dots, sometimes you even get differences on different locations or times when searching in the browser, due to hitting different Google data centers... that might be the issue here too. Might also be related to specific settings, like SafeSearch...

nope [PersonRank 1]

12 years ago #

choropleth, not heat map.

Philipp Lenssen [PersonRank 10]

12 years ago #

If I understand this correctly, then a choropleth is actually a form of heat map, as well as a form of thematic map, Nope. Wikipedia says "A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colours." Wouldn't that fit even a two-dimensional world map? Google calls the gadget Heatmap, by the way. But thanks for the clarification!

