Google Blogoscoped

Scroogled

George R [PersonRank 10]

Thursday, July 1, 2010
4 years ago | 4,820 views

"google.com/ie" or "output=ie" used to produce a compact list of Google results. Each result sat on one short, numbered line, with further details tucked into the anchor's title. The format was concise and free of images and other cruft, so a page of 100 results was easy to scan.

Now, when I use "output=ie", I no longer get the list-style results page. I get the standard Google results page instead. I would like to get the list-style output back. Does anyone know how to do this?

Scroogle scrapes Google results and relies on this format. In May Google began redirecting the "google.com/ie" page. blogoscoped.com/forum/170792.h ... This change caused problems for Scroogle until they learned about the "output=ie" parameter.

Scroogle still seems to be functioning. Does Scroogle use another technique? Will a gradual deployment by Google eventually affect Scroogle?

Roger Browne [PersonRank 10]

4 years ago

Adding this parameter (including the ampersand) to the standard google.com search results still works for me:

&output=ie
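
As an illustration of the suggestion above, here is a minimal Python sketch that appends the parameter to an existing search URL. The helper name `add_output_ie` and the example URL (including its `q` and `num` values) are assumptions made for this example. The sketch only builds the URL; it does not contact Google, and whether the parameter still has any effect is exactly what this thread is asking.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_output_ie(url: str) -> str:
    """Return `url` with output=ie in its query string.

    Hypothetical helper: any existing `output` value is replaced so the
    parameter appears exactly once.
    """
    parts = urlparse(url)
    # Keep every existing parameter except a previous `output` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "output"
    ]
    query.append(("output", "ie"))
    return urlunparse(parts._replace(query=urlencode(query)))


# Example URL is an assumption, not taken from the thread.
print(add_output_ie("http://www.google.com/search?q=scroogle&num=100"))
```

Parsing and re-encoding the query string avoids ending up with a doubled "?" or a duplicate `output` parameter when the URL is edited by hand.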

George R [PersonRank 10]

4 years ago

If I use a specific IP (e.g. 209.85.227.99), I can get the "ie" results page.
209.85.227.99/search?output=ie ...

If I use the "www.google.com" domain, I get the standard results page.
I can get either the "everything" results page or the old-style results page. I am not sure which IP or server it is connecting to.
google.com/search?output=ie&am ...

George R [PersonRank 10]

4 years ago

Here are links via the coral cache.

ip: 209.85.227.99.nyud.net/search? ...
domain: google.com.nyud.net/search?out ...

George R [PersonRank 10]

4 years ago

It is now affecting Scroogle. They have posted a notice.
scroogle.org/cgi-bin/nbbw.cgi

"July 1, 2010: Here we go again...

We regret to announce that our Google scraper may have to be permanently retired, thanks to a change at Google. It depends on whether Google is willing to restore the simple interface that we've been scraping since Scroogle started five years ago. Actually, we've been using that interface for scraping since Google-Watch.org began in 2002.

This interface (here's a sample from years ago) was remarkably stable all that time. During those eight years there were only about five changes that required some programming adjustments. Also, this interface was available at every Google data center in exactly the same form, which allowed us to use 700 IP addresses for Google.

That interface was at www.google.com/ie but on May 10, 2010 they took it down and inserted a redirect to /toolbar/ie8/sidebar.html. It used to have a search box, and the results it showed were generic during that entire time. It didn't show the snippets unless you moused-over the links it produced (they were there for our program, so that was okay), and it has never had any ads. Our impression was that these results were from Google's basic algorithms, and that extra features and ads were added on top of these generic results. Three years ago Google launched "Universal Search," which meant that they added results from other Google services on their pages. But this simple interface we were using was not affected at all.

It is not possible to continue Scroogle unless we have a simple interface that is stable. Google's main consumer-oriented interface that they want everyone to use is too complex, too bloated, and changes too frequently, to make our scraping operation possible.

After a lot of suggestions from Scroogle users, and a fair amount of publicity, we found a fix and Scroogle was back in 24 hours. This fix was to insert an extra parameter, &output=ie, into the search terms that were relayed to Google. The extra parameter recovered the same interface that we thought was gone forever.

Now it seems like it actually might be gone forever. Late on June 30, 2010, the results produced while using this parameter began to shift to the usual busy Google interface with ads and a left-margin sidebar. Scroogle users saw a Scroogle page that said, "Google returned no results for this search," when in fact Google returned results but our scraper was unable to deal with them. Over the next few days we will attempt to contact Google and determine whether the old interface is gone as a matter of policy at Google, or if they simply have it hidden somewhere and will tell us where it is so that we can continue to use it.

Thank you for your support during these past five years. Check back in a week or so; if we don't hear from Google by next week, I think we can all assume that Google would rather have no Scroogle, and no privacy for searchers."

— Daniel Brandt, Public Information Research, scroogle AT lavabit.com
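
The notice describes a scraper reporting "no results" when the page layout had in fact changed. The Python sketch below shows one way a scraper could tell those two cases apart. It is not Scroogle's code. The class and function names, the sample HTML, and the `no_results_marker` string are all assumptions; the only detail taken from this thread is that each result was a link carrying extra detail in its title.

```python
from html.parser import HTMLParser


class UnrecognizedLayout(Exception):
    """Raised when a page has neither result links nor a no-results marker."""


class TitledLinkParser(HTMLParser):
    """Collect (href, title) pairs from anchors that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("href") and attributes.get("title"):
            self.results.append((attributes["href"], attributes["title"]))


def scrape(html, no_results_marker="did not match any documents"):
    """Return result links, [] for a genuine empty search, or raise.

    The marker text is an assumed placeholder for whatever the real page
    prints when a search finds nothing.
    """
    parser = TitledLinkParser()
    parser.feed(html)
    if parser.results:
        return parser.results
    if no_results_marker in html:
        return []
    raise UnrecognizedLayout("no titled result links and no no-results marker")


# Hypothetical sample of a list-style page.
sample = '<ol><li><a href="http://example.com/" title="Example snippet">Example</a></li></ol>'
print(scrape(sample))
```

Raising a separate error for an unrecognized page lets the front end say "the upstream format changed" instead of wrongly telling users that their search had no matches.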

Roger Browne [PersonRank 10]

4 years ago

Daniel Brandt:
"if we don't hear from Google by next week, I think we can all assume that Google would rather have no Scroogle, and no privacy for searchers"

Yeah right.

Anytime a service depends on scraping someone else's webpage, there's a big risk of service disruption. I'd say Scroogle has had better luck than average, as there wouldn't be many pages as stable as the Google page they were scraping.

I can't help getting the impression that Daniel Brandt is weary, and is looking for excuses to close Scroogle.

It's not as if it would be difficult to scrape other Google results pages.

George R [PersonRank 10]

4 years ago

!!! Scareware Problem!!!

Scroogle is now reporting another problem, unrelated to Google. They have detected what may be an attack on their servers. As of July 3, 2010 they have seen the attack coming from 20,000 IPs, which they have blocked. Apparently it comes from malware that has infected many computers. Scroogle has a description and some analysis at their site. This may be related to sites that sell fake antivirus software.
scroogle.org/botnote.html

You may want to check whether your IP is on their blocked list. If it is, you should check whether your computer is infected, but be careful which antivirus software you use, since fake antivirus products may be involved.
scroogle.org/ip2coun.txt
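
For anyone who prefers to script the check, here is a rough Python sketch that looks for one IP in the text of a block list. The layout of Scroogle's file is not described in this thread, so the sketch assumes one entry per line with the IP address as the first whitespace-separated field. The function name and the sample data (reserved documentation addresses) are made up for the example. It works on text you have already downloaded and does not fetch anything itself.

```python
import ipaddress


def ip_in_blocklist(ip: str, blocklist_text: str) -> bool:
    """Return True if `ip` is the first field of any line in the list.

    Assumed format: one entry per line, IP address first. Lines whose
    first field is not a valid IP (headers, comments) are skipped.
    """
    target = ipaddress.ip_address(ip)
    for line in blocklist_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            if ipaddress.ip_address(fields[0]) == target:
                return True
        except ValueError:
            continue  # not an IP address, e.g. a header or comment line
    return False


# Hypothetical sample data, not the contents of the real file.
sample = """\
# ip country
192.0.2.10 US
198.51.100.7 DE
"""
print(ip_in_blocklist("192.0.2.10", sample))   # listed in the sample
print(ip_in_blocklist("203.0.113.5", sample))  # not listed in the sample
```

Comparing parsed addresses rather than raw strings avoids false matches such as "192.0.2.1" being found inside "192.0.2.10".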

USA Today also has some related information.
content.usatoday.com/communiti ...
content.usatoday.com/communiti ...
