A movie explaining Google's feed infrastructure. It's a 'Google Confidential' video, so it might soon be deleted.
...and the video has been removed.
Even though Google deleted the video very quickly after it was posted here, I don't think the interesting facts would hurt anybody. They accidentally made them public on September 6th. So here are some notes from the video:
Google will work on a standard for feed publishers to tell aggregators about changes in the feed ('this post has been deleted' etc.). Such a standard doesn't exist yet. They will be working with blog tools like Blogger and MovableType.
2/3 of the content has only one subscriber. Think about feeds for own-name-searches, own blogs and blog comments. There are feeds with up to tens of millions of subscribers. The crawl rate of feeds is prioritized when they've got more subscribers. They're updated within one hour when there's more than one subscriber, or else once in three hours.
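The crawl rule quoted above can be sketched in a few lines. This is only an illustration of the two intervals mentioned in the notes, not Google's scheduler; the function names `crawl_interval` and `next_crawl` are made up for this example.

```python
from datetime import datetime, timedelta


def crawl_interval(subscribers: int) -> timedelta:
    """Refresh interval for a feed, using only the two figures from the notes."""
    # More than one subscriber: updated within one hour.
    if subscribers > 1:
        return timedelta(hours=1)
    # Otherwise: once in three hours.
    return timedelta(hours=3)


def next_crawl(last_crawl: datetime, subscribers: int) -> datetime:
    """Latest time by which the feed should be fetched again (hypothetical helper)."""
    return last_crawl + crawl_interval(subscribers)
```

A real scheduler would presumably scale the interval with the subscriber count rather than use two fixed tiers, but the notes only give these two numbers.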
The feed backend now contains 10 terabytes of raw data from 8 million feeds. The index size grows by 4% a week, but this number is probably not accurate.
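To get a feeling for these figures, here is a small back-of-the-envelope calculation. It assumes the 4% weekly growth compounds steadily (which the talk itself called probably inaccurate) and that a terabyte is 10^12 bytes; the helper names are made up for this example.

```python
import math


def doubling_time_weeks(weekly_growth: float) -> float:
    """Weeks needed to double in size at a constant compound weekly growth rate."""
    return math.log(2) / math.log(1 + weekly_growth)


def growth_factor(weekly_growth: float, weeks: int) -> float:
    """Total size multiplier after the given number of weeks."""
    return (1 + weekly_growth) ** weeks


def average_bytes_per_feed(total_bytes: float, feeds: int) -> float:
    """Average raw data per feed."""
    return total_bytes / feeds


# Figures from the notes: 10 TB of raw data, 8 million feeds, 4% growth a week.
doubling = doubling_time_weeks(0.04)
yearly = growth_factor(0.04, 52)
per_feed = average_bytes_per_feed(10e12, 8_000_000)
```

Under those assumptions the index would double roughly every 18 weeks, and the average feed would account for about 1.25 MB of raw data.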
For storage, BigTable, Google's standard distributed database, is mostly used. For search, Mustang is currently used, Google's library for creating search engines. Mustang underlies the web search and most other search engines, except for Gmail's search feature, as that requires instant updates and a specific index for each user. Mustang currently handles 1-2 search queries per second, but is able to handle thousands.
The Reader team is going to integrate more social features. Currently items can be sent to friends by email, and there are no plans for creating a Reader-inbox for that.
Google's recent big social effort is called Mocha-Mocha (or Mocka-Mocka?), and will become the infrastructure for all social stuff across all of their applications. As a part of this, a new feature called Activity Streams will be introduced or at least implemented in Reader this quarter. This will be comparable to Facebook's News Feed (Minifeed?) feature, and integrate Gmail's addressbook and contact list.
Also there will be some other Gmail and Orkut integration, but this might just mean there will be links to Reader.
Google is interested in allowing users to comment on items they share, but this currently isn't a priority.
Calling tags 'labels' is described as 'kind of a historic accident and needlessly confusing'.
When you press the 'Mark all as read' button, Google remembers that you've 'read' all items between two timestamps. You can never uncheck the 'Mark as read' checkbox for those items.
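Here is a minimal sketch of why that would be the case, assuming a design where 'Mark all as read' stores only a timestamp range instead of a flag per item. The class `ReadState` and all of its fields and methods are hypothetical; the talk only mentioned the two timestamps.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ReadState:
    """Hypothetical per-user read state for one feed."""
    read_ranges: List[Tuple[int, int]] = field(default_factory=list)
    read_items: Set[str] = field(default_factory=set)

    def mark_all_as_read(self, start_ts: int, end_ts: int) -> None:
        # Store one (start, end) pair instead of touching every item.
        self.read_ranges.append((start_ts, end_ts))

    def set_item_read(self, item_id: str, read: bool) -> None:
        # The per-item flag, i.e. the 'Mark as read' checkbox.
        if read:
            self.read_items.add(item_id)
        else:
            self.read_items.discard(item_id)

    def is_read(self, item_id: str, item_ts: int) -> bool:
        # An item inside a stored range always counts as read,
        # so unchecking its checkbox has no effect.
        if any(start <= item_ts <= end for start, end in self.read_ranges):
            return True
        return item_id in self.read_items
```

Storing a range is cheap no matter how many items it covers, which would explain the trade-off: a single item inside the range can't be flipped back to unread.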
Currently there is no plan to integrate Reader with Universal Search. This is because Universal Search doesn't provide its backends with user IDs (so Gmail results can't be shown either), and because it requires a lookup time of less than 1/4 second, which Reader cannot provide yet.
When searching in Reader, you may also get results from feeds you aren't subscribed to anymore, or from your friends' items. This is intentional, but some users consider it a bug.
Three people are working on Reader's backend, and three plus one intern are working on the frontend.
Very soon, Reader will recommend feeds to the user, based on previous subscriptions and other Google activity.
Next week, Reader will be released in several languages. One month after that, it will be available in 40 languages.
According to FeedBurner statistics, Google Reader is the world's largest full-content reader. My Yahoo is the largest headline reader, but iGoogle is also big. As Google has grown into the market, the usage of Bloglines hasn't really decreased much.
Reader has a loyal user base (based on pageviews per user), higher than any other product except for Gmail and Orkut. 70% of the users use Firefox, so feed syndication is still mostly a geek thing.
Feeds are currently monetized by FeedBurner. Reader might be more directly monetized in the future, but Google wants to be careful about showing ads next to other people's content. This is a problem with Google News too. They might do something like they did with the non-free Opera: show the content owners' ads in the interface when they're AdSense publishers. Google wants to make publishing full articles in feeds more interesting to webmasters by creating ways to monetize them.
Thanks Fanboy, sounds very interesting. How did you come across this video? Also feel free to contact me via email at infoblogoscoped.com .
I came across this by being subscribed to this search: video.google.com/videosearch?q ... (see RSS link on that page).
And good thing I kept a copy of the flv video file then (from browser cache, since they removed downloading to iPod option :( )
Anon, can you email me a copy of the FLV file please? infoblogoscoped.com
I came across the video through the 'engedu' feed too. It should have been posted on their private site (they ended with 'other talks @ talks'), but I guess video.google.com looked too similar to it. Which might cause problems too when other companies are going to use Google Apps' video hosting tool...
The video file is 153 MB so I can't send it by email. It didn't contain anything else interesting for the general public, I think, only some talk about the database implementation. Though it was nice to see how Googlers educate each other. This talk was more interesting than the press releases and the occasional posts in the public support groups.
Ionut explains what Activity Streams could mean and also posted the audio of the talk:
<<The new central place for social activities will create feeds for all of your events ("activity streams") and share them with your contacts, if you choose to do so.>>
<speculation>So, I guess a possible scenario in the future could be that you post a new Google Video, and someone you declared to be your friend would see a link to that video with your name in their activity stream control center; or, you post a new blog post, and they'd get the link in a daily email alert, and so on. Or you upload a new Picasa album and your friends will find a link to it. Perhaps an activity stream could also be exposed as semi-private (cryptic URL) RSS to subscribe in e.g. Google Reader.</speculation>
Update: Seems the correct spelling of Google's "socializing efforts" is "Maka-Maka". (Though as I mentioned something by the name of "Mocha" does exist within Google, but maybe it's something unrelated.)
A blogger would rather prefer readers to come to the original blog to write their comments. Otherwise, there's a good chance the small bloggers won't get any visitors (at least comments) from some of their more popular posts.
Also, a relevant post on my blog here place4ideas.blogspot.com/2007/ ...
That's pretty sweet, I'm looking forward to the recommended feature.
I live in Reader. I would love to comment on posts right in Reader. Also if I could expand and collapse the comments on a post without going to the website that would be killer. They need to speed up the Google shared items blog. That thing is slow.
>> Also if I could expand and collapse the comments on
>> a post without going to the website that would be killer
I dare say that's not going to be super popular with content owners until Reader helps to monetize the feeds :)
Has anyone really thought out how much power we are allowing Google to have?
Power is the corrupter. Good Intentions pave the way to Corruption.
Any chance we will get to see the video again?
TechMalaya & M1t: The video was made public by Google and it was public for some days (on a video sharing service which doesn't allow you to upload "copyrighted" videos, nonetheless) but I decided to not republish/mirror it here at this time. It's not really secret at this point – I saw it being mentioned in other sources, and Ionut temporarily posted the full audio – but I think the post is good info on it even without the video. I think these reportings need to be handled on a per-case basis, and you need to check a lot of variables; like in how many places the video appeared, how blurred faces appearing in it are, how much it's personal-private stuff vs business-private stuff, how much the persons in it are persons of public interest (celebrities), what the conditions were under which the video was posted and what's the copyright attached to it (e.g. was it a video sharing service, did the video sharing service allow copyrighted video), and much much more. And then you can start looking at how much is fair, e.g. 5 seconds, 1 minute, the full video and so on. Again I think it depends on each case and it's also a matter of case-to-case balancing decisions, and it's also a question of how much time you want to invest in making the decision (do you want to talk it through with friends, do you want to hire a lawyer to check it, do you want to talk to the source company etc. etc.).
I can confirm it's real, that's the back of my friend's head...