Google Blogoscoped

Forum

Does anybody know: gmail compression?

Art-One [PersonRank 10]

Saturday, September 16, 2006
17 years ago | 2,739 views

In Gmail, text that is quoted (or duplicated) is marked and can be hidden. Is there any chance that they are using this to compress the data they are storing? It would be quite easy to replace that text with a pointer... I'm just wondering...
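To make the pointer idea concrete, here is a minimal sketch that stores each distinct line once and keeps messages as lists of pointers. Everything in it is an assumption for illustration: the function names `split_quote`, `store_message` and `load_message` are hypothetical, quoting is assumed to use a plain "> " prefix, and nothing here is known to reflect how Gmail stores mail.

```python
import hashlib

def split_quote(line):
    """Split a line into its leading '>' quote prefix and the remaining text."""
    stripped = line.lstrip("> ")
    return line[: len(line) - len(stripped)], stripped

def store_message(body, line_store):
    """Store a message as (prefix, pointer) pairs into a shared line store."""
    pointers = []
    for line in body.splitlines():
        prefix, text = split_quote(line)
        # The pointer is a hash of the unquoted text, so a quoted line
        # resolves to the same entry as the line it was copied from.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        line_store.setdefault(key, text)
        pointers.append((prefix, key))
    return pointers

def load_message(pointers, line_store):
    """Rebuild the message text from its pointers."""
    return "\n".join(prefix + line_store[key] for prefix, key in pointers)

line_store = {}
original = "Shall we meet on Friday?\nI am free after 3pm."
reply = "> Shall we meet on Friday?\n> I am free after 3pm.\nFriday works for me."
original_ptrs = store_message(original, line_store)
reply_ptrs = store_message(reply, line_store)
```

In this example the two messages contain five lines in total, but only three distinct lines end up in `line_store`; the quoted lines in the reply cost one pointer each.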

Sam Davyson [PersonRank 10]

17 years ago #

I don't think it does. As someone said when Gmail first came out, the storage size of a message grows exponentially, because all the previously quoted text tends to be included (since it is hidden).

We won't know until Google tells us how big each message is.
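As a rough worked example of that growth, the sketch below assumes a simplified model that is not from the thread: every reply adds the same number of new bytes and quotes the entire previous message. The function name `thread_sizes` and the 1,000-byte figure are hypothetical. Under this model each message grows linearly and the total for the thread grows quadratically, which is fast but not strictly exponential.

```python
def thread_sizes(new_bytes_per_reply, replies):
    """Return the size of each message when every reply quotes the previous one in full."""
    sizes = []
    previous = 0
    for _ in range(replies):
        # Each message carries its own new text plus the whole previous message.
        current = previous + new_bytes_per_reply
        sizes.append(current)
        previous = current
    return sizes

sizes = thread_sizes(1000, 5)
stored_with_quotes = sum(sizes)   # what the thread costs if quotes are stored in full
stored_without_quotes = 1000 * 5  # what it would cost if quoted text were not stored again
```

With these numbers the five messages are 1,000 to 5,000 bytes each, so the thread takes 15,000 bytes with full quoting against 5,000 bytes of genuinely new text.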

Haochi [PersonRank 10]

17 years ago #

Looking through the code, I found that there are two duplicated pieces of content.

Art-One [PersonRank 10]

17 years ago #

Haochi: what do you mean? Could you be more specific?

Art-One [PersonRank 10]

17 years ago #

Sam: even if Google tells us how big each message is, they could still use some compression techniques internally... So even then we couldn't tell without inside information. Maybe a question for Matt?

Andrew Hitchcock [PersonRank 10]

17 years ago #

I think I've heard at Google talks that they don't try to figure out duplicated content.

However, depending on what system Gmail uses, there might be compression somewhere in the system. BigTable supports table compression (which would probably save a lot of space with all the text e-mails), but I think Gmail was out before BigTable.
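To give a feel for how much generic compression can save on text e-mail with repeated quotes, here is a small sketch using Python's standard `zlib` module. This is only a stand-in: the thread does not say which compression scheme BigTable or Gmail uses, and the helper names and sample text are hypothetical.

```python
import zlib

def compress_text(text):
    """Compress a text message with zlib at the highest compression level."""
    return zlib.compress(text.encode("utf-8"), 9)

def decompress_text(blob):
    """Recover the original text from a compressed blob."""
    return zlib.decompress(blob).decode("utf-8")

# A message whose body is mostly the same quoted line repeated many times.
message = "> Thanks, see you on Friday at the usual place.\n" * 50
blob = compress_text(message)
ratio = len(blob) / len(message.encode("utf-8"))
```

Highly repetitive text such as nested quotes compresses to a small fraction of its original size, while short or already-compressed content (attachments, images) would benefit far less.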

Art-One [PersonRank 10]

17 years ago #

Andrew, you're right, this could save huge amounts of space. Just think of all that spam that could be replaced with a single pointer...
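The "one pointer per spam message" idea can be sketched as a content-addressed store, where identical message bodies are kept once and each mailbox holds only a hash. The class `DedupStore` and its methods are hypothetical, and this is an illustration of the general technique, not a description of anything Google is known to do.

```python
import hashlib

class DedupStore:
    """Toy mail store that keeps each distinct message body only once."""

    def __init__(self):
        self.blobs = {}      # digest -> message body, stored a single time
        self.mailboxes = {}  # user -> list of digests acting as pointers

    def deliver(self, user, body):
        """Deliver a message to a user, storing the body only if it is new."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, body)
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user):
        """Return all message bodies in a user's mailbox."""
        return [self.blobs[d] for d in self.mailboxes.get(user, [])]

store = DedupStore()
spam = "You have won a prize! Click here to claim it."
for user in ["alice", "bob", "carol"]:
    store.deliver(user, spam)
store.deliver("alice", "Lunch tomorrow?")
```

Here three users receive the same spam, yet `store.blobs` holds it once. Real spam often varies slightly per recipient, so exact-match hashing like this would catch only byte-identical copies.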
