In Gmail, text that is quoted (or duplicated) is indicated and can be hidden. Is there any chance that they are using this to compress the data they are storing? It would be quite easy to replace that text with a pointer... I'm just wondering... |
I don't think it does. As someone said when Gmail first came out, the size of a message in terms of storage grows exponentially, as all the previous quoted text tends to be included (since it is hidden).
We won't know until Google tells us how big each message is. |
Looking through the code, I found that there are two duplicated pieces of content. |
Hoachi: what do you mean? Could you be more specific? |
Sam: even if Google tells us how big each message is, they could still use some compression techniques internally... So even then we couldn't tell without inside information. Maybe a question for Matt? |
I think I've heard at Google talks that they don't try to figure out duplicated content.
However, depending on what system Gmail uses, there might be compression somewhere in the system. BigTable supports table compression (which would probably save a lot of space with all the text e-mails), but I think Gmail was out before BigTable. |
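To get a rough feel for why generic compression helps with quoted mail, here is a small sketch. It is only an illustration: `zlib` stands in for whatever compression a storage system might use, and the thread contents are made up. It says nothing about what Gmail or BigTable actually do.

```python
import zlib

# Hypothetical thread: every reply carries all earlier text as a quote.
new_parts = [f"Reply number {i}: some new text in this message." for i in range(20)]

messages = []
quoted = ""
for part in new_parts:
    body = part + "\n" + quoted   # new text on top, full quote below
    messages.append(body)
    quoted = body

raw = "\n".join(messages).encode("utf-8")
packed = zlib.compress(raw)

# The repeated quotes make the raw thread much larger than its unique text,
# and a general-purpose compressor removes most of that redundancy.
print(len(raw), len(packed))
```

Storing the messages together is what lets the compressor see the repeats; compressing each message on its own would not remove duplicates across messages.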
Andrew, you're right, this could save huge amounts of space. Just think of all that spam that could be replaced with just one pointer... |
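For what it's worth, the "replace it with a pointer" idea can be sketched in a few lines. This is a toy under loud assumptions: `DedupStore`, `store_message` and `load_message` are hypothetical names, paragraphs are the unit of deduplication, and a SHA-256 digest plays the role of the pointer. Nothing here reflects how Gmail really stores mail.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical text blocks are kept once."""

    def __init__(self):
        self.blocks = {}  # digest -> text

    def put(self, text):
        # The digest of the text acts as the "pointer" to it.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.blocks.setdefault(digest, text)
        return digest

    def get(self, digest):
        return self.blocks[digest]


def store_message(store, body):
    """Split a message into paragraphs and return one pointer per paragraph."""
    return [store.put(p) for p in body.split("\n\n")]


def load_message(store, pointers):
    """Rebuild the message text from its list of pointers."""
    return "\n\n".join(store.get(d) for d in pointers)


store = DedupStore()
first = "Is Gmail compressing quoted text?"
reply = "I don't think it does.\n\n" + first  # reply quotes the first message

first_ptrs = store_message(store, first)
reply_ptrs = store_message(store, reply)

# The quoted paragraph is stored once; the reply only points at it.
print(len(store.blocks))
```

The same trick would cover identical spam sent to many inboxes: each copy becomes a list of pointers to blocks that are already stored.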