DocumentCode
719410
Title
Compressing Yahoo Mail
Author
Bergman, Aran ; Zohar, Eyal
Author_Institution
Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
fYear
2015
fDate
7-9 April 2015
Firstpage
223
Lastpage
232
Abstract
Yahoo mail servers have been receiving an enormous number of messages each day for the past 17 years. The vast majority of today's messages (about 90%) are machine-generated, based on a boilerplate with a small number of specific per-recipient changes. We show that the popular Zlib compression to the gzip format fails to fully utilize the high similarity between these machine-generated messages. In this paper, we analyze the data redundancy in Yahoo mail and present methods to reduce its space requirements while using the standard Zlib library. Our results show that we can reduce the compressed data size by a further factor of almost 2.5 compared to traditional gzip compression.
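The abstract's central observation is that compressing each message on its own with Zlib/gzip cannot exploit redundancy *across* messages built from the same boilerplate. The sketch below is one way to illustrate this point using only stock zlib. It uses a preset dictionary (`zdict`) so that Deflate can back-reference text from an earlier message. This record does not say whether preset dictionaries are the authors' method, so treat the technique as an assumption. The template, names, and helper functions are hypothetical, and the example uses the zlib container rather than gzip because zlib's preset-dictionary support does not apply to the gzip wrapper.

```python
import zlib

# Hypothetical boilerplate standing in for a machine-generated message;
# this is illustrative text, NOT data from the paper.
TEMPLATE = (
    "Dear {name},\n"
    "Your order #{order} has shipped and will arrive within 3-5 business days.\n"
    "Track your package at https://shop.example.com/track/{order}\n"
    "Thank you for shopping with Example Shop. To unsubscribe, visit "
    "https://shop.example.com/unsubscribe?user={name}\n"
)


def make_message(name: str, order: int) -> bytes:
    """Render one message: shared boilerplate plus a few per-recipient fields."""
    return TEMPLATE.format(name=name, order=order).encode()


def compress_plain(msg: bytes) -> bytes:
    """Baseline: compress each message independently (no shared context)."""
    return zlib.compress(msg, 9)


def compress_with_dict(msg: bytes, zdict: bytes) -> bytes:
    """Compress with a preset dictionary so Deflate can reference its bytes."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(msg) + c.flush()


def decompress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """The decoder must be given the same dictionary used for compression."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(data) + d.flush()


if __name__ == "__main__":
    # Use one earlier message from the same template as the dictionary.
    dictionary = make_message("alice", 1001)
    msgs = [make_message(n, 2000 + i) for i, n in enumerate(["bob", "carol", "dave"])]
    plain = sum(len(compress_plain(m)) for m in msgs)
    shared = sum(len(compress_with_dict(m, dictionary)) for m in msgs)
    for m in msgs:
        assert decompress_with_dict(compress_with_dict(m, dictionary), dictionary) == m
    print(f"per-message: {plain} bytes, with preset dictionary: {shared} bytes")
```

The trade-off in this sketch is that the dictionary becomes shared state: every reader needs the exact dictionary bytes to decode a message. This is why a real system would have to store and version the dictionaries alongside the compressed mail.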
Keywords
data compression; electronic mail; reliability; software libraries; Yahoo mail servers; Zlib compression; Zlib library; boilerplate; data redundancy; gzip compression; gzip format; machine-generated messages; specific per-recipient changes; Electronic mail; Libraries; Postal services; Redundancy; Servers; Size measurement; Standards; Compression; Deflate; Mail; Yahoo; Zlib; gzip;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference (DCC), 2015
Conference_Location
Snowbird, UT
ISSN
1068-0314
Type
conf
DOI
10.1109/DCC.2015.15
Filename
7149279