• DocumentCode
    719410
  • Title
    Compressing Yahoo Mail
  • Author
    Bergman, Aran ; Zohar, Eyal
  • Author_Institution
    Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
  • fYear
    2015
  • fDate
    7-9 April 2015
  • Firstpage
    223
  • Lastpage
    232
  • Abstract
    Yahoo mail servers have been receiving an enormous number of messages each day for the past 17 years. The vast majority of today's messages (about 90%) are machine-generated, based on a boilerplate with a small number of specific per-recipient changes. We show that the popular Zlib compression to gzip format fails to fully utilize the high similarity between these machine-generated messages. In this paper we analyze the data redundancy in Yahoo mail and present methods to reduce its space requirements while using the standard Zlib library. Our results show that we can further reduce the compressed data size by a factor of almost 2.5 compared to traditional gzip compression.
  • Keywords
    data compression; electronic mail; reliability; software libraries; Yahoo mail servers; Zlib compression; Zlib library; boilerplate; data redundancy; gzip compression; gzip format; machine-generated messages; specific per-recipient changes; Electronic mail; Libraries; Postal services; Redundancy; Servers; Size measurement; Standards; Compression; Deflate; Mail; Yahoo; Zlib; gzip;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2015
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2015.15
  • Filename
    7149279