  • DocumentCode
    2048923
  • Title
    Quick Estimation of Data Compression and De-duplication for Large Storage Systems
  • Author
    Constantinescu, Cornel ; Lu, Maohua
  • fYear
    2011
  • fDate
    21-24 June 2011
  • Firstpage
    98
  • Lastpage
    102
  • Abstract
    Many new storage systems provide some form of data reduction. In a recent paper we investigated how compression and de-duplication can be mixed in primary storage systems serving active data. In this paper we try to answer the question someone would ask before upgrading to a new, data-reduction-enabled storage server: how much storage savings would the new system offer for the data I have stored right now? We investigate methods to quickly estimate the storage savings potential of the customary data reduction methods used in storage systems, namely compression and full-file de-duplication, on large-scale storage systems. We show that the compression ratio achievable on a large storage system can be precisely estimated with just a couple of percent (worst case) of the work required to compress each file in the system. We also show that full-file duplicates can be discovered very quickly, with only 4% error (worst case), by a robust heuristic.
  • Keywords
    data compression; data reduction; file servers; storage management; compression ratio; data reduction; file deduplication; large storage system; primary storage system; quick data compression estimation; storage saving; storage server; Approximation methods; Data compression; Estimation; File systems; Portable computers; Sampling methods; Servers; Compressibility; Data storage systems; De-duplication; Estimation; Large-scale storage systems; Nonuniform sampling; Sampling methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression, Communications and Processing (CCP), 2011 First International Conference on
  • Conference_Location
    Palinuro
  • Print_ISBN
    978-1-4577-1458-0
  • Electronic_ISBN
    978-0-7695-4528-8
  • Type
    conf
  • DOI
    10.1109/CCP.2011.41
  • Filename
    6061008