• DocumentCode
    2397761
  • Title

    BloomFlash: Bloom Filter on Flash-Based Storage

  • Author

    Debnath, Biplob ; Sengupta, Sudipta ; Li, Jin ; Lilja, David J. ; Du, David H C

  • Author_Institution
    EMC Corp., Santa Clara, CA, USA
  • fYear
    2011
  • fDate
    20-24 June 2011
  • Firstpage
    635
  • Lastpage
    644
  • Abstract
    The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant ranging typically from one to few bytes. A bloom filter is most commonly used as an in memory data structure, hence its size is limited by the availability of RAM space on the machine. As datasets have grown over time to Internet scale, so have the RAM space requirements of bloom filters. If sufficient RAM space is not available, we advocate that flash memory may serve as a suitable medium for storing bloom filters, since it is about one-tenth the cost of RAM per GB while still providing access times orders of magnitude faster than hard disk. We present BLOOMFLASH, a bloom filter designed for flash memory based storage, that provides a new dimension of trade off with bloom filter access times to reduce RAM space usage (and hence system cost). The simple design of a single flat bloom filter on flash suffers from many performance bottlenecks, including in-place bit updates that are inefficient on flash and multiple reads and random writes spread out across many flash pages for a single lookup or insert operation. To mitigate these performance bottlenecks, BLOOMFLASH leverages two key design innovations: (i) buffering bit updates in RAM and applying them in bulk to flash that helps to reduce random writes to flash, and (ii) a hierarchical bloom filter design consisting of component bloom filters, stored one per flash page, that helps to localize reads and writes on flash. We use two real-world data traces taken from representative bloom filter applications to drive and evaluate our design. BLOOMFLASH achieves bloom filter access times in the range of few tens of microseconds, thus allowing up to order of tens of thousands operations per sec.
  • Keywords
    data structures; flash memories; random-access storage; storage management; BloomFlash; Internet scale; RAM space; RAM space usage reduction; buffering bit updates; flash memory based storage; flash-based storage; hierarchical bloom filter design; memory data structure; probabilistic data structure; Ash; Buffer storage; Data structures; Games; Performance evaluation; Random access memory; Servers; Bloom Filter; Data Center Applications; Flash Memory; Hierarchical Design; Solid State Disk (SSD);
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Distributed Computing Systems (ICDCS), 2011 31st International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1063-6927
  • Print_ISBN
    978-1-61284-384-1
  • Electronic_ISBN
    1063-6927
  • Type

    conf

  • DOI
    10.1109/ICDCS.2011.44
  • Filename
    5961740