• DocumentCode
    3261123
  • Title
    Efficient Reservoir Sampling for Transactional Data Streams
  • Author
    Dash, Manoranjan ; Ng, Willie
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    662
  • Lastpage
    666
  • Abstract
    Reservoir sampling maintains a sample that is a "sketch" of the whole data. Existing reservoir sampling methods, introduced by J. S. Vitter, are based on simple random sampling. These algorithms work well for larger sampling ratios, but their performance drops drastically for small sampling ratios. For streaming data, it is essential that the sampling algorithm work efficiently, particularly at very small ratios, because streaming data is potentially infinite in size. We propose distance based sampling (DSS) for transactional data streams. DSS is designed to produce samples that are "close" to the whole data. It assures the accuracy of the final sample even at very small sampling ratios. Experimental comparison between the DSS algorithm and the existing reservoir sampling methods shows that DSS outperforms them significantly, particularly for small sampling ratios.
  • Keywords
    data mining; sampling methods; transaction processing; distance based sampling; efficient reservoir sampling; random sampling; transactional data streams; Association rules; Data engineering; Data mining; Decision support systems; Histograms; Information systems; Maintenance engineering; Reservoirs; Sampling methods; Scalability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2006.68
  • Filename
    4063708
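
The abstract refers to existing reservoir sampling methods based on simple random sampling. For orientation, here is a minimal sketch of the classic single-pass scheme commonly called Algorithm R. It is not the DSS method of the paper, whose details this record does not give. The function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length.

    Illustrative sketch of classic reservoir sampling (Algorithm R);
    this is NOT the distance based sampling (DSS) described in the abstract.
    """
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of transaction ids, sampled with a fixed seed.
sample = reservoir_sample(range(100_000), 10, random.Random(42))
print(len(sample))
```

Each item is seen once and memory stays at k entries, which is why this style of sampling suits streams. The abstract's point is that sample quality from such purely random selection degrades when k is very small relative to the stream.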