• DocumentCode
    3166736
  • Title
    Sampling for Sequential Pattern Mining: From Static Databases to Data Streams
  • Author
    Raïssi, Chedy ; Poncelet, Pascal
  • Author_Institution
    LIRMM, Montpellier
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    631
  • Lastpage
    636
  • Abstract
    Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases tend to grow larger, and the hypothesis that a database can be loaded into main memory for sequential pattern mining purposes is no longer valid. Furthermore, the new model of data as a continuous and potentially infinite flow, known as the data stream model, calls for a pre-processing step to ease the mining operations. Since the database size is the most influential factor for mining algorithms, we examine the use of sampling over static databases to get approximate mining results with an upper bound on the error rate. Moreover, we extend this sampling analysis and present an algorithm based on reservoir sampling to cope with sequential pattern mining over data streams. We demonstrate with empirical results that our sampling methods are efficient and that sequence mining remains accurate over static databases and data streams.
  • Keywords
    data mining; database management systems; sampling methods; data sampling; data streams; knowledge discovery; sequential pattern mining; static databases; Data mining; Error analysis; Hardware; Itemsets; Pattern analysis; Reservoirs; Sampling methods; Space technology; Transaction databases; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3018-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2007.82
  • Filename
    4470302