• DocumentCode
    3576374
  • Title
    A probabilistic condensed representation of data for stream mining
  • Author
    Geilke, Michael ; Karwath, Andreas ; Kramer, Stefan
  • Author_Institution
    Johannes Gutenberg-Univ. Mainz, Mainz, Germany
  • fYear
    2014
  • Firstpage
    297
  • Lastpage
    303
  • Abstract
    Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available all at once or consists of billions of instances, these algorithms easily become infeasible in terms of memory and run time. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example of an algorithm that operates on density estimates, we consider the task of mining association rules, which we regard as simple statements about the data. The algorithm, called POEt (Pattern mining on Online density esTimates), is evaluated on synthetic and real-world data and is compared to state-of-the-art algorithms.
  • Keywords
    data mining; data structures; MiDEO; POEt; mining association rules; mining density estimates inferred online; pattern mining on online density estimates; probabilistic condensed data representation; stream mining; Algorithm design and analysis; Association rules; Inference algorithms; Itemsets; Machine learning algorithms; Probabilistic logic
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Science and Advanced Analytics (DSAA), 2014 International Conference on
  • Type
    conf
  • DOI
    10.1109/DSAA.2014.7058088
  • Filename
    7058088