• DocumentCode
    3426554
  • Title

    Maintaining only frequent itemsets to mine approximate frequent itemsets over online data streams

  • Author

    Wang, Yongyan ; Li, Kun ; Wang, Hongan

  • Author_Institution
    Intell. Eng. Lab., Chinese Acad. of Sci., Beijing
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    381
  • Lastpage
    388
  • Abstract
    Mining frequent itemsets over online data streams, where the new data arrive and the old data will be removed with high speed, is a challenge for the computational complexity. Existing approximate mining algorithms suffer from explosive computational complexity when decreasing the error parameter, ε, which is used to control the mining accuracy. We propose a new approximate mining algorithm using an approximate frequent itemset tree (abbreviated as AFI-tree), called AFI algorithm, to mine approximate frequent itemsets over online data streams. The AFI-tree based on prefix tree maintains only frequent itemsets, so the number of nodes in the tree is very small. All the infrequent child nodes of any frequent node are pruned and the maximal support of the pruned nodes is estimated to detect new frequent itemsets. In order to guarantee the mining accuracy, when the estimated maximal support of the pruned nodes is a bit more than the minimum support, their supports will be re-computed and the frequent nodes among them will be inserted into the AFI-tree. Experimental results show that the AFI algorithm consumes much less memory space than existing algorithms, and runs much faster than existing algorithms in most occasions.
  • Keywords
    approximation theory; computational complexity; data mining; trees (mathematics); AFI-tree; approximate frequent itemset mining algorithm; computational complexity; data mining; online data stream; prefix tree; Computational complexity; Data mining; Databases; Degradation; Error correction; Explosives; Financial management; Itemsets; Monitoring; Telecommunication network management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2765-9
  • Type

    conf

  • DOI
    10.1109/CIDM.2009.4938675
  • Filename
    4938675