• DocumentCode
    2848417
  • Title

    Index support for frequent itemset mining in a relational DBMS

  • Author

    Baralis, Elena ; Cerquitelli, Tania ; Chiusano, Silvia

  • Author_Institution
    Dipt. di Autom. e Inf., Politecnico di Torino, Italy
  • fYear
    2005
  • fDate
    5-8 April 2005
  • Firstpage
    754
  • Lastpage
    765
  • Abstract
    Many efforts have been devoted to coupling data mining activities with relational DBMSs, but true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique that represents transactions in a succinct form, suitable for tightly integrating frequent itemset mining into a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so that the index can be reused for mining itemsets with any support threshold. Furthermore, the stored information is structured to allow selective access to the index blocks needed for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments have been run on various datasets characterized by different data distributions. The execution time of frequent itemset extraction exploiting the index is always comparable with, and sometimes faster than, a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
  • Keywords
    SQL; data mining; database indexing; relational databases; tree data structures; PostgreSQL open source DBMS; data representation; frequent itemset mining index support; indexing technique; relational DBMS; Algorithm design and analysis; Buffer storage; Data analysis; Data structures; Indexing; Itemsets; Kernel; Knowledge management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)
  • ISSN
    1084-4627
  • Print_ISBN
    0-7695-2285-8
  • Type

    conf

  • DOI
    10.1109/ICDE.2005.80
  • Filename
    1410190