• DocumentCode
    2677755
  • Title
    Data organization and access for efficient data mining
  • Author
    Dunkel, Brian ; Soparkar, Nandit
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA
  • fYear
    1999
  • fDate
    23-26 Mar 1999
  • Firstpage
    522
  • Lastpage
    529
  • Abstract
    Efficient mining of data presents a significant challenge because the space and time required for such processing often grow combinatorially. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. By a simple analysis and comparison of the counting stage of the Apriori association rules algorithm, we show that a "column-wise" approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database, primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
  • Keywords
    data handling; data mining; information retrieval; I/O costs; Apriori association rules algorithm; combinatorial explosion; data access; data organization; disk accesses; itemsets; mining algorithms; standard row-wise approach; Algorithm design and analysis; Argon; Association rules; Computer science; Costs; Data mining; Delta modulation; Electrical capacitance tomography; Explosions; Itemsets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 15th International Conference on Data Engineering, 1999
  • Conference_Location
    Sydney, NSW
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0071-4
  • Type
    conf
  • DOI
    10.1109/ICDE.1999.754968
  • Filename
    754968
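
The abstract's contrast between row-wise and column-wise counting can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the toy transactions, the function names, and the choice to represent each column as a set of transaction ids are all assumptions made for illustration, and the sketch does not model disk I/O, which is where the paper reports its savings. It only shows that both access patterns yield the same support counts, while the column-wise version touches just the columns of the items in each candidate.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]


def count_row_wise(transactions, candidates):
    """Scan every transaction (row) and bump a counter for each candidate it contains."""
    counts = {cand: 0 for cand in candidates}
    for row in transactions:
        for cand in candidates:
            if cand <= row:  # candidate itemset is a subset of this row
                counts[cand] += 1
    return counts


def to_columns(transactions):
    """Build one column per item: the set of transaction ids that contain the item."""
    columns = {}
    for tid, row in enumerate(transactions):
        for item in row:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_column_wise(columns, candidates):
    """Count each candidate by intersecting only the columns of its own items."""
    counts = {}
    for cand in candidates:
        tids = set.intersection(*(columns.get(item, set()) for item in cand))
        counts[cand] = len(tids)
    return counts


# Candidate 2-itemsets, as the counting stage of an Apriori-style pass would see them.
items = sorted(set().union(*transactions))
candidates = [frozenset(pair) for pair in combinations(items, 2)]

row_counts = count_row_wise(transactions, candidates)
col_counts = count_column_wise(to_columns(transactions), candidates)
```

In the row-wise version every candidate counter is potentially updated for every row, so all counters must stay reachable during the whole scan. In the column-wise version each candidate is resolved from a few columns at a time, which is the access pattern the abstract credits with fewer repeated loads of data and counters.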