• DocumentCode
    2370267
  • Title
    Indexing and mining free trees
  • Author
    Chi, Yun ; Yang, Yirong ; Muntz, Richard R.
  • Author_Institution
    Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    509
  • Lastpage
    512
  • Abstract
    Tree structures are used extensively in domains such as computational biology, pattern recognition, and computer networks. We present an indexing technique for free trees and apply it to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce another concept, the canonical string, as a simpler representation for free trees in their canonical forms. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. We study the performance and scalability of our algorithms through extensive experiments based on both synthetic data and datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees.
  • Keywords
    data mining; database indexing; tree data structures; trees (mathematics); FreeTreeMiner algorithm; Internet multicast trees dataset; canonical representation; chemical compound dataset; free tree indexing technique; rooted trees; subtrees mining; tree structures; Chemical compounds; Computational biology; Computer networks; Databases; Indexing; Internet; Multicast algorithms; Pattern recognition; Scalability; Tree data structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1250964
  • Filename
    1250964
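
The abstract describes a canonical form for rooted trees, its extension to free trees, and a canonical string that represents a free tree in canonical form. The record does not give the paper's exact definitions, so the sketch below is only an illustration of the general idea using a well-known substitute: root the free tree at its center (one or two vertices found by repeatedly peeling leaves), encode each rooted tree by sorting the encodings of its children, and keep the smallest resulting string. The function names, the adjacency-dict input format, and the parenthesized string layout are all assumptions made for this example, and vertex labels are assumed not to contain parentheses.

```python
from typing import Dict, Hashable, List, Optional

Adjacency = Dict[Hashable, List[Hashable]]


def tree_centers(adj: Adjacency) -> List[Hashable]:
    """Return the one or two center vertices of a free tree by peeling leaves."""
    n = len(adj)
    if n <= 2:
        return list(adj)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:  # nb has just become a leaf
                    next_leaves.append(nb)
        leaves = next_leaves
    return leaves


def encode_rooted(adj: Adjacency, labels: Dict[Hashable, str],
                  root: Hashable, parent: Optional[Hashable] = None) -> str:
    """Encode a rooted, vertex-labeled tree; sorting children removes ordering."""
    children = sorted(
        encode_rooted(adj, labels, child, root)
        for child in adj[root]
        if child != parent
    )
    return "(" + labels[root] + "".join(children) + ")"


def canonical_string(adj: Adjacency, labels: Dict[Hashable, str]) -> str:
    """Illustrative canonical string: smallest encoding over the tree's centers."""
    return min(encode_rooted(adj, labels, c) for c in tree_centers(adj))


# Two numberings of the same labeled star (center C, leaves H, H, O).
star_a = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
labels_a = {0: "C", 1: "H", 2: "H", 3: "O"}
star_b = {2: [0, 1, 3], 0: [2], 1: [2], 3: [2]}
labels_b = {2: "C", 0: "O", 1: "H", 3: "H"}

# Two paths with the same labels in a different arrangement: C-C-O and C-O-C.
path_cco = {0: [1], 1: [0, 2], 2: [1]}
path_coc = {0: [1], 1: [0, 2], 2: [1]}
labels_cco = {0: "C", 1: "C", 2: "O"}
labels_coc = {0: "C", 1: "O", 2: "C"}

print(canonical_string(star_a, labels_a))
print(canonical_string(star_b, labels_b))
```

Because isomorphic free trees map to the same string under this scheme, the string can serve as a dictionary key when counting how often a candidate subtree occurs, which is the kind of indexing role the abstract attributes to canonical strings. A recursive encoder is used here for brevity; very deep trees would need an iterative version.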