DocumentCode
2370267
Title
Indexing and mining free trees
Author
Chi, Yun ; Yang, Yirong ; Muntz, Richard R.
Author_Institution
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
fYear
2003
fDate
19-22 Nov. 2003
Firstpage
509
Lastpage
512
Abstract
Tree structures are used extensively in domains such as computational biology, pattern recognition, and computer networks. We present an indexing technique for free trees and apply it to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce the canonical string, a simpler representation of a free tree in its canonical form. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. We study the performance and scalability of our algorithms through extensive experiments on synthetic data and on datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees.
Keywords
data mining; database indexing; tree data structures; trees (mathematics); FreeTreeMiner algorithm; Internet multicast trees dataset; canonical representation; chemical compound dataset; free tree indexing technique; rooted trees; subtrees mining; tree structures; Chemical compounds; Computational biology; Computer networks; Databases; Indexing; Internet; Multicast algorithms; Pattern recognition; Scalability; Tree data structures;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN
0-7695-1978-4
Type
conf
DOI
10.1109/ICDM.2003.1250964
Filename
1250964