DocumentCode :
2370267
Title :
Indexing and mining free trees
Author :
Chi, Yun ; Yang, Yirong ; Muntz, Richard R.
Author_Institution :
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
509
Lastpage :
512
Abstract :
Tree structures are used extensively in domains such as computational biology, pattern recognition, computer networks, and so on. We present an indexing technique for free trees and apply this indexing technique to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce another concept, the canonical string, as a simpler representation for free trees in their canonical forms. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. We study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees.
Keywords :
data mining; database indexing; tree data structures; trees (mathematics); FreeTreeMiner algorithm; Internet multicast trees dataset; canonical representation; chemical compound dataset; free tree indexing technique; rooted trees; subtrees mining; tree structures; Chemical compounds; Computational biology; Computer networks; Databases; Indexing; Internet; Multicast algorithms; Pattern recognition; Scalability; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250964
Filename :
1250964