Indexing and mining free trees

Author

Chi, Yun ; Yang, Yirong ; Muntz, Richard R.

Author_Institution

Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA

fYear

2003

fDate

19-22 Nov. 2003

Firstpage

509

Lastpage

512

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, and computer networks. We present an indexing technique for free trees and apply it to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce the canonical string, a simpler representation for free trees in their canonical forms. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. We study the performance and scalability of our algorithms through extensive experiments on both synthetic data and datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees.
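
The abstract does not spell out how the canonical form or canonical string is constructed, so the following is only a minimal sketch of the general idea, not the paper's definition. It assumes a common textbook approach: root the free tree at its center (one or two vertices found by repeatedly stripping leaves), encode each rooted subtree as its label followed by the sorted encodings of its children and a closing "$", and take the smaller string when there are two centers. The function names, the "$" terminator, and the restriction to single-character vertex labels are all assumptions made for illustration.

```python
def rooted_canonical_string(adj, labels, root, parent=None):
    """Encode the subtree at `root`: label, sorted child encodings, then '$'.

    Assumes single-character labels other than '$' (illustrative choice).
    """
    child_codes = sorted(
        rooted_canonical_string(adj, labels, child, root)
        for child in adj[root]
        if child != parent
    )
    return labels[root] + "".join(child_codes) + "$"


def tree_centers(adj):
    """Return the one or two center vertices by repeatedly stripping leaves."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    leaves = [v for v, d in degree.items() if d <= 1]
    remaining = len(adj)
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for neighbor in adj[leaf]:
                degree[neighbor] -= 1
                if degree[neighbor] == 1:
                    next_leaves.append(neighbor)
        leaves = next_leaves
    return leaves


def free_tree_canonical_string(edges, labels):
    """Return a string that is identical for isomorphic labeled free trees."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Rooting at a center removes the dependence on an arbitrary root;
    # with two centers, keep the lexicographically smaller encoding.
    return min(
        rooted_canonical_string(adj, labels, center)
        for center in tree_centers(adj)
    )


# Two drawings of the same free tree with different vertex names.
tree_a = free_tree_canonical_string(
    [(1, 2), (2, 3), (2, 4)], {1: "A", 2: "B", 3: "A", 4: "C"}
)
tree_b = free_tree_canonical_string(
    [(10, 30), (30, 20), (40, 30)], {10: "C", 20: "A", 30: "B", 40: "A"}
)
print(tree_a, tree_a == tree_b)
```

Because isomorphic trees map to the same string under this scheme, such a string could serve as a dictionary key when indexing trees or counting candidate subtrees, which is the role the abstract assigns to the canonical string.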

Keywords

data mining; database indexing; tree data structures; trees (mathematics); FreeTreeMiner algorithm; Internet multicast trees dataset; canonical representation; chemical compound dataset; free tree indexing technique; rooted trees; subtrees mining; tree structures; Chemical compounds; Computational biology; Computer networks; Databases; Indexing; Internet; Multicast algorithms; Pattern recognition; Scalability; Tree data structures

fLanguage

English

Publisher

IEEE

Conference_Title

Data Mining, 2003. ICDM 2003. Third IEEE International Conference on

Print_ISBN

0-7695-1978-4

Type

conf

DOI

10.1109/ICDM.2003.1250964

Filename

1250964