Title :
HybridTreeMiner: an efficient algorithm for mining frequent rooted trees and free trees using canonical forms
Author :
Chi, Yun ; Yang, Yirong ; Muntz, Richard R.
Author_Institution :
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
Abstract :
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees: the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithm through extensive experiments on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive with known rooted tree mining algorithms and is one to two orders of magnitude faster than a known algorithm for mining frequent free trees.
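To illustrate why a canonical form matters for rooted unordered trees, the sketch below maps every ordering of the same unordered tree to one string, so that isomorphic trees can be recognized by string equality. This is not the BFCF defined in the paper, whose details are not given in this record; it is a simpler depth-first encoding with sorted children, swapped in for illustration. The tuple representation `(label, children)` and the names `canonical_string` and `is_isomorphic` are hypothetical, and labels are assumed not to contain parentheses.

```python
# Illustrative sketch only: a sorted depth-first encoding, NOT the paper's BFCF.
# A tree is a (label, children) tuple; labels are assumed to contain no parentheses.

def canonical_string(tree):
    """Return a string that is identical for all child orderings of the same tree."""
    label, children = tree
    # Encode each child subtree, then sort so that sibling order no longer matters.
    encoded_children = sorted(canonical_string(child) for child in children)
    return label + "(" + "".join(encoded_children) + ")"


def is_isomorphic(tree_a, tree_b):
    """Two rooted unordered labeled trees are isomorphic iff their encodings match."""
    return canonical_string(tree_a) == canonical_string(tree_b)


# t1 and t2 differ only in the order of the root's children.
t1 = ("A", [("B", [("D", [])]), ("C", [])])
t2 = ("A", [("C", []), ("B", [("D", [])])])
# t3 hangs D under C instead of B, so it is a different tree.
t3 = ("A", [("B", []), ("C", [("D", [])])])

print(canonical_string(t1))   # A(B(D())C())
print(is_isomorphic(t1, t2))  # True
print(is_isomorphic(t1, t3))  # False
```

A miner that stores candidate subtrees under such a unique encoding can avoid counting the same unordered subtree more than once, which is the role the abstract assigns to the canonical form in the enumeration tree.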
Keywords :
data mining; tree data structures; tree searching; HybridTreeMiner algorithm; XML databases; breadth-first canonical form; canonical forms; computational biology; computer networks; enumeration tree; free tree databases; frequent rooted tree mining; frequent subtree; pattern recognition; subtrees; tree isomorphism; tree mining algorithms; tree structures; unordered trees; Application software; Biology computing; Computational biology; Computer networks; Computer science; Databases; Pattern recognition; Tree data structures; Tree graphs; XML;
Conference_Title :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-2146-0
DOI :
10.1109/SSDM.2004.1311189