DocumentCode :
1202705
Title :
Mining closed and maximal frequent subtrees from databases of labeled rooted trees
Author :
Chi, Yun ; Xia, Yi ; Yang, Yirong ; Muntz, Richard R.
Author_Institution :
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
Volume :
17
Issue :
2
Year :
2005
Firstpage :
190
Lastpage :
202
Abstract :
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. One important problem in mining databases of trees is to find frequently occurring subtrees. Because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of frequent subtrees and, therefore, mining all frequent subtrees becomes infeasible for large tree sizes. We present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to arrange the order of computation so that relatively expensive computation is avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is very efficient in reducing the search space and quickly discovers all closed and maximal frequent subtrees.
Keywords :
computational complexity; data mining; graph theory; heuristic programming; relational databases; tree data structures; CMTreeMiner; closed frequent subtrees; data set; database mining; heuristic technique; labeled rooted tree; maximal frequent subtrees; tree structure; tree traversing; Classification algorithms; Classification tree analysis; Clustering algorithms; Computer networks; Data mining; Databases; Explosions; Motion pictures; Tree graphs; XML;
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2005.30
Filename :
1377171