  • DocumentCode
    1202705
  • Title
    Mining closed and maximal frequent subtrees from databases of labeled rooted trees
  • Author
    Chi, Yun ; Xia, Yi ; Yang, Yirong ; Muntz, Richard R.
  • Author_Institution
    Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
  • Volume
    17
  • Issue
    2
  • fYear
    2005
  • Firstpage
    190
  • Lastpage
    202
  • Abstract
    Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is to find frequently occurring subtrees. Because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with their size; therefore, mining all frequent subtrees becomes infeasible for large tree sizes. We present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to order the computation so that relatively expensive steps are avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is very efficient in reducing the search space and quickly discovers all closed and maximal frequent subtrees.
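The two pattern classes the abstract relies on can be pinned down with a small definitional sketch. This is not CMTreeMiner itself, which prunes its enumeration tree precisely so that it never has to materialize every frequent subtree; it is a naive post-filter that assumes the supports of candidate subtrees and their proper-supertree relation are already known. A frequent subtree is closed if no proper supertree has the same support, and maximal if no proper supertree is frequent. The function name `closed_and_maximal`, the string encoding of trees such as `A(B,C)`, and the toy supports (tallied by hand from a hypothetical three-tree database holding two copies of A(B,C) and one A(B(D)), restricted to patterns rooted at label A) are illustrative assumptions.

```python
from typing import Dict, Set, Tuple


def closed_and_maximal(
    support: Dict[str, int],
    supertrees: Dict[str, Set[str]],
    minsup: int,
) -> Tuple[Set[str], Set[str]]:
    """Classify frequent patterns as closed and/or maximal by definition.

    support    -- support count of each candidate subtree (hypothetical input)
    supertrees -- all proper supertrees of each subtree among the candidates
    minsup     -- minimum support threshold
    """
    frequent = {t for t, s in support.items() if s >= minsup}
    closed: Set[str] = set()
    maximal: Set[str] = set()
    for t in frequent:
        # A supertree with equal support is automatically frequent, so only
        # frequent supertrees matter for both tests.
        frequent_supers = {u for u in supertrees.get(t, set()) if u in frequent}
        if all(support[u] < support[t] for u in frequent_supers):
            closed.add(t)  # no proper supertree has the same support
        if not frequent_supers:
            maximal.add(t)  # no proper supertree is frequent
    return closed, maximal


# Hypothetical toy database: two copies of A(B,C) and one A(B(D)).
# Only patterns rooted at label A are listed; counts were tallied by hand.
SUPPORT = {"A": 3, "A(B)": 3, "A(C)": 2, "A(B,C)": 2, "A(B(D))": 1}
SUPERTREES = {
    "A": {"A(B)", "A(C)", "A(B,C)", "A(B(D))"},
    "A(B)": {"A(B,C)", "A(B(D))"},
    "A(C)": {"A(B,C)"},
    "A(B,C)": set(),
    "A(B(D))": set(),
}

closed, maximal = closed_and_maximal(SUPPORT, SUPERTREES, minsup=2)
print(sorted(closed))   # ['A(B)', 'A(B,C)']
print(sorted(maximal))  # ['A(B,C)']
```

With a minimum support of 2, four of the listed patterns are frequent, but only two are closed and one is maximal. Every maximal frequent subtree is also closed, so the maximal set is never larger than the closed set.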
  • Keywords
    computational complexity; data mining; graph theory; heuristic programming; relational databases; tree data structures; CMTreeMiner; closed frequent subtrees; data set; database mining; heuristic technique; labeled rooted tree; maximal frequent subtrees; tree structure; tree traversing; Classification algorithms; Classification tree analysis; Clustering algorithms; Computer networks; Data mining; Databases; Explosions; Motion pictures; Tree graphs; XML
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.30
  • Filename
    1377171