DocumentCode
3269013
Title
Unordered tree mining with applications to phylogeny
Author
Shasha, Dennis ; Wang, Jason T L ; Zhang, Sen
Author_Institution
Courant Inst. of Math. Sci., New York Univ., NY, USA
fYear
2004
fDate
30 March-2 April 2004
Firstpage
708
Lastpage
719
Abstract
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
Keywords
data mining; graph theory; pattern recognition; tree data structures; Web log analysis; XML processing; bioinformatics; co-occurring pattern; cousin pair; free tree; frequent structure mining; kernel tree; multiple evolutionary tree; pattern discovery; pattern extraction; phylogeny; rooted unordered labeled trees; structural data; undirected acyclic graph; unordered tree mining; Bioinformatics; Data mining; Educational institutions; History; Kernel; Organisms; Phylogeny; Scalability; Tree graphs; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN
1063-6382
Print_ISBN
0-7695-2065-0
Type
conf
DOI
10.1109/ICDE.2004.1320039
Filename
1320039