Author :
Hussein, Marwa M A ; Soliman, Taysir H A ; Karam, Omar H.
Abstract :
Frequent tree mining has great uses in many domains employing tree structures, e.g., bioinformatics, text mining, and Web mining. Many challenges were tackled to adapt frequent pattern mining techniques to fit the tree structure. Previous studies showed that pattern growth methods are more efficient than candidate generation methods using join functions. In the current work, an efficient pattern growth algorithm, guided pattern growth (GP-growth), is introduced for discovering frequent embedded subtrees from a collection of labeled, rooted, and ordered trees. GP-growth is based on the frequent pattern growth methodology and uses the input trees model as a guide to generate candidates. All frequent subtrees are efficiently discovered without duplication or generation of invalid candidates. GP-growth is compared to the TreeMiner algorithm, a tree mining algorithm that uses a join function. Experiments show that GP-growth can find all frequent subtrees while generating fewer candidates. GP-growth outperforms TreeMiner by an average order of magnitude of 2.
Keywords :
data mining; trees (mathematics); frequent embedded subtree mining; frequent pattern growth method; frequent pattern mining; guided pattern-growth; tree structures; Bioinformatics; Computer networks; Data mining; Data models; Databases; Embedded computing; Tree data structures; Tree graphs; Web mining; XML;