DocumentCode :
2080057
Title :
GP-Growth: A New Algorithm for Mining Frequent Embedded Subtrees
Author :
Hussein, Marwa M A ; Soliman, Taysir H A ; Karam, Omar H.
Author_Institution :
Ain Shams Univ., Cairo
fYear :
2007
fDate :
1-4 July 2007
Firstpage :
1013
Lastpage :
1020
Abstract :
Frequent tree mining is widely used in domains that employ tree structures, e.g. bioinformatics, text mining, and Web mining. Many challenges have been tackled in adapting frequent pattern mining techniques to tree structures. Previous studies showed that pattern growth methods are more efficient than candidate generation methods that use join functions. In the current work, an efficient pattern growth algorithm, guided-pattern growth (GP-growth), is introduced for discovering frequent embedded subtrees from a collection of labeled, rooted, and ordered trees. GP-growth is based on the frequent pattern growth methodology and uses a model of the input trees as a guide to generate candidates. All frequent subtrees are efficiently discovered without duplication or generation of invalid candidates. GP-growth is compared to the TreeMiner algorithm, a tree mining algorithm that uses a join function. Experiments show that GP-growth can find all frequent subtrees while generating fewer candidates. GP-growth outperforms TreeMiner by an average order of magnitude of 2.
Keywords :
data mining; trees (mathematics); frequent embedded subtree mining; frequent pattern growth method; frequent pattern mining; guided pattern-growth; tree structures; Bioinformatics; Computer networks; Data mining; Data models; Databases; Embedded computing; Tree data structures; Tree graphs; Web mining; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computers and Communications, 2007. ISCC 2007. 12th IEEE Symposium on
Conference_Location :
Aveiro
ISSN :
1530-1346
Print_ISBN :
978-1-4244-1520-5
Electronic_ISBN :
1530-1346
Type :
conf
DOI :
10.1109/ISCC.2007.4381548
Filename :
4381548