DocumentCode :
243655
Title :
Mining Top-K Frequent Closed Patterns from Gene Expression Data
Author :
Shufan Ji ; Xuejiao Wang ; Yi Zong ; Xiaopeng Gao
Author_Institution :
Comput. College, Beihang Univ., Beijing, China
fYear :
2014
fDate :
14 Dec. 2014
Firstpage :
732
Lastpage :
739
Abstract :
Analyzing microarray gene expression data gives biologists deep insight into gene functions and gene regulatory networks. In this paper, we propose FCPminer, a novel and efficient algorithm that mines the top-k frequent closed patterns (FCPs) with the highest support and length no less than minL from gene expression data. FCPminer employs a prefix fp-tree data structure with a top-down, best-first search strategy, so that FCPs of adequate length with the highest support are mined first. Compared with existing top-k FCP mining algorithms, FCPminer is much more efficient because it avoids expanding nodes with inadequate length (less than minL) or low support (ranked below the top k) during the mining process. In addition, FCPminer further improves mining efficiency by employing a hash-based closedness-checking method. Experimental results on real biological and synthetic data show that FCPminer outperforms existing state-of-the-art algorithms, especially on large and dense datasets.
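Illustrative note (not part of the original record): the mining task named in the abstract can be made concrete with a small brute-force sketch. This is not FCPminer and uses no prefix fp-tree or best-first search; it only enumerates closed patterns directly from the definition and keeps the k best-supported ones of length at least minL. The function names, the toy rows, and the gene labels g1..g4 are assumptions made for illustration.

```python
from itertools import combinations


def closed_patterns(rows):
    """Map each closed itemset to its support (brute force, exponential).

    An itemset is closed when no proper superset has the same support.
    The non-empty intersections of groups of rows are exactly the closed
    itemsets, so this enumerates every group of rows and intersects it.
    """
    closed = {}
    for size in range(1, len(rows) + 1):
        for group in combinations(rows, size):
            items = frozenset.intersection(*group)
            if items:
                # Support = number of rows that contain the whole itemset.
                closed[items] = sum(1 for row in rows if items <= row)
    return closed


def top_k_closed(rows, k, min_len):
    """Return up to k closed patterns of length >= min_len, best support first."""
    rows = [frozenset(row) for row in rows]
    candidates = [
        (support, sorted(items))
        for items, support in closed_patterns(rows).items()
        if len(items) >= min_len
    ]
    # Highest support first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return candidates[:k]


if __name__ == "__main__":
    # Hypothetical discretized expression data: one row per sample,
    # listing the genes considered "expressed" in that sample.
    samples = [
        {"g1", "g2", "g3"},
        {"g1", "g2", "g4"},
        {"g1", "g2", "g3", "g4"},
        {"g2", "g3"},
    ]
    for support, genes in top_k_closed(samples, k=3, min_len=2):
        print(support, genes)
```

On this toy input the pattern {g2} has the highest support (4 rows) but is excluded by min_len=2, which is the kind of short pattern the abstract says the algorithm avoids expanding. The brute-force enumeration is exponential in the number of rows and is only meant to pin down the problem definition.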
Keywords :
bioinformatics; data mining; file organisation; genetics; trees (mathematics); FCPminer; hash-based closedness checking method; microarray gene expression data; prefix fp-tree; top-k frequent closed pattern mining; Buildings; Complexity theory; Data mining; Gene expression; Itemsets; Search problems; Space exploration;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
Type :
conf
DOI :
10.1109/ICDMW.2014.61
Filename :
7022668