DocumentCode
3717298
Title
Parallel information fusion method for microarray data analysis
Author
Jun Meng;Rui Li;Jing Zhang
Author_Institution
School of Computer Science and Technology, Dalian University of Technology, Dalian, China
fYear
2015
Firstpage
1539
Lastpage
1544
Abstract
Classification of microarray data has always been a challenging task because of the enormous number of genes. Finding a small, closely related gene set that accurately classifies disease cells is an important research problem. Integrating biological knowledge into genomic analysis is an effective way to improve the interpretation of the results. In this paper, the affinity propagation (AP) clustering algorithm is chosen to analyze the impact of biological similarity on the results. We integrate Gene Ontology (GO) semantic similarity into AP clustering for granule construction. Using the MapReduce programming model, a parallel information fusion method is proposed. The construction of the similarity matrix and the message passing in the AP algorithm are parallelized using MapReduce. A parallel randomly directed hill climb ensemble pruning (RandomDHCEP) method based on MapReduce is introduced for ensemble pruning. An instance analysis illustrates the process of affinity propagation and ensemble pruning using an iterative MapReduce program. The proposed method scales well on large data as the number of nodes increases, and it also provides higher classification accuracy than using the whole gene set for classification.
Keywords
"Clustering algorithms","Partitioning algorithms","Programming","Data models","Classification algorithms","Semantics","Algorithm design and analysis"
Publisher
IEEE
Conference_Titel
2015 IEEE International Conference on Big Data (Big Data)
Type
conf
DOI
10.1109/BigData.2015.7363917
Filename
7363917