DocumentCode
2569955
Title
Missing value estimation for DNA microarray gene expression data with principal curves
Author
Shi, Jinlong ; Luo, Zhigang
Author_Institution
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear
2010
fDate
16-18 April 2010
Firstpage
262
Lastpage
265
Abstract
Computational analysis of gene expression data has become an essential approach for understanding cellular activities and identifying gene function. However, expression profiles generated by high-throughput microarray experiments often contain missing values, which significantly degrade the performance of subsequent statistical analysis and machine learning algorithms. There is therefore a great need to estimate these missing values as accurately as possible. Although many estimation algorithms exist, each has its flaws. This paper proposes a missing value estimation method based on the principal curve, a nonlinear generalization of the first linear principal component. By finding self-consistent, smooth, one-dimensional curves that pass through the 'middle' of a multidimensional data set, the principal curve can integrate the linear and nonlinear relationships between genes and reveal the distribution of genes. Based on the framework of all the expression profiles, missing values can be estimated more accurately. To assess the performance of the method, comparisons with recently proposed estimation algorithms are carried out on several microarray data sets. The results show that our method provides a better solution for the estimation of missing values in DNA microarray gene expression data.
Keywords
DNA; bioinformatics; cellular biophysics; lab-on-a-chip; principal component analysis; DNA microarray gene expression; cellular activity; expression profile; gene function identification; linear principal component analysis; machine learning algorithm; missing value estimation; principal curve; Condition monitoring; DNA computing; Gene expression; Humans; Machine learning algorithms; Matrix decomposition; Multidimensional systems; Pattern analysis; Principal component analysis; Statistical analysis; estimation; microarray; missing value; principal curve;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference on
Conference_Location
Chengdu
Print_ISBN
978-1-4244-6775-4
Type
conf
DOI
10.1109/ICBBT.2010.5478964
Filename
5478964