• DocumentCode
    2569955
  • Title
    Missing value estimation for DNA microarray gene expression data with principal curves
  • Author
    Shi, Jinlong; Luo, Zhigang
  • Author_Institution
    Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2010
  • fDate
    16-18 April 2010
  • Firstpage
    262
  • Lastpage
    265
  • Abstract
    Computational analysis of gene expression data has been an essential approach for understanding cellular activities and identifying gene function. However, expression profiles generated by high-throughput microarray experiments often contain missing values, which significantly degrade the performance of subsequent statistical analyses and machine learning algorithms. There is therefore a strong need to estimate these missing values as accurately as possible. Although many estimation algorithms exist, each has its limitations. This paper proposes a missing value estimation method based on the principal curve, a nonlinear generalization of the first linear principal component. By finding a self-consistent, smooth one-dimensional curve that passes through the "middle" of a multidimensional data set, the principal curve can account for both linear and nonlinear relationships between genes and reveal the distribution of the genes. Because the curve is fitted to the structure of all expression profiles together, missing values can be estimated more accurately. To assess the performance of the method, it is compared with recently proposed estimation algorithms on several microarray data sets. The results show that our method provides a better solution for estimating missing values in DNA microarray gene expression data.
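The abstract outlines the approach only at a high level. The Python snippet below is a generic, simplified illustration of principal-curve imputation under stated assumptions. It is not the authors' algorithm. Each gene (row) is treated as a point in array space (columns), and `None` marks a missing value. The curve is fitted in the spirit of the Hastie and Stuetzle iteration: start from column-mean filling, initialise the curve parameter from the first principal component, then alternate two steps. The first is a Gaussian-kernel smoothing (expectation) step. The second projects each gene onto the nearest curve knot, using only that gene's observed arrays. The function names `impute_principal_curve` and `_first_pc` and the parameters `n_knots`, `span` and `n_iter` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def _first_pc(X: List[List[float]], steps: int = 200) -> Tuple[List[float], List[float]]:
    """Column means and leading principal direction via power iteration (sketch)."""
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in X) / n
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(steps):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return mean, v


def impute_principal_curve(data: List[Row], n_knots: int = 50,
                           span: float = 0.1, n_iter: int = 10) -> List[List[float]]:
    """Hypothetical sketch: fill None entries (genes x arrays) from a fitted principal curve."""
    if n_knots < 2:
        raise ValueError("n_knots must be at least 2")
    d = len(data[0])
    # 1. Column-mean imputation so that every gene starts as a complete point.
    col_means = []
    for j in range(d):
        obs = [r[j] for r in data if r[j] is not None]
        col_means.append(sum(obs) / len(obs) if obs else 0.0)
    X = [[col_means[j] if x is None else x for j, x in enumerate(r)] for r in data]
    # 2. Initial curve parameter (lambda): projection onto the first PC line.
    mean, v = _first_pc(X)
    lam = [sum((r[j] - mean[j]) * v[j] for j in range(d)) for r in X]
    for _ in range(n_iter):
        lo, hi = min(lam), max(lam)
        h = max(span * (hi - lo), 1e-12)  # assumed bandwidth rule, in lambda units
        knots = [lo + (hi - lo) * k / (n_knots - 1) for k in range(n_knots)]
        # 3. Expectation step: each knot's curve point is a Gaussian-weighted
        #    mean of all genes, weighted by distance in lambda.
        curve = []
        for t in knots:
            z = [((li - t) / h) ** 2 for li in lam]
            zmin = min(z)  # shift exponents so at least one weight is 1 (no underflow)
            w = [math.exp(-0.5 * (zi - zmin)) for zi in z]
            s = sum(w)
            curve.append([sum(wi * r[j] for wi, r in zip(w, X)) / s for j in range(d)])
        # 4. Projection step: nearest knot measured on observed arrays only,
        #    then overwrite the gene's missing entries with that curve point.
        for i, row in enumerate(data):
            obs = [j for j in range(d) if row[j] is not None]
            best = min(range(n_knots),
                       key=lambda k: sum((X[i][j] - curve[k][j]) ** 2 for j in obs))
            lam[i] = knots[best]
            for j in range(d):
                if row[j] is None:
                    X[i][j] = curve[best][j]
    return X
```

As a usage example, take points sampled from the curve (t, t², 2t) and delete the middle coordinate of the point at t = 0.5 with `data[30][1] = None`. The sketch then estimates that entry from the fitted curve rather than from the column mean. Observed entries are never modified. Each imputed value is a weighted average of values in its column, so it stays within that column's observed range.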
  • Keywords
    DNA; bioinformatics; cellular biophysics; lab-on-a-chip; principal component analysis; DNA microarray gene expression; cellular activity; expression profile; gene function identification; linear principal component analysis; machine learning algorithm; missing value estimation; principal curve; Condition monitoring; DNA computing; Gene expression; Humans; Machine learning algorithms; Matrix decomposition; Multidimensional systems; Pattern analysis; Principal component analysis; Statistical analysis; estimation; microarray; missing value; principal curve;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-6775-4
  • Type
    conf
  • DOI
    10.1109/ICBBT.2010.5478964
  • Filename
    5478964