• DocumentCode
    1784840
  • Title
    Semi-supervised imputation for microarray missing value estimation
  • Author
    Hui-Hui Li ; Feng-Feng Shao ; Guo-Zheng Li
  • Author_Institution
    Dept. of Control Sci. & Eng., Tongji Univ., Shanghai, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    297
  • Lastpage
    300
  • Abstract
    Missing data are inevitable in gene expression microarray experiments, for many reasons. Because the integrity of the data strongly affects the performance of downstream analysis, much research has addressed the estimation of missing values. However, when the missing rate is large, most current estimation methods cannot achieve high estimation precision. In this paper, inspired by semi-supervised learning with collaborative training, we propose a new imputation method called COIM (COllaborative IMputation). COIM estimates missing values using a collaborative imputation strategy based on Bayesian principal component analysis (BPCA) and local least squares (LLS). It exploits both global correlation information and local structure in the incomplete dataset by having BPCA and LLS share their estimated results with each other. Furthermore, COIM recovers genes that have fewer missing entries first. Numerical results demonstrate that COIM is superior to the compared algorithms in terms of normalized root mean square error (NRMSE), especially for datasets with large missing rates or fewer complete genes.
  • Keywords
    Bayes methods; bioinformatics; data analysis; data integrity; genetic algorithms; genetics; learning (artificial intelligence); least mean squares methods; principal component analysis; BPCA; Bayesian principal component analysis; COIM; collaborative imputation strategy; collaborative training; data missing; downstream analysis; gene expression microarray experiments; gene recovery; global correlation information; high-estimation precision; local least squares; microarray missing value estimation; normalized root mean square error; semisupervised imputation; semisupervised learning; Collaboration; Correlation; Estimation; Gene expression; Least squares approximations; Microarray gene expression data; large missing rate; missing value imputation; semi-supervised learning
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999172
  • Filename
    6999172
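
The abstract names two concrete ideas that a short sketch may help illustrate: scoring imputations by NRMSE, and recovering genes with fewer missing entries first. The Python below is only an illustration under stated assumptions, not the authors' implementation. The record does not give the paper's exact NRMSE formula, so the code assumes one common convention (RMSE over the imputed entries divided by the standard deviation of the true values). The function names `nrmse` and `recovery_order` are hypothetical, and the BPCA/LLS collaboration step itself is not shown.

```python
import numpy as np


def nrmse(true_vals, imputed_vals):
    """NRMSE under an assumed convention: RMSE of the imputed entries
    divided by the (population) standard deviation of the true entries."""
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    rmse = np.sqrt(np.mean((imputed_vals - true_vals) ** 2))
    return rmse / np.std(true_vals)


def recovery_order(expr):
    """Indices of incomplete genes (rows of a genes-by-samples matrix with
    NaN for missing entries), sorted so that genes with fewer missing
    entries come first. Ties keep their original row order."""
    missing_counts = np.isnan(expr).sum(axis=1)
    incomplete = np.flatnonzero(missing_counts > 0)
    order = np.argsort(missing_counts[incomplete], kind="stable")
    return incomplete[order]


# Toy matrix: gene 0 has two missing entries, gene 1 none, gene 2 one.
expr = np.array([
    [1.0, np.nan, np.nan],
    [1.0, 2.0, 3.0],
    [np.nan, 2.0, 3.0],
])
order = recovery_order(expr)          # gene 2 is recovered before gene 0
perfect = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_fill = nrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this convention, a perfect imputation scores 0 and filling every entry with the mean of the true values scores 1, so lower values indicate better estimates.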