• DocumentCode
    1906271
  • Title

    An improvement of missing value imputation in DNA microarray data using cluster-based LLS method

  • Author

    Keerin, Phimmarin ; Kurutach, Werasak ; Boongoen, Tossapon

  • Author_Institution
    Fac. of Inf. Sci. & Technol., Mahanakorn Univ. of Technol., Bangkok, Thailand
  • fYear
    2013
  • fDate
    4-6 Sept. 2013
  • Firstpage
    559
  • Lastpage
    564
  • Abstract
    Gene expressions measured during a microarray experiment usually suffer from the inherent problem of missing values. These arise from errors in the primary experiments, image acquisition and interpretation processes. Leaving the problem unsolved may critically degrade the reliability of any downstream analysis or medical application. Moreover, further study of microarray data may not be possible with the many standard analysis methods that require a complete data set. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, is an extension of local least squares imputation that incorporates local data clustering for improved quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns correspond to genes and experiments, respectively. CLLS begins by forming a complete dataset through the removal of rows with missing value(s). Then, gene clusters and their corresponding centroids are obtained by applying a clustering technique to the complete dataset. The genes similar to a target gene (one with missing values) are those belonging to the cluster whose centroid is closest to the target. Once these are known, the target gene is imputed by applying regression analysis with the similar genes. Empirical evaluation with several published gene expression datasets suggests that the proposed technique performs better than the classical local least squares method and recently developed techniques found in the literature.
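
    The steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract only says "a clustering technique" and "regression analysis", so plain k-means and an ordinary least-squares fit (in the usual local least squares formulation) are assumptions here, as are the function names `kmeans` and `clls_impute`, the parameter `k`, and the requirement that every target gene has at least one observed value.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            if np.any(labels == j):          # keep the old centroid if a cluster empties
                new[j] = X[labels == j].mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute labels so they match the final centroids.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)

def clls_impute(data, k=2):
    """Cluster-based LLS-style imputation sketch; missing entries are NaN.

    Rows are genes, columns are experiments.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # Step 1: complete dataset = rows without any missing value.
    complete = data[~missing.any(axis=1)]
    # Step 2: cluster the complete genes and obtain centroids.
    centroids, labels = kmeans(complete, k)
    out = data.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        m = missing[i]                        # missing columns of the target gene
        o = ~m                                # observed columns of the target gene
        # Step 3: closest centroid, compared on the observed columns only.
        j = np.linalg.norm(centroids[:, o] - data[i, o], axis=1).argmin()
        similar = complete[labels == j]
        if len(similar) == 0:                 # fallback: use all complete genes
            similar = complete
        # Step 4: regress the target on the similar genes (least squares),
        # then apply the same coefficients to the missing columns.
        A, B = similar[:, o], similar[:, m]
        x, *_ = np.linalg.lstsq(A.T, data[i, o], rcond=None)
        out[i, m] = B.T @ x
    return out
```

    Calling `clls_impute(matrix, k=2)` on a genes-by-experiments array with `np.nan` marking missing entries returns a copy with those entries filled in; rows that were already complete are left unchanged.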
  • Keywords
    DNA; bioinformatics; genetics; genomics; lab-on-a-chip; least mean squares methods; molecular biophysics; regression analysis; CLLS imputation; DNA microarray data; cluster-based LLS method; gene expression dataset clustering; image acquisition; local least square method; local least squares imputation; medical application; missing value imputation; regression analysis; Algorithm design and analysis; Cancer; Clustering algorithms; Correlation; Gene expression; Regression analysis; Standards; clustering; imputation; microarray data; missing value; regression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Information Technologies (ISCIT), 2013 13th International Symposium on
  • Conference_Location
    Surat Thani
  • Print_ISBN
    978-1-4673-5578-0
  • Type

    conf

  • DOI
    10.1109/ISCIT.2013.6645921
  • Filename
    6645921