Title :
An improvement of missing value imputation in DNA microarray data using cluster-based LLS method
Author :
Keerin, Phimmarin ; Kurutach, Werasak ; Boongoen, Tossapon
Author_Institution :
Fac. of Inf. Sci. & Technol., Mahanakorn Univ. of Technol., Bangkok, Thailand
Abstract :
Gene expression measurements from microarray experiments are inherently prone to missing values. These arise from possible errors in the primary experiments and in the image acquisition and interpretation processes. Left unresolved, missing values can critically degrade the reliability of any downstream analysis or medical application. Moreover, many standard analysis methods require a complete data set and therefore cannot be applied directly. This paper introduces a new method for imputing missing values in microarray data. The proposed algorithm, CLLS impute, extends local least squares (LLS) imputation by incorporating local data clustering to improve imputation quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns correspond to genes and experiments, respectively. CLLS first builds a complete dataset by removing every row that contains one or more missing values. It then applies a clustering technique to this complete dataset to obtain gene clusters and their corresponding centroids. The genes similar to a target gene (one with missing values) are taken to be the members of the cluster whose centroid is closest to the target. The target gene's missing values are then imputed by regression analysis on these similar genes. Empirical evaluation on several published gene expression datasets suggests that the proposed technique performs better than the classical local least squares method and other recently developed techniques found in the literature.
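The four steps summarized in the abstract (drop incomplete rows, cluster the complete rows, pick the cluster with the nearest centroid, regress) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. Several choices are assumptions not stated in the abstract. Plain k-means with deterministic farthest-first seeding stands in for the unspecified "clustering technique". The target-to-centroid distance is measured on the target's observed columns only. The regression follows the usual LLS formulation: solve $\min_x \lVert A^\top x - w \rVert_2$ on the observed columns, then impute $B^\top x$ on the missing columns. The function names `kmeans`, `_assign` and `clls_impute` are hypothetical. The sketch also assumes each target gene has at least one observed value and the complete dataset has at least `n_clusters` rows.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumed stand-in
    for the paper's unspecified clustering technique). Returns (labels, centroids)."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance of every row to its nearest already-chosen seed; take the farthest row.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Recompute each centroid; an empty cluster keeps its previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def clls_impute(Y, n_clusters=3):
    """Cluster-based LLS imputation sketch. Y is a genes x experiments matrix
    in which np.nan marks a missing value. Returns an imputed copy of Y."""
    Y = np.asarray(Y, dtype=float)
    out = Y.copy()
    incomplete = np.isnan(Y).any(axis=1)
    # Step 1: complete dataset = rows with no missing values.
    complete = Y[~incomplete]
    # Step 2: cluster the complete rows and keep the centroids.
    labels, centroids = kmeans(complete, n_clusters)
    sizes = np.bincount(labels, minlength=n_clusters)
    for i in np.where(incomplete)[0]:
        g = Y[i]
        obs = ~np.isnan(g)
        # Step 3: nearest centroid, compared on the target's observed columns only.
        dist = ((centroids[:, obs] - g[obs]) ** 2).sum(axis=1)
        dist[sizes == 0] = np.inf  # never select an empty cluster
        members = complete[labels == dist.argmin()]  # the "similar genes"
        A = members[:, obs]    # similar genes on the observed columns
        B = members[:, ~obs]   # similar genes on the missing columns
        # Step 4: least squares fit of the target's observed values on the
        # similar genes, then apply the coefficients to the missing columns.
        x, *_ = np.linalg.lstsq(A.T, g[obs], rcond=None)
        out[i, ~obs] = B.T @ x
    return out
```

A call such as `clls_impute(Y, n_clusters=2)` returns a copy of `Y` in which only the `nan` entries are replaced. Restricting the regression to the members of one cluster, rather than searching all complete genes for nearest neighbours, is what the abstract credits for the gain in efficiency. Because `np.linalg.lstsq` returns a minimum-norm solution, the sketch still works when the similar genes are collinear.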
Keywords :
DNA; bioinformatics; genetics; genomics; lab-on-a-chip; least mean squares methods; molecular biophysics; regression analysis; CLLS imputation; DNA microarray data; cluster-based LLS method; gene expression dataset clustering; image acquisition; local least square method; local least squares imputation; medical application; missing value imputation; regression analysis; Algorithm design and analysis; Cancer; Clustering algorithms; Correlation; Gene expression; Regression analysis; Standards; clustering; imputation; microarray data; missing value; regression;
Conference_Title :
Communications and Information Technologies (ISCIT), 2013 13th International Symposium on
Conference_Location :
Surat Thani
Print_ISBN :
978-1-4673-5578-0
DOI :
10.1109/ISCIT.2013.6645921