DocumentCode :
1906271
Title :
An improvement of missing value imputation in DNA microarray data using cluster-based LLS method
Author :
Keerin, Phimmarin ; Kurutach, Werasak ; Boongoen, Tossapon
Author_Institution :
Fac. of Inf. Sci. & Technol., Mahanakorn Univ. of Technol., Bangkok, Thailand
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
559
Lastpage :
564
Abstract :
Gene expression measurements from microarray experiments commonly suffer from missing values. These arise from errors in the primary experiments and in the image acquisition and interpretation processes. Left unaddressed, missing values may critically degrade the reliability of downstream analysis or medical applications. Moreover, many standard analysis methods require a complete data set, so further study of the microarray data may not be possible. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, extends local least squares imputation by incorporating local data clustering for improved quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns correspond to genes and experiments, respectively. CLLS begins by forming a complete dataset through the removal of rows with missing value(s). Gene clusters and their corresponding centroids are then obtained by applying a clustering technique to the complete dataset. The genes similar to a target gene (one with missing values) are those belonging to the cluster whose centroid is closest to the target. The target gene is then imputed by regression analysis on these similar genes. Empirical evaluation with several published gene expression datasets suggests that the proposed technique performs better than the classical local least squares method and recently developed techniques found in the literature.
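The steps outlined in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says "a clustering technique", so plain k-means is swapped in here; the nearest centroid is chosen by Euclidean distance over the target gene's observed columns; and the regression step follows the usual local least squares formulation. The names `kmeans`, `clls_impute`, `k`, and `seed` are hypothetical, and the sketch assumes every incomplete row has at least one observed value.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                centroids[j] = X[labels == j].mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)


def clls_impute(G, k=2, seed=0):
    """Cluster-based local least squares imputation (hypothetical sketch).

    G is a genes x experiments matrix with NaN marking missing values.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)
    incomplete = missing.any(axis=1)

    # Step 1: complete dataset = rows without any missing value.
    complete = G[~incomplete]

    # Step 2: cluster the complete genes and keep the centroids.
    centroids, labels = kmeans(complete, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)

    out = G.copy()
    for i in np.flatnonzero(incomplete):
        miss, obs = missing[i], ~missing[i]

        # Step 3: similar genes = members of the cluster whose centroid is
        # closest to the target, compared on the observed columns only.
        dist = np.linalg.norm(centroids[:, obs] - G[i, obs], axis=1)
        dist[sizes == 0] = np.inf  # never pick an empty cluster
        similar = complete[labels == dist.argmin()]

        # Step 4: least squares regression of the target's observed values
        # on the similar genes, then reuse the weights on the missing columns.
        coef, *_ = np.linalg.lstsq(similar[:, obs].T, G[i, obs], rcond=None)
        out[i, miss] = coef @ similar[:, miss]
    return out


# Toy data (made up for illustration): two well-separated gene groups and
# one target gene with a missing value in the third experiment.
G = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 3.0, 4.5, 6.0],
    [100.0, 90.0, 80.0, 70.0],
    [101.0, 91.0, 81.0, 71.0],
    [99.0, 89.0, 79.0, 69.0],
    [3.0, 6.0, np.nan, 12.0],
])
imputed = clls_impute(G, k=2)
```

In this toy matrix the target gene is an exact multiple of the genes in the first group, so the regression recovers the missing entry from that cluster alone. On real data the number of clusters and the clustering method would need tuning, which the abstract does not specify.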
Keywords :
DNA; bioinformatics; genetics; genomics; lab-on-a-chip; least mean squares methods; molecular biophysics; regression analysis; CLLS impution; DNA microarray data; cluster-based LLS method; gene expression dataset clustering; image acquisition; local least square method; local least squares imputation; medical application; missing value imputation; regression analysis; Algorithm design and analysis; Cancer; Clustering algorithms; Correlation; Gene expression; Regression analysis; Standards; clustering; imputation; microarray data; missing value; regression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communications and Information Technologies (ISCIT), 2013 13th International Symposium on
Conference_Location :
Surat Thani
Print_ISBN :
978-1-4673-5578-0
Type :
conf
DOI :
10.1109/ISCIT.2013.6645921
Filename :
6645921