DocumentCode :
2977251
Title :
Evaluation of missing values imputation methods in cDNA microarrays based on classification accuracy
Author :
Ghoneim, Vidan Fathi ; Solouma, Nahed H. ; Kadah, Yasser M.
Author_Institution :
Biomed. Eng. Dept., Misr Univ. for Sci. & Technol., 6th of October City, Egypt
fYear :
2011
fDate :
21-24 Feb. 2011
Firstpage :
367
Lastpage :
370
Abstract :
Many attempts have been made to deal with missing values (MV) in microarray data representing gene expression. Missing values are a problematic issue because many data analysis techniques are not robust to missing data. Most of the MV imputation methods currently in use have been evaluated only in terms of the similarity between the original and the imputed data. However, the imputed expression values are not of interest in themselves; the major concern is whether they are reliable for use in subsequent analysis. This paper studies the impact of different MV imputation methods on classification accuracy. The experimental work first implemented three popular imputation methods, namely Singular Value Decomposition (SVD), weighted K-nearest neighbors (KNNimpute), and zero replacement. The robustness of the three methods to the amount of missing data was then studied by repeating the experiments on datasets with different missing rates (MR) over the range of 0-20%. For supervised two-class classification, we adopted a twofold approach, presenting the classifiers with the expression values of all genes as well as with a subset of selected genes. The feature selection method used for gene selection is Fisher Discriminant Analysis (FDA), which noticeably improved the performance of the classifiers. The classifier accuracies obtained using data imputed by the three methods show only slight variations over the specified MR range, indicating that the three imputation methods considered are robust.
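A minimal, standard-library-only Python sketch of the evaluation steps named in the abstract may help make them concrete. It injects missing values at a chosen rate, imputes them by zero replacement or by distance-weighted K-nearest neighbors in the style of KNNimpute, and ranks genes by a per-gene Fisher criterion. All function names (`inject_missing`, `zero_impute`, `knn_impute`, `fisher_scores`) are hypothetical, `k=10` is an arbitrary default, SVD imputation and the classifiers are omitted, and the Fisher score used here is a common two-class criterion assumed for illustration, not necessarily the authors' exact FDA procedure.

```python
import math
import random
from typing import List, Optional

# Rows are genes, columns are arrays (samples); None marks a missing value.
Matrix = List[List[Optional[float]]]


def inject_missing(data: List[List[float]], rate: float, seed: int = 0) -> Matrix:
    """Copy `data` and set round(rate * number_of_entries) randomly chosen entries to None."""
    rng = random.Random(seed)
    out: Matrix = [list(row) for row in data]
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        out[i][j] = None
    return out


def zero_impute(data: Matrix) -> List[List[float]]:
    """Zero replacement: every missing entry becomes 0.0."""
    return [[0.0 if v is None else v for v in row] for row in data]


def knn_impute(data: Matrix, k: int = 10) -> List[List[float]]:
    """Distance-weighted KNN imputation in the style of KNNimpute.

    For a gene missing a value in array j, take the k other genes that are
    observed in array j and closest in Euclidean distance (over the arrays
    both genes observe), then average their array-j values with weights
    1/distance. Distances use only the originally observed values.
    """
    result = [list(row) for row in data]
    for g, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(data):
                if h == g or other[j] is None:
                    continue
                sq = [(a - b) ** 2 for a, b in zip(row, other)
                      if a is not None and b is not None]
                if sq:  # need at least one array observed in both genes
                    candidates.append((math.sqrt(sum(sq)), other[j]))
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            if not nearest:
                result[g][j] = 0.0  # no usable neighbour: fall back to zero replacement
                continue
            # Guard against division by zero when a neighbour is identical.
            weights = [1.0 / max(d, 1e-12) for d, _ in nearest]
            result[g][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return result


def fisher_scores(data: List[List[float]], labels: List[int]) -> List[float]:
    """Per-gene Fisher criterion (m0 - m1)^2 / (v0 + v1) for class labels 0 and 1."""
    scores = []
    for row in data:
        groups = [[v for v, y in zip(row, labels) if y == c] for c in (0, 1)]
        means = [sum(grp) / len(grp) for grp in groups]
        variances = [sum((v - m) ** 2 for v in grp) / len(grp) for grp, m in zip(groups, means)]
        scores.append((means[0] - means[1]) ** 2 / (sum(variances) + 1e-12))
    return scores
```

For example, `knn_impute(inject_missing(X, 0.1), k=10)` would produce one imputed copy of a hypothetical expression matrix `X` at 10% MR. Sweeping the rate from 0 to 0.2, imputing with each method, selecting the top-scoring genes with `fisher_scores`, and comparing classifier accuracy on the results mirrors the protocol the abstract describes.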
Keywords :
DNA; biology computing; genetics; lab-on-a-chip; molecular biophysics; singular value decomposition; Fisher discriminant analysis; Zero replacement; feature selection method; gene expressions; gene selection; cDNA microarrays; missing values imputation methods; weighted K-nearest neighbors; Accuracy; Bioinformatics; Euclidean distance; Gene expression; Robustness; Sensitivity; Support vector machines; classification; evaluation; imputation; microarrays;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering (MECBME), 2011 1st Middle East Conference on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-6998-7
Type :
conf
DOI :
10.1109/MECBME.2011.5752142
Filename :
5752142