DocumentCode
3673657
Title
Observing the Effect of the Choice of Classifier on Bioinformatics Data with Varying Levels of Data Quality and Class Balance
Author
Alireza Fazelpour;Taghi M. Khoshgoftaar;David J. Dittman;Ahmad Abu Shanab
Author_Institution
Florida Atlantic Univ., Boca Raton, FL, USA
fYear
2015
Firstpage
372
Lastpage
379
Abstract
Noise, which refers to erroneous or missing data, is a prominent challenge in many bioinformatics datasets. Its presence in gene expression datasets adversely affects machine-learning techniques such as supervised classification algorithms and feature selection techniques. Moreover, identifying and quantifying noise are challenging tasks, and a proper mechanism for managing noise is needed to improve the performance of classifiers and feature selection methods. Class imbalance, another challenging characteristic, occurs when one class has many more instances than the other class(es). In this study, we investigate the effects of class noise on the classification performance of various learners using multiple derived datasets with varying degrees of data quality and class imbalance. To this end, we conducted experiments using a filter-based subset selection method applied to multiple derived datasets, which were generated by injecting artificial class noise in a controlled manner to create three levels of data quality: High-Quality, Average-Quality, and Low-Quality. Our results, along with statistical analysis, show that Random Forest outperforms the other learners, without exception, at all levels of class balance and data quality. Therefore, we recommend Random Forest as a noise-tolerant and robust classifier for bioinformatics datasets of varying quality.
Keywords
"Noise","Bioinformatics","Data models","Biological system modeling","Training","Robustness","Vegetation"
Publisher
IEEE
Conference_Titel
2015 IEEE International Conference on Information Reuse and Integration (IRI)
Type
conf
DOI
10.1109/IRI.2015.63
Filename
7301001