Title :
Data mining for case-based reasoning in high-dimensional biological domains
Author :
Arshadi, Niloofar ; Jurisica, Igor
Author_Institution :
Dept. of Comput. Sci., Toronto Univ., Ont., Canada
Abstract :
Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain and the number and the complexity of the rules affecting the problem are too large for formal knowledge representation. To extend the capabilities of CBR, we propose the mixture of experts for case-based reasoning (MOE4CBR), a method that combines an ensemble of CBR classifiers with spectral clustering and logistic regression. Our approach not only achieves higher prediction accuracy, but also leads to the selection of a subset of features that have meaningful relationships with their class labels. We evaluate MOE4CBR by applying the method to a CBR system called TA3, a computational framework for CBR systems. For two ovarian mass spectrometry data sets, the prediction accuracy improves from 80 percent to 93 percent and from 90 percent to 98.4 percent, respectively. We also apply the method to leukemia and lung microarray data sets with prediction accuracy improving from 65 percent to 74 percent and from 60 percent to 70 percent, respectively. Finally, we compare our list of discovered biomarkers with the lists of selected biomarkers from other studies for the mass spectrometry data sets.
Keywords :
biology computing; case-based reasoning; data mining; knowledge representation; learning (artificial intelligence); molecular biophysics; pattern classification; pattern clustering; regression analysis; biomarker discovery; feature selection; formal knowledge representation; high-dimensional biological domains; leukemia; logistic regression; lung microarray data sets; machine learning; molecular biology; ovarian mass spectrometry data sets; spectral clustering; Accuracy; Biomarkers; Cancer; Clinical diagnosis; Data analysis; Gene expression; Mass spectroscopy; Medical diagnostic imaging; Proteins; case-based reasoning classifiers; clustering; mass spectrometry data analysis; microarray data analysis
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.124