• DocumentCode
    983816
  • Title
    Data mining for case-based reasoning in high-dimensional biological domains
  • Author
    Arshadi, Niloofar ; Jurisica, Igor

  • Author_Institution
    Dept. of Comput. Sci., Toronto Univ., Ont., Canada
  • Volume
    17
  • Issue
    8
  • fYear
    2005
  • Firstpage
    1127
  • Lastpage
    1137
  • Abstract
    Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain and the number and complexity of the rules affecting the problem are too large for formal knowledge representation. To extend the capabilities of CBR, we propose the mixture of experts for case-based reasoning (MOE4CBR), a method that combines an ensemble of CBR classifiers with spectral clustering and logistic regression. Our approach not only achieves higher prediction accuracy, but also leads to the selection of a subset of features that have meaningful relationships with their class labels. We evaluate MOE4CBR by applying the method to a CBR system called TA3, a computational framework for CBR systems. For two ovarian mass spectrometry data sets, the prediction accuracy improves from 80 percent to 93 percent and from 90 percent to 98.4 percent, respectively. We also apply the method to leukemia and lung microarray data sets, with prediction accuracy improving from 65 percent to 74 percent and from 60 percent to 70 percent, respectively. Finally, we compare our list of discovered biomarkers with the lists of selected biomarkers from other studies for the mass spectrometry data sets.
  • Keywords
    biology computing; case-based reasoning; data mining; knowledge representation; learning (artificial intelligence); molecular biophysics; pattern classification; pattern clustering; regression analysis; biomarker discovery; feature selection; formal knowledge representation; high-dimensional biological domains; leukemia; logistic regression; lung microarray data sets; machine learning; molecular biology; ovarian mass spectrometry data sets; spectral clustering; Accuracy; Biomarkers; Cancer; Clinical diagnosis; Data analysis; Gene expression; Mass spectroscopy; Medical diagnostic imaging; Proteins; case-based reasoning classifiers; clustering; mass spectrometry data analysis; microarray data analysis
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.124
  • Filename
    1458705