• DocumentCode
    2219971
  • Title
    Distilling classification models from cross validation runs: an application to mass spectrometry
  • Author
    Kalousis, Alexandros ; Prados, Julien ; Sanchez, Jean-Charles ; Allard, Laure ; Hilario, Melanie
  • Author_Institution
    CSD, Geneva Univ., Switzerland
  • fYear
    2004
  • fDate
    15-17 Nov. 2004
  • Firstpage
    113
  • Lastpage
    119
  • Abstract
    We present work on a proteomics application from the domain of mass spectrometry: the identification of biomarkers for stroke. Mass-spectrometry-based biomarker identification poses a number of challenges to the knowledge discovery process. We describe how we tackle them and present a series of machine learning experiments performed to identify the most suitable learning algorithm for the problem. In real-world applications, however, good classification performance is not the only concern; equally important is an indication of the factors that actually determine the classification decision. Typically, an algorithm is selected on the basis of a resampling-based performance estimate, e.g. cross-validation, and that algorithm then provides the operational classification model. The next step is to construct this operational model, but it is not obvious how to do so, since resampling-based procedures produce a number of different models. We propose a method for linear classifiers that examines the different models produced during cross-validation: it assesses the stability of the models built on the different training folds and combines them into a single model.
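    A minimal sketch of the general idea, in Python with NumPy, assuming linear models of the form f(x) = w·x + b. It is not the authors' method: the per-fold learner (the hypothetical `train_linear`, ridge least squares on ±1 labels) stands in for an SVM, and the stability measures (mean pairwise cosine similarity of unit-normalized fold weight vectors, per-feature sign agreement) and the combination rule (averaging the normalized fold models) are illustrative assumptions. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def train_linear(X, y, lam=1.0):
    """Hypothetical stand-in linear learner (ridge least squares on +/-1 labels), NOT an SVM."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    A[-1, -1] -= lam  # leave the bias term unregularized
    coef = np.linalg.solve(A, Xb.T @ y)
    return coef[:-1], coef[-1]  # weights w, bias b

def cv_fold_models(X, y, k=5, seed=0):
    """Train one linear model on each k-fold cross-validation training set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    W, B = [], []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = train_linear(X[train], y[train])
        W.append(w)
        B.append(b)
    return np.array(W), np.array(B)

def combine_fold_models(W, B):
    """Assumed stability check and combination of the k fold models."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    Wn, Bn = W / norms, B / norms[:, 0]  # rescale each model; its decision rule is unchanged
    k = len(Wn)
    cos = Wn @ Wn.T  # pairwise cosine similarities (diagonal is 1)
    mean_cos = (cos.sum() - k) / (k * (k - 1))  # mean over off-diagonal pairs
    sign_agree = np.abs(np.sign(Wn).sum(axis=0)) / k  # 1.0 = all folds agree on a feature's sign
    return Wn.mean(axis=0), Bn.mean(), mean_cos, sign_agree

# Usage on synthetic data: features 0 and 1 carry the signal, 2 and 3 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] - 0.5 * X[:, 1] > 0, 1.0, -1.0)
W, B = cv_fold_models(X, y, k=5)
w, b, mean_cos, sign_agree = combine_fold_models(W, B)
print("mean pairwise cosine:", round(float(mean_cos), 3))
print("per-feature sign agreement:", sign_agree)
```

    Rescaling each fold model to unit weight norm before averaging lets every fold contribute equally, so the combined rule sign(w·x + b) is not dominated by a fold that happened to learn larger weights; features whose sign agreement falls below 1.0 are those on which the folds disagree.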
  • Keywords
    biology computing; data mining; learning (artificial intelligence); mass spectroscopy; pattern classification; proteins; support vector machines; SVM; biomarker identification; cross validation; cross validation runs; knowledge discovery process; machine learning experiments; mass spectrometry; operational classification model; proteomics application; resampled-based performance estimation; stroke attacks; Biological system modeling; Biomarkers; Chemistry; Laboratories; Machine learning; Machine learning algorithms; Mass spectroscopy; Proteins; Proteomics; Stability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-2236-X
  • Type
    conf
  • DOI
    10.1109/ICTAI.2004.51
  • Filename
    1374177