• DocumentCode
    2723708
  • Title
    Predicting classifier performance with a small training set: Applications to computer-aided diagnosis and prognosis
  • Author
    Basavanhally, Ajay ; Doyle, Scott ; Madabhushi, Anant
  • Author_Institution
    Dept. of Biomed. Eng., State Univ. of New Jersey, Piscataway, NJ, USA
  • fYear
    2010
  • fDate
    14-17 April 2010
  • Firstpage
    229
  • Lastpage
    232
  • Abstract
    Selection of an appropriate classifier for computer-aided diagnosis (CAD) applications has typically been an ad hoc process. It is difficult to know a priori which classifier will yield high accuracies for a specific application, especially when well-annotated data for classifier training is scarce. In this study, we utilize an inverse power-law model of statistical learning to predict classifier performance when only a limited amount of annotated training data is available. The objectives of this study are to (a) predict classifier error in the context of different CAD problems when larger data cohorts become available, and (b) compare classifier performance and trends (both at the sample/patient level and at the pixel level) as additional data is accrued (such as in a clinical trial). We use the power-law model to evaluate and compare various classifiers (Support Vector Machine (SVM), C4.5 decision tree, k-nearest neighbor) for four distinct CAD problems. The first two datasets deal with sample/patient-level classification for distinguishing (1) high-grade from low-grade breast cancers and (2) high from low levels of lymphocytic infiltration in breast cancer specimens. The other two datasets are pixel-level classification problems for discriminating between cancerous and non-cancerous regions on prostate (3) MRI and (4) histopathology. Our empirical results suggest that, given sufficient training data, SVMs tend to be the best classifiers. This was true for datasets (1), (2), and (3), while the C4.5 decision tree was the best classifier for dataset (4). Our results also suggest that classifier comparisons made on small data cohorts should not be assumed to hold when large amounts of data become available.
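The inverse power-law idea in the abstract can be illustrated with a minimal sketch. It assumes the common three-parameter learning-curve form error(n) = a * n^(-alpha) + b, which the abstract does not spell out, and it fits that form by a simple grid search over alpha rather than by whatever procedure the authors used. The function names and the synthetic error values are hypothetical and are not taken from the paper.

```python
def fit_inverse_power_law(sizes, errors, alphas=None):
    """Fit error(n) ~ a * n**(-alpha) + b (assumed model form).

    For a fixed alpha the model is linear in (a, b), so those two are
    solved in closed form by ordinary least squares; alpha itself is
    chosen by a grid search that minimizes the squared residuals.
    """
    if len(sizes) != len(errors) or len(sizes) < 3:
        raise ValueError("need at least three (size, error) pairs")
    if alphas is None:
        alphas = [k / 100 for k in range(1, 301)]  # candidates 0.01 .. 3.00
    m = len(sizes)
    e_mean = sum(errors) / m
    best = None
    for alpha in alphas:
        x = [n ** (-alpha) for n in sizes]
        x_mean = sum(x) / m
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            continue  # all training sizes identical: slope is undefined
        sxe = sum((xi - x_mean) * (ei - e_mean) for xi, ei in zip(x, errors))
        a = sxe / sxx
        b = e_mean - a * x_mean
        sse = sum((a * xi + b - ei) ** 2 for xi, ei in zip(x, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    if best is None:
        raise ValueError("training set sizes must not all be equal")
    _, a, alpha, b = best
    return a, alpha, b


def predict_error(n, a, alpha, b):
    """Extrapolate the fitted curve to a training set of size n."""
    return a * n ** (-alpha) + b


# Synthetic, noise-free error rates at small training set sizes
# (made-up numbers for illustration only).
sizes = [10, 20, 40, 80, 160]
errors = [0.5 * n ** (-0.5) + 0.1 for n in sizes]

a, alpha, b = fit_inverse_power_law(sizes, errors)
predicted = predict_error(1000, a, alpha, b)
print(f"a={a:.3f} alpha={alpha:.2f} b={b:.3f} predicted error at n=1000: {predicted:.4f}")
```

Fitting one such curve per classifier and comparing the extrapolated errors at a larger n is one way to see why a ranking observed on a small cohort may not persist as data accrues.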
  • Keywords
    biological organs; cancer; decision trees; image classification; medical image processing; support vector machines; C4.5 decision tree; CAD; MRI; SVM; breast cancers; classifier; computer-aided diagnosis; histopathology; inverse power-law model; k-nearest neighbor; lymphocytic infiltration; prognosis; statistical learning; support vector machine; Application software; Breast cancer; Classification tree analysis; Computer aided diagnosis; Decision trees; Inverse problems; Predictive models; Support vector machine classification; Support vector machines; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on
  • Conference_Location
    Rotterdam
  • ISSN
    1945-7928
  • Print_ISBN
    978-1-4244-4125-9
  • Electronic_ISBN
    1945-7928
  • Type
    conf
  • DOI
    10.1109/ISBI.2010.5490373
  • Filename
    5490373