  • DocumentCode
    3542720
  • Title
    Relationship between the accuracy of classifier error estimation and distribution complexity
  • Author
    Atashpaz-Gargari, Esmaeil ; Sima, Chao ; Braga-Neto, Ulisses M. ; Dougherty, Edward R.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
  • fYear
    2011
  • fDate
    4-6 Dec. 2011
  • Firstpage
    147
  • Lastpage
    149
  • Abstract
    Error estimation is a crucial part of any classification problem, and it becomes problematic with small samples. In this paper, we analyze the performance of several widely used error estimation methods (resubstitution, 10-fold cross-validation with repetition (CV10r), leave-one-out (LOO), bootstrap .632, and bolstered resubstitution) relative to the complexity of the feature-label distribution. Our definition of complexity takes into account both the complexity of the Bayes decision surface and the Bayes error. We define the complexity of a distribution for a class of Gaussian mixture models. In this class, the Bayes classifier is piecewise linear, and its complexity is included in our definition. Based on this measure, we perform experiments on 2-dimensional and 3-dimensional problems, applying the error estimation methods to distributions of differing complexity. The bias and root-mean-square (RMS) error of the error estimators are used to assess their performance. The simulation results show that all of the estimation methods lose accuracy as complexity increases, and this performance loss is quantified as a function of distribution complexity.
  • Keywords
    Bayes methods; Gaussian processes; computational complexity; mean square error methods; pattern classification; 10-fold cross-validation with repetition; 2-dimensional problems; 3-dimensional problems; Bayes classifier; Bayes decision surface; Bayes error; Gaussian mixture models; bias error; bolstered resubstitution; classification problem; classifier error estimation accuracy; feature-label distribution complexity; leave-one-out; bootstrap .632; piecewise linear classifier; root-mean-square error; Bioinformatics; Complexity theory; Error analysis; Genomics; Hidden Markov models; Measurement uncertainty; Three dimensional displays
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on
  • Conference_Location
    San Antonio, TX
  • ISSN
    2150-3001
  • Print_ISBN
    978-1-4673-0491-7
  • Electronic_ISBN
    2150-3001
  • Type
    conf
  • DOI
    10.1109/GENSiPS.2011.6169466
  • Filename
    6169466
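
The abstract evaluates error estimators by their bias and RMS error relative to the true classifier error. The following is a minimal illustrative sketch of that kind of measurement for two of the named estimators, resubstitution and leave-one-out. It is not the authors' experimental setup: the two-class unit-variance 2-D Gaussian model, the nearest-mean classifier, the sample sizes, and every function name here are assumptions chosen for brevity, and the true error is approximated on a large independent test sample.

```python
import numpy as np

def sample(n_per_class, rng, delta=1.0):
    """Draw n_per_class points from each of two unit-variance 2-D Gaussians
    (an assumed toy model, not the paper's Gaussian mixture class)."""
    x0 = rng.normal(loc=[-delta, 0.0], scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=[+delta, 0.0], scale=1.0, size=(n_per_class, 2))
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)

def fit_nearest_mean(X, y):
    """Train a nearest-mean (linear) classifier: one mean per class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict(model, X):
    """Assign each point to the class with the closer mean."""
    m0, m1 = model
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def resubstitution(X, y):
    """Error rate of the classifier on its own training data."""
    return float(np.mean(predict(fit_nearest_mean(X, y), X) != y))

def leave_one_out(X, y):
    """Hold out each point in turn, train on the rest, count the misses."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_nearest_mean(X[keep], y[keep])
        errors += int(predict(model, X[i:i + 1])[0] != y[i])
    return errors / n

def bias_and_rms(estimator, n_per_class=10, trials=200, seed=0):
    """Monte Carlo bias and RMS of (estimated error - true error)."""
    rng = np.random.default_rng(seed)
    X_test, y_test = sample(5000, rng)  # large sample approximates true error
    diffs = []
    for _ in range(trials):
        X, y = sample(n_per_class, rng)
        model = fit_nearest_mean(X, y)
        true_err = np.mean(predict(model, X_test) != y_test)
        diffs.append(estimator(X, y) - true_err)
    diffs = np.array(diffs)
    return float(diffs.mean()), float(np.sqrt((diffs ** 2).mean()))

if __name__ == "__main__":
    for name, est in [("resub", resubstitution), ("LOO", leave_one_out)]:
        bias, rms = bias_and_rms(est)
        print(f"{name}: bias={bias:+.4f}  rms={rms:.4f}")
```

Repeating such a run while varying the distribution (for example, shrinking the hypothetical `delta` parameter to raise the Bayes error) gives a rough view of how estimator accuracy changes with the difficulty of the problem.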