• DocumentCode
    917990
  • Title
    Considerations of sample and feature size
  • Author
    Foley, Donald H.

  • Volume
    18
  • Issue
    5
  • fYear
    1972
  • fDate
    9/1/1972
  • Firstpage
    618
  • Lastpage
    626
  • Abstract
    In many practical pattern-classification problems the underlying probability distributions are not completely known. Consequently, the classification logic must be determined on the basis of vector samples gathered for each class. Although it is common knowledge that the error rate on the design set is a biased estimate of the true error rate of the classifier, the amount of bias as a function of sample size per class and feature size has been an open question. In this paper, the design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and dimensionality (L). The design-set error rate is compared to both the corresponding Bayes error rate and the test-set error rate. It is demonstrated that the design-set error rate is an extremely biased estimate of either the Bayes or test-set error rate if the ratio of samples per class to dimensions (N/L) is less than three. Also, the variance of the design-set error rate is approximated by a function that is bounded by 1/(8N).
  • Keywords
    Pattern classification; Design optimization; Error analysis; Gaussian distribution; Logic design; Pattern analysis; Pattern recognition; Probability distribution; Statistical distributions; Testing
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.1972.1054863
  • Filename
    1054863
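
The abstract's central claim, that the design-set error rate is strongly optimistic when N/L is small, can be illustrated with a small Monte Carlo sketch. This is not the paper's derivation. It assumes a Fisher linear discriminant with a pooled sample covariance, identity class covariances, and a mean separation `delta` along one axis. The function names `error_rates` and `average_rates` and all default values are hypothetical choices made for illustration.

```python
import numpy as np


def error_rates(n_per_class, n_dims, delta=1.0, n_test=2000, seed=0):
    """Return (design_error, test_error) for one simulated design set.

    Assumptions (not taken from the paper): two Gaussian classes with
    identity covariance whose means differ by `delta` along the first axis,
    classified by a Fisher linear discriminant fitted on the design set.
    """
    rng = np.random.default_rng(seed)
    shift = np.zeros(n_dims)
    shift[0] = delta

    def draw(n):
        # n samples per class, each of dimension n_dims
        class_a = rng.standard_normal((n, n_dims))
        class_b = rng.standard_normal((n, n_dims)) + shift
        return class_a, class_b

    design_a, design_b = draw(n_per_class)
    mean_a, mean_b = design_a.mean(axis=0), design_b.mean(axis=0)

    # Pooled sample covariance; the pseudo-inverse keeps the sketch usable
    # when the design set is too small for the estimate to be invertible.
    pooled = (np.cov(design_a, rowvar=False) + np.cov(design_b, rowvar=False)) / 2
    weights = np.linalg.pinv(np.atleast_2d(pooled)) @ (mean_b - mean_a)
    threshold = weights @ (mean_a + mean_b) / 2

    def misclassified(class_a, class_b):
        # Equal priors: average the two per-class error fractions.
        wrong_a = np.mean(class_a @ weights > threshold)
        wrong_b = np.mean(class_b @ weights <= threshold)
        return (wrong_a + wrong_b) / 2

    design_error = misclassified(design_a, design_b)
    test_error = misclassified(*draw(n_test))
    return design_error, test_error


def average_rates(n_per_class, n_dims, trials=20):
    """Average (design_error, test_error) over several independent design sets."""
    results = np.array(
        [error_rates(n_per_class, n_dims, seed=s) for s in range(trials)]
    )
    return results.mean(axis=0)


if __name__ == "__main__":
    for n in (5, 15, 100):
        design, test = average_rates(n, n_dims=5)
        print(f"N/L = {n / 5:5.1f}  design = {design:.3f}  test = {test:.3f}")
```

Under these assumptions, the gap between the two averaged rates should narrow as N/L grows. The exact figures depend on `delta`, the number of trials, and the seeds, so they are not comparable to the values reported in the paper.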