• DocumentCode
    3060154
  • Title

    Learning with limited minority class data

  • Author

    Khoshgoftaar, Taghi M. ; Seiffert, Chris ; Van Hulse, Jason ; Napolitano, Amri ; Folleco, Andres

  • Author_Institution
    Florida Atlantic Univ., Boca Raton
  • fYear
    2007
  • fDate
    13-15 Dec. 2007
  • Firstpage
    348
  • Lastpage
    353
  • Abstract
    A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often-used "even distribution" is not optimal when dealing with such rare events.
  • Keywords
    classification; data handling; data mining; learning (artificial intelligence); binary classification; data classifier; data mining; machine learning; minority class data; Analysis of variance; Costs; Data mining; Decision trees; Machine learning; Measurement; Performance evaluation; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
  • Conference_Location
    Cincinnati, OH
  • Print_ISBN
    978-0-7695-3069-7
  • Type

    conf

  • DOI
    10.1109/ICMLA.2007.76
  • Filename
    4457255