  • DocumentCode
    984958
  • Title
    Classifying Data Streams with Skewed Class Distributions and Concept Drifts
  • Author
    Gao, Jing ; Ding, Bolin ; Fan, Wei ; Han, Jiawei ; Yu, Philip S.
  • Author_Institution
    Univ. of Illinois, Urbana, IL
  • Volume
    12
  • Issue
    6
  • fYear
    2008
  • Firstpage
    37
  • Lastpage
    49
  • Abstract
    Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications feature data streams rather than finite stored data sets, and these streams challenge traditional classification algorithms. Concept drifts and skewed distributions, two common properties of data stream applications, make learning in streams difficult. The authors aim to develop a new approach to classifying skewed data streams that uses an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives.
  • Keywords
    data analysis; pattern classification; concept drifts; data analysis tool; data streams classification; skewed distributions; Classification algorithms; Current distribution; Data analysis; Delay; Internet; Monitoring; Predictive models; Sampling methods; Telecommunication traffic; Traffic control; classification algorithms; data mining; data stream; model averaging
  • fLanguage
    English
  • Journal_Title
    Internet Computing, IEEE
  • Publisher
    IEEE
  • ISSN
    1089-7801
  • Type
    jour
  • DOI
    10.1109/MIC.2008.119
  • Filename
    4670118