• Title of article

    Cost-sensitive classification with inadequate labeled data

  • Author/Authors

    Tao Wang, Zhenxing Qin, Shichao Zhang, Chengqi Zhang

  • Issue Information
    Journal with serial number, year 2012
  • Pages
    9
  • From page
    508
  • To page
    516
  • Abstract
    Learning cost-sensitive models from datasets with few labeled examples and plentiful unlabeled data is a practical and challenging problem, because labeled data are sometimes very difficult, time-consuming, and/or expensive to obtain. To address this problem, this paper proposes two classification strategies, both based on Expectation Maximization (EM), for learning a cost-sensitive classifier from training datasets that contain both labeled and unlabeled data. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example using the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating misclassification costs into the probability estimation process. We conducted extensive experiments to evaluate the efficiency of the two methods, and the results show that, with only a small number of labeled training examples, CS-EM outperforms the competing methods on the majority of the selected UCI data sets across different cost ratios, especially when the cost ratio is high.
  • Keywords
    Cost-sensitive learning, Classification, Expectation maximization, Semi-supervised learning
  • Journal title
    Information Systems
  • Serial Year
    2012
  • Record number

    1230269
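
The abstract says that Direct-EM computes the optimal class label from the class probabilities produced by the model, but it does not give the decision rule. A minimal sketch of one common choice, the minimum expected cost rule, is shown below. The rule itself, the function name `min_expected_cost_labels`, the cost matrix, and the probability values are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def min_expected_cost_labels(probs, cost):
    """Pick, for each example, the class with the lowest expected cost.

    probs: (n_samples, n_classes) class-probability estimates, e.g. from
           a semi-supervised model (assumed to be given here).
    cost:  (n_classes, n_classes) matrix where cost[i, j] is the cost of
           predicting class j when the true class is i (hypothetical).
    """
    probs = np.asarray(probs, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # expected[n, j] = sum_i probs[n, i] * cost[i, j]
    expected = probs @ cost
    return expected.argmin(axis=1)

# Hypothetical two-class setting with a cost ratio of 10:
# missing class 1 costs 10, a false alarm costs 1.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])
probs = np.array([[0.80, 0.20],
                  [0.95, 0.05]])

labels = min_expected_cost_labels(probs, cost)
plain_argmax = probs.argmax(axis=1)
print(labels, plain_argmax)
```

In this sketch the first example is assigned to class 1 even though class 0 is more probable, because the expected cost of predicting class 0 (0.2 × 10 = 2.0) exceeds that of predicting class 1 (0.8 × 1 = 0.8). This illustrates why a high cost ratio can change decisions relative to a plain most-probable-class rule.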