• Title of article

    Ambiguity measure feature-selection algorithm

  • Author/Authors

    Saket S.R. Mengle، نويسنده , , and Nazli Goharian، نويسنده ,

  • Issue Information
    ماهنامه با شماره پیاپی سال 2009
  • Pages
    14
  • From page
    1037
  • To page
    1050
  • Abstract
    With the increasing number of digital documents, the ability to automatically classify those documents both efficiently and accurately is becoming more critical and difficult. One of the major problems in text classification is the high dimensionality of feature space. We present the ambiguity measure (AM) feature-selection algorithm, which selects the most unambiguous features from the feature set. Unambiguous features are those features whose presence in a document indicate a strong degree of confidence that a document belongs to only one specific category. We apply AM feature selection on a naïve Bayes text classifier. We favorably show the effectiveness of our approach in outperforming eight existing feature-selection methods, using five benchmark datasets with a statistical significance of at least 95% confidence. The support vector machine (SVM) text classifier is shown to perform consistently better than the naïve Bayes text classifier. The drawback, however, is the time complexity in training a model. We further explore the effect of using the AM feature-selection method on an SVM text classifier. Our results indicate that the training time for the SVM algorithm can be reduced by more than 50%, while still improving the accuracy of the text classifier. We favorably show the effectiveness of our approach by demonstrating that it statistically significantly (99% confidence) outperforms eight existing feature-selection methods using four standard benchmark datasets.
  • Journal title
    Journal of the American Society for Information Science and Technology
  • Serial Year
    2009
  • Journal title
    Journal of the American Society for Information Science and Technology
  • Record number

    993972