• Title of article

    A Bayesian feature selection paradigm for text classification

  • Author/Authors

    Guozhong Feng, Jianhua Guo, Bing-Yi Jing, Lizhu Hao

  • Issue Information
    Bimonthly, with serial issue number, year 2012
  • Pages
    20
  • From page
    283
  • To page
    302
  • Abstract
    The automated classification of texts into predefined categories has attracted booming interest, owing to the increased availability of documents in digital form and the ensuing need to organize them. An important problem in text classification is feature selection, whose goal is to improve classification effectiveness, computational efficiency, or both. Because of category imbalance and feature sparsity in social text collections, filter methods may work poorly. In this paper, we perform feature selection during training, automatically selecting the best feature subset by learning the characteristics of the categories from a set of preclassified documents. We propose a generative probabilistic model that describes each category by a probability distribution and handles feature selection by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach.
  • Keywords
    Metropolis search , Text classification , mixture model , Bayesian feature selection
  • Journal title
    Information Processing and Management
  • Serial Year
    2012
  • Record number

    1229216
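The abstract describes feature selection via a binary inclusion/exclusion latent vector updated by Metropolis search, without giving the model. The sketch below is a hypothetical, simplified illustration of that general idea and not the authors' model. It assumes binary word-presence features with Beta-Bernoulli likelihoods: an included feature gets its own rate in each category, an excluded feature shares one rate across all categories, and each feature is included a priori with probability `prior_incl`. The function names (`log_beta`, `feature_log_evidence`, `metropolis_feature_search`), the toy corpus, and all parameter values are invented for this example. The proposal flips one randomly chosen bit, which is symmetric, so the Metropolis acceptance probability reduces to the posterior ratio.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def feature_log_evidence(docs, labels, j, include, a=1.0, b=1.0):
    """Beta-Bernoulli log marginal likelihood of binary feature j (assumed model).

    include=True:  each category c gets its own rate theta_{j,c} ~ Beta(a, b).
    include=False: all categories share a single rate theta_j ~ Beta(a, b).
    """
    counts = {}  # category -> (documents containing j, documents in category)
    for x, c in zip(docs, labels):
        present, total = counts.get(c, (0, 0))
        counts[c] = (present + x[j], total + 1)
    if not include:
        # Excluded feature: pool counts, since its rate ignores the category.
        counts = {None: (sum(p for p, _ in counts.values()),
                         sum(t for _, t in counts.values()))}
    return sum(log_beta(a + p, b + t - p) - log_beta(a, b)
               for p, t in counts.values())


def metropolis_feature_search(docs, labels, n_iter=5000, prior_incl=0.1, seed=0):
    """Metropolis search over a binary inclusion vector gamma (hypothetical sketch).

    Proposal: pick one feature uniformly at random and flip its bit (symmetric),
    so the move is accepted with probability min(1, posterior ratio).
    Returns the final gamma and each feature's inclusion frequency.
    """
    rng = random.Random(seed)
    n_features = len(docs[0])
    log_prior = (math.log(1.0 - prior_incl), math.log(prior_incl))
    # In this simplified model the log posterior decomposes over features,
    # so cache score[j][g] = log evidence + log prior for gamma_j = g.
    score = [tuple(feature_log_evidence(docs, labels, j, bool(g)) + log_prior[g]
                   for g in (0, 1))
             for j in range(n_features)]
    gamma = [0] * n_features      # start with every feature excluded
    included = [0] * n_features   # iterations in which each feature was included
    for _ in range(n_iter):
        j = rng.randrange(n_features)
        proposed = 1 - gamma[j]
        log_ratio = score[j][proposed] - score[j][gamma[j]]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            gamma[j] = proposed
        for k in range(n_features):
            included[k] += gamma[k]
    return gamma, [n / n_iter for n in included]


# Toy corpus: feature 0 marks category "a", feature 1 is noise, feature 2 never occurs.
docs = [[1, 1, 0], [1, 0, 0]] * 6 + [[0, 1, 0], [0, 0, 0]] * 6
labels = ["a"] * 12 + ["b"] * 12
gamma, freq = metropolis_feature_search(docs, labels)
print(gamma, [round(f, 3) for f in freq])
```

On this toy corpus, feature 0 should be included in nearly every iteration, while the noise feature and the absent feature should be included only rarely. Because the score here decomposes per feature, each proposed flip costs a table lookup; a model in which features interact, as the paper's may, would need to re-evaluate a joint likelihood for each proposal.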