• DocumentCode
    460753
  • Title
    Feature Selection for the Topic-Based Mixture Model in Factored Classification
  • Author
    Chen, Qiong
  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
  • Volume
    1
  • fYear
    2006
  • fDate
    Nov. 2006
  • Firstpage
    39
  • Lastpage
    44
  • Abstract
    The topic-based mixture model (TBMM) is a learning algorithm for factored classification, in which the class label is factored into a vector of class features. For example, the class label for a personal Web page at a university might be described by two features: the person's academic discipline and their position (e.g., 'chemistry professor' or 'physics student'). An approach to factored classification of text documents has been proposed in which each document is assumed to be generated by a mixture of class features. Experiments on factored text classification problems show that TBMM can outperform two other approaches for categories with especially sparse training data. In this paper, we analyze feature selection for TBMM. For TBMM, the feature space can be reduced to a small number of feature terms with a significant improvement in classification accuracy. We present empirical results indicating that TBMM is an adequate method for determining the feature terms for the supervised classification task.
  • Keywords
    feature extraction; learning (artificial intelligence); pattern classification; text analysis; class features; class label; factored text classification; feature selection; feature space; learning algorithm; text documents; topic-based mixture model; Chemistry; Classification algorithms; Computer science; Gain measurement; Performance evaluation; Performance gain; Physics; Space technology; Text categorization; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security, 2006 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    1-4244-0605-6
  • Electronic_ISBN
    1-4244-0605-6
  • Type
    conf
  • DOI
    10.1109/ICCIAS.2006.294087
  • Filename
    4072040