• DocumentCode
    594890
  • Title
    Term relevance dependency model for text classification
  • Author
    Meng-Sung Wu ; Hsin-Min Wang
  • Author_Institution
    Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    1064
  • Lastpage
    1067
  • Abstract
    Text classification (TC) has long been an important research topic in information retrieval (IR)-related areas. Conventional language model (LM)-based TC relies solely on matching the words in the documents and classes by using a naïve Bayes classifier (NBC). In the literature, both the term association (TA) model, which additionally considers word-to-word information, and the relevance model (RM), which additionally considers word-to-document information, have been shown to outperform a simple LM for IR. In this paper, we study a novel integration of TA with RM for LM-NBC-based TC. The new model is called the term relevance dependency model. In this model, the probability of a word given a class is represented by a term association LM probability learned by an RM framework. The results of TC experiments on the 20newsgroups and Reuters-21578 corpora demonstrate that the new model outperforms the standard NBC and several other LM-NBC-based methods.
  • Keywords
    classification; information retrieval; probability; text analysis; word processing; LM-NBC-based TC; NBC; RM framework; Reuters-21578 corpora; TA model; language model-based TC; naive Bayes classifier; term association LM probability; term association model; term relevance dependency model; text classification; word matching; word-to-document information; word-to-word information; Adaptation models; Computational modeling; Data models; Smoothing methods; Support vector machines; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2012 21st International Conference on Pattern Recognition (ICPR)
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460319