Title :
Term relevance dependency model for text classification
Author :
Meng-Sung Wu; Hsin-Min Wang
Author_Institution :
Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
Abstract :
Text classification (TC) has long been an important research topic in information retrieval (IR)-related areas. Conventional language model (LM)-based TC relies solely on matching the words in documents and classes with a naïve Bayes classifier (NBC). In the literature, both the term association (TA) model, which additionally exploits word-to-word information, and the relevance model (RM), which additionally exploits word-to-document information, have been shown to outperform a simple LM for IR. In this paper, we study a novel integration of TA with RM for LM-NBC-based TC. The new model, called the term relevance dependency model, represents the probability of a word given a class by a term association LM probability learned within an RM framework. The results of TC experiments on the 20newsgroups and Reuters-21578 corpora demonstrate that the new model outperforms the standard NBC and several other LM-NBC-based methods.
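As a point of reference, here is a minimal sketch of the scoring rule implied above, in illustrative notation that is not taken from the paper itself. A standard LM-NBC assigns a test document d = w_1 w_2 \dots w_n to the class c that maximizes

P(c \mid d) \propto P(c) \prod_{i=1}^{n} P(w_i \mid c).

Under the description in the abstract, the term relevance dependency model can be read as replacing the unigram class model P(w_i \mid c) with a term-association estimate learned in a relevance model framework, for example a mixture over class-relevant documents d' and co-occurring terms w',

P(w_i \mid c) \approx \sum_{d'} P(d' \mid c) \sum_{w'} P(w_i \mid w', d') \, P(w' \mid d'),

so that word-to-document information (the RM part) and word-to-word information (the TA part) both enter the estimate of P(w_i \mid c). The particular mixture form and its estimation are assumptions of this sketch, not the authors' published equations.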
Keywords :
classification; information retrieval; probability; text analysis; word processing; LM-NBC-based TC; NBC; RM framework; Reuters-21578 corpora; TA model; language model-based TC; naive Bayes classifier; term association LM probability; term association model; term relevance dependency model; text classification; word matching; word-to-document information; word-to-word information; Adaptation models; Computational modeling; Data models; Information retrieval; Smoothing methods; Support vector machines; Vectors
Conference_Title :
2012 21st International Conference on Pattern Recognition (ICPR)
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4