DocumentCode :
2084539
Title :
An adaptive Markov model for text categorization
Author :
Li, Jin ; Yue, Kun ; Liu, WeiYi
Author_Institution :
Sch. of Software, Yunnan Univ., China
Volume :
1
fYear :
2008
fDate :
17-19 Nov. 2008
Firstpage :
802
Lastpage :
807
Abstract :
Existing methods for text categorization assume that a document is a bag of words. While computationally efficient, such a representation cannot capture sequential information. In this paper, a document is treated as a sequence of characters or words, so preprocessing steps for text categorization, such as word segmentation and feature selection, are not required. Statistical dependencies among neighboring terms of a sequence are captured by Markov models of different orders. We propose a sequence classification method based on an adaptive Markov model. Our method automatically and effectively blends Markov models of different orders for text categorization. We present an extensive experimental evaluation of our method on an English collection and a Chinese corpus. The results show that our method achieves high recall and precision.
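The general idea in the abstract, scoring a document as a character sequence under per-class Markov models of several orders and blending them, can be illustrated with a minimal sketch. This is not the paper's adaptive scheme, which the abstract does not specify: the uniform blend over orders, the additive smoothing, the absence of class priors, and every name here (`BlendedMarkovClassifier`, `max_order`, `alpha`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


class BlendedMarkovClassifier:
    """Per-class character Markov models of orders 0..max_order (illustrative)."""

    def __init__(self, max_order=2, alpha=1.0):
        self.max_order = max_order
        self.alpha = alpha      # additive smoothing constant (assumption)
        self.counts = {}        # counts[label][order][context][next_char]
        self.alphabet = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            per_order = self.counts.setdefault(
                label,
                [defaultdict(lambda: defaultdict(int))
                 for _ in range(self.max_order + 1)],
            )
            self.alphabet.update(text)
            for i, ch in enumerate(text):
                for k in range(self.max_order + 1):
                    if i >= k:  # enough history for an order-k context
                        per_order[k][text[i - k:i]][ch] += 1
        return self

    def _prob(self, label, k, context, ch):
        # Smoothed estimate of P(ch | context) under the order-k model.
        nxt = self.counts[label][k].get(context, {})
        total = sum(nxt.values())
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        return (nxt.get(ch, 0) + self.alpha) / (total + self.alpha * vocab)

    def log_likelihood(self, text, label):
        score = 0.0
        for i, ch in enumerate(text):
            orders = [k for k in range(self.max_order + 1) if i >= k]
            # Uniform blend over usable orders: a stand-in for adaptive weights.
            p = sum(self._prob(label, k, text[i - k:i], ch)
                    for k in orders) / len(orders)
            score += math.log(p)
        return score

    def predict(self, text):
        # Pick the class whose blended model gives the text the highest score.
        return max(self.counts,
                   key=lambda label: self.log_likelihood(text, label))
```

Because the model reads raw characters, no word segmentation or feature selection is needed before calling `fit`, which matches the motivation stated in the abstract. An adaptive variant would replace the uniform average with weights chosen per context or learned from data.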
Keywords :
Markov processes; pattern classification; text analysis; adaptive Markov model; sequence classification methods; text categorization; Classification tree analysis; Context modeling; Frequency; Information science; Intelligent systems; Knowledge engineering; Probability; Search engines; Text categorization; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent System and Knowledge Engineering, 2008. ISKE 2008. 3rd International Conference on
Conference_Location :
Xiamen
Print_ISBN :
978-1-4244-2196-1
Electronic_ISBN :
978-1-4244-2197-8
Type :
conf
DOI :
10.1109/ISKE.2008.4731039
Filename :
4731039