DocumentCode
3317683
Title
Application of the Character-Level Statistical Method in Text Categorization
Author
Yang, Zhen ; Nie, Xiangfei ; Xu, Weiran ; Guo, Jun
Author_Institution
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Volume
2
fYear
2006
fDate
3-6 Nov. 2006
Firstpage
1412
Lastpage
1417
Abstract
It is generally thought that semantic and grammatical information is very important for understanding and processing text. In simple text categorization tasks, however, the absence of this information does not always degrade classifier performance. In this paper, we discuss the application of a character-level statistical method to text categorization; the method extracts character-level frequent patterns rather than considering semantic and grammatical information. Compared with the traditional n-gram model, the presented method is simple and convenient. By casting the character-level statistical method in a Bayesian framework, we then apply it to spam detection. Finally, we discuss the multiclass problem in short message categorization based on combination strategies. The effectiveness of the models and the feasibility of the presented method are verified.
Keywords
Bayes methods; natural language processing; pattern recognition; statistical analysis; text analysis; Bayesian theory; character-level frequent pattern extraction; character-level statistical method; grammatical information; semantic information; short message categorization; spam detection; text categorization; Bayesian methods; Casting; Data mining; Degradation; Feature extraction; Information processing; Natural languages; Statistical analysis; Text categorization; Text processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security, 2006 International Conference on
Conference_Location
Guangzhou
Print_ISBN
1-4244-0605-6
Electronic_ISBN
1-4244-0605-6
Type
conf
DOI
10.1109/ICCIAS.2006.295293
Filename
4076199