Research of English text classification methods based on semantic meaning

Author

Lv, Lin ; Liu, Yu-Shu

Author_Institution

Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol.

fYear

2005

fDate

5-6 Dec. 2005

Firstpage

689

Lastpage

700

Abstract

Traditional text classification approaches built on a bag-of-words representation have known limitations. To overcome them and to incorporate linguistic knowledge and conceptual indexing into the vector space representation of text, a method combining the WordNet thesaurus with the latent semantic indexing (LSI) model is presented and applied to naive Bayes text classification and simple vector distance text classification. Five groups of comparative experiments are carried out for the two classifiers. The results show that the accuracy of both classification methods improves progressively as the semantic analysis becomes deeper, which indicates that semantic mining is important for text classification. A comparative analysis of related work is also given.
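As an illustration of the LSI and simple vector distance combination named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy term-by-document count matrix, cosine similarity to class centroids as the "vector distance", and NumPy's SVD for the latent space. The vocabulary, documents, labels, and all function names are hypothetical, and the WordNet step is omitted.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Return a (terms x k) projection built from a rank-k truncated SVD."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    # Fold-in rule: a term vector d maps to S_k^{-1} U_k^T d,
    # computed below as d @ (U_k / s_k).
    return u[:, :k] / s[:k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def fit_centroids(projection, term_doc, labels):
    """Average the latent vectors of the training documents of each class."""
    docs = term_doc.T @ projection  # one latent row per document
    centroids = {}
    for c in sorted(set(labels)):
        rows = [i for i, label in enumerate(labels) if label == c]
        centroids[c] = docs[rows].mean(axis=0)
    return centroids

def classify(projection, centroids, term_vec):
    """Assign the class whose centroid is most similar in the latent space."""
    z = term_vec @ projection
    return max(centroids, key=lambda c: cosine(z, centroids[c]))

# Hypothetical toy data: rows are terms, columns are documents.
vocab = ["ball", "team", "goal", "stock", "market", "price"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)
labels = ["sports", "sports", "finance", "finance"]

projection = lsi_fit(term_doc, k=2)
centroids = fit_centroids(projection, term_doc, labels)

query_sports = np.array([0, 1, 1, 0, 0, 0], dtype=float)   # "team goal"
query_finance = np.array([0, 0, 0, 1, 0, 1], dtype=float)  # "stock price"
print(classify(projection, centroids, query_sports))
print(classify(projection, centroids, query_finance))
```

A naive Bayes classifier could be substituted for the centroid step. Choosing k=2 here simply matches the two toy topics and is not a value taken from the paper.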

Keywords

classification; indexing; natural languages; text analysis; thesauri; English text classification; WordNet thesaurus; latent semantic indexing; naive Bayes text classification; semantic meaning; text vector space representation; Data mining; Feature extraction; Indexing; Large scale integration; Space technology; Speech analysis; Tagging; Text categorization; Thesauri; Viterbi algorithm; LSI; Naïve Bayes; Semantic Meaning; Simple Vector Distance; WordNet

fLanguage

English

Publisher

IEEE

Conference_Titel

Information and Communications Technology, 2005. Enabling Technologies for the New Knowledge Society: ITI 3rd International Conference on

Conference_Location

Cairo

Print_ISBN

0-7803-9270-1

Type

conf

DOI

10.1109/ITICT.2005.1609660

Filename

1609660