DocumentCode :
3448897
Title :
Research of English text classification methods based on semantic meaning
Author :
Lv, Lin ; Liu, Yu-Shu
Author_Institution :
Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol.
fYear :
2005
fDate :
5-6 Dec. 2005
Firstpage :
689
Lastpage :
700
Abstract :
To overcome the limitations of traditional text classification approaches based on bag-of-words representation, and to incorporate linguistic knowledge and conceptual indexing into the text vector space representation, a method combining the WordNet thesaurus with the latent semantic indexing (LSI) model is presented and applied to naive Bayes text classification and simple vector distance text classification. Five groups of contrastive experiments are carried out with the two classifiers. The results show that the accuracy of both classification methods improves steadily as the semantic analysis becomes deeper, which indicates that semantic mining is important and necessary for text classification. A comparative analysis of related work is also given.
Keywords :
classification; indexing; natural languages; text analysis; thesauri; English text classification; WordNet thesaurus; latent semantic indexing; naive Bayes text classification; semantic meaning; text vector space representation; Data mining; Feature extraction; Indexing; Large scale integration; Space technology; Speech analysis; Tagging; Text categorization; Thesauri; Viterbi algorithm; LSI; Naïve Bayes; Semantic Meaning; Simple Vector Distance; WordNet;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communications Technology, 2005. Enabling Technologies for the New Knowledge Society: ITI 3rd International Conference on
Conference_Location :
Cairo
Print_ISBN :
0-7803-9270-1
Type :
conf
DOI :
10.1109/ITICT.2005.1609660
Filename :
1609660