Title :
Research of English text classification methods based on semantic meaning
Author :
Lv, Lin ; Liu, Yu-Shu
Author_Institution :
Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol.
Abstract :
To overcome the limitations of traditional text classification approaches based on the bag-of-words representation, and to incorporate linguistic knowledge and conceptual indexing effectively into the text vector space representation, a method combining the WordNet thesaurus with the latent semantic indexing (LSI) model is presented. The combined method is used to implement both naive Bayes text classification and simple vector distance text classification, and five groups of comparative experiments are conducted. The results show that the accuracy of both classification methods improves progressively as the semantic analysis becomes deeper, which indicates that semantic mining is very important and necessary for text classification. A comparative analysis of related work is also given.
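The abstract names two classifiers working on a concept-level vector space. The following is a minimal illustrative sketch, not the authors' implementation, under stated assumptions: a small hypothetical `CONCEPTS` dictionary stands in for WordNet synset mapping, the LSI (truncated SVD) step is omitted, naive Bayes is taken to be the multinomial variant with add-one smoothing, and "simple vector distance" is interpreted as cosine similarity to each class's mean term vector. All names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for WordNet synsets (an assumption, not WordNet itself):
# each surface word is replaced by a shared concept label before counting.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "film": "movie", "movie": "movie", "picture": "movie",
}

def to_concepts(text):
    """Lowercase, split on whitespace, and map each word to its concept label."""
    return [CONCEPTS.get(w, w) for w in text.lower().split()]

class NaiveBayes:
    """Multinomial naive Bayes over concept terms with add-one smoothing."""
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, y in zip(docs, labels):
            terms = to_concepts(doc)
            self.term_counts[y].update(terms)
            self.vocab.update(terms)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        terms = to_concepts(doc)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.term_counts[y].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_y / self.total_docs)
            for t in terms:
                score += math.log((self.term_counts[y][t] + 1) / (total + v))
            if score > best_score:
                best, best_score = y, score
        return best

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Assign a document to the class whose mean concept vector is most similar."""
    def fit(self, docs, labels):
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for doc, y in zip(docs, labels):
            sums[y].update(to_concepts(doc))
        self.centroids = {y: {t: c / counts[y] for t, c in s.items()}
                          for y, s in sums.items()}
        return self

    def predict(self, doc):
        vec = Counter(to_concepts(doc))
        return max(self.centroids, key=lambda y: cosine(vec, self.centroids[y]))

train = ["the car needs fuel", "a truck on the road",
         "the film won an award", "great movie and actors"]
labels = ["auto", "auto", "cinema", "cinema"]
nb = NaiveBayes().fit(train, labels)
cc = CentroidClassifier().fit(train, labels)
print(nb.predict("a new picture"))  # -> cinema
print(cc.predict("a new picture"))  # -> cinema
```

The word "picture" never occurs in the training documents; it is classified correctly only because the concept mapping folds it into the same feature as "film" and "movie", which is the kind of effect the abstract attributes to semantic analysis. An LSI step would further project these concept vectors into a lower-dimensional latent space before classification.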
Keywords :
classification; indexing; natural languages; text analysis; thesauri; English text classification; WordNet thesaurus; latent semantic indexing; naive Bayes text classification; semantic meaning; text vector space representation; Data mining; Feature extraction; Large scale integration; Space technology; Speech analysis; Tagging; Text categorization; Viterbi algorithm; LSI; Naïve Bayes; Simple Vector Distance; WordNet
Conference_Title :
Information and Communications Technology, 2005. Enabling Technologies for the New Knowledge Society: ITI 3rd International Conference on
Conference_Location :
Cairo
Print_ISBN :
0-7803-9270-1
DOI :
10.1109/ITICT.2005.1609660