DocumentCode
456775
Title
A Novel Multilingual Text Categorization System using Latent Semantic Indexing
Author
Lee, Chung-Hong ; Yang, Hsin-Chang ; Ma, Sheng-Min
Author_Institution
Dept. of Electr. Eng., Nat. Kaohsiung Univ. of Appl. Sci.
Volume
2
fYear
2006
fDate
Aug. 30 2006-Sept. 1 2006
Firstpage
503
Lastpage
506
Abstract
Latent semantic indexing (LSI) is a well-known technique in information retrieval, particularly for handling polysemy and synonymy. LSI uses singular value decomposition (SVD) to decompose the original term-document matrix into a triplet of lower-dimensional matrices. The product of this triplet approximates the original matrix and captures latent semantic relations between terms. In this paper, we propose a novel method for multilingual text categorization using LSI. The centroid of each class is calculated in the reduced SVD space, and a similarity threshold for categorization is predefined for each centroid. A test document whose similarity to a centroid exceeds that threshold is labeled "positive" (relevant); otherwise, it is labeled "negative" (non-relevant). Experimental results indicate that LSI performs well on precision, recall, and F1 when categorizing multilingual text: with our algorithm, the average F1 is 70% and precision can reach 80%.
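The categorization pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_lsi`, `fold_in`, `class_centroids`, `categorize`), the rank `k`, cosine similarity as the similarity measure, the standard LSI fold-in Sigma_k^-1 U_k^T q for projecting test documents, the threshold values, and the toy two-class term-document matrix are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def build_lsi(A, k):
    """Rank-k truncated SVD of a terms x documents matrix A.

    U_k @ np.diag(s_k) @ Vt_k is the best rank-k approximation of A;
    column j of Vt_k holds document j's coordinates in the latent space.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Project a term vector q into the latent space via Sigma_k^-1 U_k^T q.

    Assumption: standard LSI fold-in; the abstract does not say how
    test documents are projected.
    """
    return (U_k.T @ q) / s_k

def cosine(a, b):
    """Cosine similarity (assumed similarity measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def class_centroids(Vt_k, labels):
    """Mean latent vector of each class's training documents."""
    docs = Vt_k.T                  # documents x k
    label_arr = np.array(labels)
    return {c: docs[label_arr == c].mean(axis=0) for c in sorted(set(labels))}

def categorize(q, U_k, s_k, centroids, thresholds):
    """Label q positive (True) for each class whose centroid similarity
    exceeds that class's predefined threshold, else negative (False)."""
    q_hat = fold_in(q, U_k, s_k)
    return {c: cosine(q_hat, centroids[c]) > thresholds[c] for c in centroids}

# Hypothetical toy data: rows are terms, columns are training documents.
# Terms 0-2 occur only in "sports" documents, terms 3-5 only in "finance".
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2],
              [0, 0, 1, 1]], dtype=float)
labels = ["sports", "sports", "finance", "finance"]
thresholds = {"sports": 0.5, "finance": 0.5}   # hypothetical values
q = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # test doc using sports terms

if __name__ == "__main__":
    U_k, s_k, Vt_k = build_lsi(A, k=2)
    centroids = class_centroids(Vt_k, labels)
    print(categorize(q, U_k, s_k, centroids, thresholds))
    # -> {'finance': False, 'sports': True}
```

Because each class has its own centroid and threshold, a test document is judged independently against every class, so in this sketch it may be labeled positive for several classes or for none.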
Keywords
indexing; information retrieval; semantic Web; singular value decomposition; text analysis; SVD process; latent semantic indexing; matrix decomposition; multilingual text categorization system; similarity threshold; Data mining; Indexing; Information analysis; Information management; Information retrieval; Internet; Large scale integration; Matrix decomposition; Testing; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing, Information and Control, 2006. ICICIC '06. First International Conference on
Conference_Location
Beijing
Print_ISBN
0-7695-2616-0
Type
conf
DOI
10.1109/ICICIC.2006.214
Filename
1692035