DocumentCode :
2734430
Title :
Cross language information retrieval based on LDA
Author :
Wang, Ai ; Li, YaoDong ; Wang, Wei
Author_Institution :
Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing, China
Volume :
3
fYear :
2009
fDate :
20-22 Nov. 2009
Firstpage :
485
Lastpage :
490
Abstract :
This paper proposed an LDA-based cross-language retrieval model that did not rely on word-by-word translation of the query or document. Instead, a parallel corpus was used to estimate a cross-language LDA (Latent Dirichlet Allocation) model. Because the parallel corpus contained two languages, English and Chinese, we assumed that a topic variable Z in LDA could generate both an English token and a Chinese token. The model could therefore be extended easily to multi-language information retrieval, provided that a multilingual parallel corpus was available. The proposed model was compared on CNKI datasets with three popular retrieval models: an LDA-based monolingual document model, a monolingual TF.IDF retrieval model, and a cross-lingual Latent Semantic Indexing retrieval model. Experimental results showed that the proposed model was effective and performed well.
Keywords :
document handling; indexing; information retrieval; natural language processing; probability; CNKI dataset; Chinese token; English token; LDA based cross language retrieval model; LDA based monolingual document model; crosslingual latent semantic indexing retrieval model; latent Dirichlet allocation; monolingual TF.IDF retrieval model; multilanguage information retrieval; multilingual parallel corpus; Automation; Indexing; Information retrieval; Intelligent systems; Laboratories; Large scale integration; Linear discriminant analysis; Natural languages; Predictive models; Vectors; LDA; cross language information retrieval; topic model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
Type :
conf
DOI :
10.1109/ICICISYS.2009.5358121
Filename :
5358121