• DocumentCode
    2734430
  • Title
    Cross language information retrieval based on LDA
  • Author
    Wang, Ai ; Li, YaoDong ; Wang, Wei
  • Author_Institution
    Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing, China
  • Volume
    3
  • fYear
    2009
  • fDate
    20-22 Nov. 2009
  • Firstpage
    485
  • Lastpage
    490
  • Abstract
    This paper proposed an LDA-based cross-language retrieval model that did not rely on word-by-word translation of the query or the document. Instead, a parallel corpus was used to estimate a cross-language LDA (Latent Dirichlet Allocation) model. We assumed that a topic variable Z in LDA could generate both an English token and a Chinese token, given that the parallel corpus contained two languages: English and Chinese. Therefore, the LDA model could easily be extended to multi-language information retrieval as long as a multi-lingual parallel corpus was provided. The proposed LDA-based cross-language retrieval model was compared on CNKI datasets with three popular retrieval models: an LDA-based mono-lingual document model, a mono-lingual TF.IDF retrieval model, and a cross-lingual Latent Semantic Indexing retrieval model. Experimental results showed that the proposed model was effective and performed well.
  • Keywords
    document handling; indexing; information retrieval; natural language processing; probability; CNKI dataset; Chinese token; English token; LDA based cross language retrieval model; LDA based monolingual document model; crosslingual latent semantic indexing retrieval model; latent Dirichlet allocation; monolingual TF.IDF retrieval model; multilanguage information retrieval; multilingual parallel corpus; Automation; Indexing; Information retrieval; Intelligent systems; Laboratories; Large scale integration; Linear discriminant analysis; Natural languages; Predictive models; Vectors; LDA; cross language information retrieval; topic model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-4754-1
  • Electronic_ISBN
    978-1-4244-4738-1
  • Type
    conf
  • DOI
    10.1109/ICICISYS.2009.5358121
  • Filename
    5358121