• DocumentCode
    3648528
  • Title
    Cross-lingual document similarity
  • Author
    Andrej Muhič;Jan Rupnik;Primož Škraba
  • Author_Institution
    A.I. Laboratory, Jozef Stefan Institute, Jamova 39, 10000 Ljubljana, Slovenia
  • fYear
    2012
  • fDate
    6/1/2012 12:00:00 AM
  • Firstpage
    387
  • Lastpage
    392
  • Abstract
    In this paper we investigated how to compute similarities between documents written in different languages based on a weakly aligned multi-lingual collection of documents. The cross-lingual similarities are computed from an aligned set of basis vectors obtained by applying either latent semantic indexing or the k-means algorithm to an aligned multi-lingual corpus. We evaluated the methods on two data sets: Wikipedia and the European Parliament Proceedings Parallel Corpus.
  • Keywords
    "Europe","Information services","Electronic publishing","Internet"
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces (ITI), Proceedings of the ITI 2012 34th International Conference on
  • ISSN
    1334-2762
  • Print_ISBN
    978-1-4673-1629-3
  • Type
    conf
  • DOI
    10.2498/iti.2012.0467
  • Filename
    6308038