Cross-lingual document similarity

Author

Andrej Muhič;Jan Rupnik;Primož Škraba

Author_Institution

A.I. Laboratory, Jozef Stefan Institute, Jamova 39, 10000 Ljubljana, Slovenia

fYear

2012

fDate

6/1/2012 12:00:00 AM

Firstpage

387

Lastpage

392

Abstract

In this paper we investigated how to compute similarities between documents written in different languages, using a weakly aligned multilingual collection of documents. The cross-lingual similarities are computed from an aligned set of basis vectors, obtained by applying either latent semantic indexing or the k-means algorithm to an aligned multilingual corpus. We evaluated the methods on two data sets: Wikipedia and the European Parliament Proceedings Parallel Corpus.
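
The abstract does not give implementation details, so the following is only a minimal sketch of one common way to obtain aligned basis vectors with latent semantic indexing. It assumes term-document matrices whose columns are aligned document pairs. The function names (`aligned_lsi_bases`, `project`, `cross_lingual_similarity`), the toy matrices, and the least-squares projection step are illustrative assumptions, not the authors' code.

```python
import numpy as np

def aligned_lsi_bases(X_a, X_b, k):
    """Return k aligned basis vectors for each language.

    X_a: (terms_a x n_docs) and X_b: (terms_b x n_docs) term-document
    matrices; column j of both describes the same aligned document pair.
    """
    stacked = np.vstack([X_a, X_b])              # one joint column per aligned pair
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    U_k = U[:, :k]                               # top-k left singular vectors
    n_a = X_a.shape[0]
    return U_k[:n_a], U_k[n_a:]                  # language-specific row blocks

def project(basis, doc):
    """Least-squares coordinates of a document in a language-specific basis."""
    coords, *_ = np.linalg.lstsq(basis, doc, rcond=None)
    return coords

def cross_lingual_similarity(basis_a, basis_b, doc_a, doc_b):
    """Cosine similarity of two documents in the shared k-dimensional space."""
    z_a = project(basis_a, doc_a)
    z_b = project(basis_b, doc_b)
    denom = np.linalg.norm(z_a) * np.linalg.norm(z_b)
    return float(z_a @ z_b / denom) if denom > 0 else 0.0

# Toy aligned corpus (made-up counts): documents 0 and 2 share one topic,
# document 1 covers another. Language A has 4 terms, language B has 3.
X_a = np.array([[1, 0, 1],
                [1, 0, 1],
                [0, 1, 0],
                [0, 1, 0]], dtype=float)
X_b = np.array([[2, 0, 2],
                [0, 1, 0],
                [0, 1, 0]], dtype=float)

basis_a, basis_b = aligned_lsi_bases(X_a, X_b, k=2)

same_topic = cross_lingual_similarity(basis_a, basis_b, X_a[:, 0], X_b[:, 0])
other_topic = cross_lingual_similarity(basis_a, basis_b, X_a[:, 0], X_b[:, 1])
print(same_topic, other_topic)
```

In this toy setting the aligned pair scores much higher than the mismatched pair. A k-means variant would replace the singular vectors with centroids computed on the stacked matrix and then split them by language in the same way.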

Keywords

"Europe","Information services","Electronic publishing","Internet"

Publisher

IEEE

Conference_Titel

Information Technology Interfaces (ITI), Proceedings of the ITI 2012 34th International Conference on

ISSN

1334-2762

Print_ISBN

978-1-4673-1629-3

Type

conf

DOI

10.2498/iti.2012.0467

Filename

6308038

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3648528