Title of article

Does dictionary based bilingual retrieval work in a non-normalized index?

Author/Authors

Eija Airio, Kimmo Kettunen

Issue Information

Bimonthly, serial issue numbering, year 2009

Pages

11

From page

703

To page

713

Abstract

Many operational IR indexes are non-normalized, i.e. no lemmatization, stemming or similar techniques have been applied during indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas (base forms). In this study, we address dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms of a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for retrieving inflected word forms, especially in highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English–Finnish, English–Swedish, Swedish–Finnish and Finnish–Swedish. Both approaches performed quite well, but the results varied by language pair: s-gramming and FCG performed about equally in all pairs except Finnish–Swedish, where s-gramming outperformed FCG.
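The two approaches can be contrasted in a small Python sketch. It is not the authors' implementation. The ending list `HYPOTHETICAL_ENDINGS` is a toy stand-in for a real morphological generator and only fits Finnish nouns that inflect like *talo* ('house'). The s-gram settings (skip lengths 0 and 1 pooled into one class, an `_` padding marker, Jaccard similarity between gram sets) are assumptions about a typical configuration, not the parameters used in the study. All function names are hypothetical.

```python
"""Toy contrast of FCG-style form generation and s-gram matching.

Both approaches are illustrative assumptions, not the authors' code.
"""

# Hypothetical stand-in for a morphological generator: a few frequent
# Finnish endings that only fit nouns inflecting like "talo" (house):
# nominative, genitive, partitive, inessive, nominative plural.
HYPOTHETICAL_ENDINGS = ["", "n", "a", "ssa", "t"]


def fcg_forms(lemma: str) -> list[str]:
    """FCG idea: expand a dictionary lemma into frequent inflected forms."""
    return [lemma + ending for ending in HYPOTHETICAL_ENDINGS]


def fcg_match(lemma: str, index_terms: list[str]) -> list[str]:
    """Keep index terms that exactly equal one of the generated forms."""
    forms = set(fcg_forms(lemma))
    return [term for term in index_terms if term in forms]


def s_grams(word: str, skips: tuple[int, ...] = (0, 1), pad: str = "_") -> set[str]:
    """Pool character pairs whose positions differ by skip + 1.

    Skip 0 gives ordinary bigrams; skip 1 pairs characters one apart.
    The padding marker (an assumption) lets word edges form grams.
    """
    w = f"{pad}{word.lower()}{pad}"
    grams = set()
    for skip in skips:
        step = skip + 1
        for i in range(len(w) - step):
            grams.add(w[i] + w[i + step])
    return grams


def s_gram_similarity(a: str, b: str, skips: tuple[int, ...] = (0, 1)) -> float:
    """Jaccard overlap of the two pooled s-gram sets."""
    sa, sb = s_grams(a, skips), s_grams(b, skips)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def s_gram_match(lemma: str, index_terms: list[str], k: int = 3) -> list[str]:
    """Return the k index terms most similar to the lemma (ties alphabetical)."""
    ranked = sorted(index_terms, key=lambda t: (-s_gram_similarity(lemma, t), t))
    return ranked[:k]


if __name__ == "__main__":
    index = ["talossa", "talon", "taloa", "kissa", "tuli"]
    print(fcg_match("talo", index))     # ['talossa', 'talon', 'taloa']
    print(s_gram_match("talo", index))  # ['taloa', 'talon', 'talossa']
```

In this toy run FCG finds only the forms its ending list produces, while s-gramming ranks every index term by character overlap. That lets it reach forms no generator listed, but it can also give nonzero scores to unrelated terms that share characters (here *tuli* scores 0.2 against *talo*).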

Keywords

Word form generation , S-gramming , Bilingual retrieval , Non-normalized index

Journal title

Information Processing and Management

Serial Year

2009

Record number

1228994