Title of article
Automatic Generation of Japanese–English Bilingual Thesauri Based on Bilingual Corpora
Author/Authors
Keita Tsuji and Kyo Kageura
Issue Information
Monthly, serial issue, year 2006
Pages
16
From page
891
To page
906
Abstract
The authors propose a method for automatically generating
Japanese–English bilingual thesauri based on bilingual
corpora. The term bilingual thesaurus refers to a set
of bilingual equivalent words and their synonyms. Most
of the methods proposed so far for extracting bilingual
equivalent word clusters from bilingual corpora depend
heavily on word frequency and are not effective for dealing
with low-frequency clusters. These low-frequency
bilingual clusters are worth extracting because they contain
many newly coined terms that are in demand but are
not listed in existing bilingual thesauri. Assuming that
single language-pair-independent methods such as
frequency-based ones have reached their limitations and
that a language-pair-dependent method used in combination
with other methods shows promise, the authors
propose the following approach: (a) extract translation
pairs based on transliteration patterns; (b) remove the words
in these pairs from the candidate words; (c) extract translation
pairs based on word frequency from the remaining
candidate words; and (d) generate bilingual clusters
based on the extracted pairs using a graph-theoretic
method. The proposed method has been found to be
significantly more effective than other methods.
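The four steps in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several parts are assumptions. The input is taken to be aligned sentence pairs whose candidate words have already been extracted. Step (a) uses a hypothetical six-entry katakana-to-romaji table (`KANA`) and a `difflib` string-similarity threshold in place of real transliteration patterns. Step (c) uses the Dice coefficient over sentence-level co-occurrence as a stand-in frequency measure. Step (d) uses plain connected components as a simple graph-theoretic clustering, which may differ from the paper's method. All function names and thresholds are illustrative.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy katakana-to-romaji table: covers only the example words.
# The long-vowel mark "ー" is simply dropped (a simplification).
KANA = {"デ": "de", "タ": "ta", "メ": "me", "モ": "mo", "リ": "ri", "ー": ""}

def romanize(word):
    """Return romaji for an all-katakana word, or None if any character is unknown."""
    if not all(ch in KANA for ch in word):
        return None
    return "".join(KANA[ch] for ch in word) or None

def transliteration_pairs(corpus, threshold=0.6):
    """Step (a): pair katakana words with English words resembling their romaji."""
    pairs = set()
    for ja_words, en_words in corpus:
        for ja, en in product(ja_words, en_words):
            roman = romanize(ja)
            if roman and SequenceMatcher(None, roman, en.lower()).ratio() >= threshold:
                pairs.add((ja, en))
    return pairs

def remove_paired(corpus, pairs):
    """Step (b): drop words already covered by transliteration pairs."""
    ja_done = {ja for ja, _ in pairs}
    en_done = {en for _, en in pairs}
    return [([w for w in ja if w not in ja_done], [w for w in en if w not in en_done])
            for ja, en in corpus]

def dice_pairs(corpus, threshold=0.4):
    """Step (c): pair remaining words by the Dice coefficient of sentence co-occurrence."""
    ja_freq, en_freq, joint = Counter(), Counter(), Counter()
    for ja_words, en_words in corpus:
        ja_set, en_set = set(ja_words), set(en_words)
        ja_freq.update(ja_set)
        en_freq.update(en_set)
        joint.update(product(ja_set, en_set))
    return {(ja, en) for (ja, en), n in joint.items()
            if 2 * n / (ja_freq[ja] + en_freq[en]) >= threshold}

def clusters(pairs):
    """Step (d): connected components of the bipartite translation graph."""
    graph = defaultdict(set)
    for ja, en in pairs:
        graph[("ja", ja)].add(("en", en))
        graph[("en", en)].add(("ja", ja))
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        ja = sorted(w for lang, w in component if lang == "ja")
        en = sorted(w for lang, w in component if lang == "en")
        result.append((ja, en))
    return sorted(result)

def build_thesaurus(corpus):
    translit = transliteration_pairs(corpus)
    rest = remove_paired(corpus, translit)
    return clusters(translit | dice_pairs(rest))

# Hypothetical aligned corpus: (Japanese candidate words, English candidate words).
corpus = [
    (["データ", "保存"], ["store", "data"]),
    (["メモリ", "データ", "保存"], ["store", "data", "memory"]),
    (["メモリ", "増設"], ["add", "memory"]),
    (["データ", "格納"], ["store", "data"]),
]

for ja, en in build_thesaurus(corpus):
    print(ja, en)
```

In this toy run, the transliteration step pairs データ with "data" and メモリ with "memory". Removing those words first means the low-frequency word 格納 is paired only with "store" rather than competing with "data". The clustering step then groups the synonyms 保存 and 格納 with "store".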
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2006
Record number
844123