DocumentCode
302104
Title
Domain word translation by space-frequency analysis of context length histograms
Author
Fung, Pascale
Author_Institution
Dept. of Comput. Sci., Columbia Univ., New York, NY, USA
Volume
1
fYear
1996
fDate
7-10 May 1996
Firstpage
184
Abstract
We report a new statistical feature relating a bilingual word pair in a non-parallel English-Chinese corpus. The lengths of a word's context segments are found to be closely correlated with those of its translation, even when the corpus is non-parallel, i.e., consists of monolingual texts that are not translations of each other. The context segment length histogram of a word has a characteristic pattern that corresponds to the histogram of its translation. If a word appears most frequently in long segments, its translation is most likely to occur in long segments as well. One way to match these histograms is to first extract their salient shape characteristics by space-frequency analysis and then match them against each other using dynamic time warping. The results of matching can be used in combination with other statistical features to bootstrap a word or term translation algorithm from non-parallel corpora.
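The histogram-matching step named in the abstract can be illustrated with a minimal sketch of textbook dynamic time warping. This is not the paper's implementation: the function names `normalize` and `dtw_distance`, the absolute-difference local cost, and the toy histogram values are all assumptions made for illustration, and the space-frequency feature extraction that the abstract places before the warping step is omitted.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (assumed preprocessing step)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences.

    Uses absolute difference as the local cost (an assumption; the
    abstract does not state which local cost is used).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    # Toy context-length histograms (illustrative values, not from the paper).
    english_word = normalize([1, 2, 5, 9, 4])
    chinese_candidate = normalize([1, 3, 5, 8, 4])
    print(dtw_distance(english_word, chinese_candidate))
```

Under these assumptions, a lower cost means the two histograms have more similar shapes, so candidate translations could be ranked by ascending cost and combined with other statistical features.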
Keywords
language translation; natural languages; statistical analysis; bilingual word pair; bootstrap; context length histograms; context segments; domain word translation; dynamic time warping; monolingual texts; nonparallel English-Chinese corpus; shape characteristics; space-frequency analysis; statistical feature; Computer science; Councils; Dictionaries; Histograms; Humans; Natural languages; Pattern matching; Shape; Signal analysis; Statistical learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location
Atlanta, GA
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.540321
Filename
540321