DocumentCode
2267249
Title
Word clustering with parallel spoken language corpora
Author
Wang, Ye-Yi ; Lafferty, John ; Waibel, Alex
Author_Institution
Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume
4
fYear
1996
fDate
3-6 Oct 1996
Firstpage
2364
Abstract
We introduce a word clustering algorithm which uses a bilingual, parallel corpus to group together words in the source and target language. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes which are well suited to machine translation tasks
Keywords
language translation; natural languages; speech processing; statistical analysis; word processing; bilingual data; bilingual parallel corpus; machine translation tasks; monolingual data; mutual information clustering algorithms; parallel spoken language corpora; statistical translation model; word clustering algorithm; Books; Bridges; Clustering algorithms; Data mining; Entropy; Greedy algorithms; Merging; Mutual information; Natural languages; Scheduling;
fLanguage
English
Publisher
ieee
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607283
Filename
607283
Link To Document