Word clustering with parallel spoken language corpora

Author

Wang, Ye-Yi; Lafferty, John; Waibel, Alex

Author_Institution

Carnegie Mellon Univ., Pittsburgh, PA, USA

Volume

fYear

1996

fDate

3-6 Oct 1996

Firstpage

2364

Abstract

We introduce a word clustering algorithm that uses a bilingual, parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
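The abstract describes bilingual clustering only at a high level. The Python sketch below illustrates the general idea under explicit assumptions; it is not the authors' algorithm. Source-language words and target-language words are each grouped into classes, and greedy merges are chosen to keep the mutual information I(C_src; C_tgt) between source classes and target classes as high as possible. ASSUMPTIONS: raw co-occurrence counts within each aligned sentence pair stand in for the statistical translation model; the merge order (all source merges first, then all target merges) is a simplification. The function names (cooccurrence_counts, mutual_information, class_joint, greedy_merge, cluster) and the toy German/English data are hypothetical illustrations.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count (source word, target word) co-occurrences within each aligned
    sentence pair. ASSUMPTION: this crude count stands in for the expected
    word-alignment counts a statistical translation model would supply."""
    counts = Counter()
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[(s, t)] += 1
    return counts


def mutual_information(joint):
    """Mutual information I(A; B) in bits for a joint count table {(a, b): n}."""
    total = sum(joint.values())
    pa, pb = Counter(), Counter()
    for (a, b), n in joint.items():
        pa[a] += n
        pb[b] += n
    mi = 0.0
    for (a, b), n in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), expressed with raw counts
        mi += (n / total) * math.log2(n * total / (pa[a] * pb[b]))
    return mi


def class_joint(counts, src_class, tgt_class):
    """Collapse word-pair counts into class-pair counts."""
    joint = Counter()
    for (s, t), n in counts.items():
        joint[(src_class[s], tgt_class[t])] += n
    return joint


def greedy_merge(counts, src_class, tgt_class, side, n_classes):
    """On one side ("src" or "tgt"), repeatedly merge the pair of classes
    whose merge leaves I(C_src; C_tgt) highest, until n_classes remain.
    Brute force over all class pairs: only practical for toy vocabularies."""
    while True:
        mapping = src_class if side == "src" else tgt_class
        labels = sorted(set(mapping.values()))
        if len(labels) <= n_classes:
            return src_class, tgt_class
        best_mi, best_map = -math.inf, None
        for a, b in combinations(labels, 2):
            # Relabel every word currently in class b as class a.
            trial = {w: (a if c == b else c) for w, c in mapping.items()}
            if side == "src":
                mi = mutual_information(class_joint(counts, trial, tgt_class))
            else:
                mi = mutual_information(class_joint(counts, src_class, trial))
            if mi > best_mi:
                best_mi, best_map = mi, trial
        if side == "src":
            src_class = best_map
        else:
            tgt_class = best_map


def cluster(pairs, n_src, n_tgt):
    """Cluster source words, then target words, against each other."""
    counts = cooccurrence_counts(pairs)
    src_class = {s: s for s, _ in counts}  # start: one class per word
    tgt_class = {t: t for _, t in counts}
    src_class, tgt_class = greedy_merge(counts, src_class, tgt_class, "src", n_src)
    return greedy_merge(counts, src_class, tgt_class, "tgt", n_tgt)


if __name__ == "__main__":
    pairs = [("dog", "hund"), ("hound", "hund"),
             ("cat", "katze"), ("cat", "mieze")]
    src, tgt = cluster(pairs, n_src=2, n_tgt=2)
    print(src)
    print(tgt)
```

On this toy corpus, "dog" and "hound" end up in one source class because both co-occur only with "hund". Likewise "katze" and "mieze" share a target class because both co-occur only with "cat". The bilingual side supplies the grouping evidence, which is the kind of constraint the abstract refers to. Merging classes can never increase mutual information, so each greedy step picks the merge that loses the least.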

Keywords

language translation; natural languages; speech processing; statistical analysis; word processing; bilingual data; bilingual parallel corpus; machine translation tasks; monolingual data; mutual information clustering algorithms; parallel spoken language corpora; statistical translation model; word clustering algorithm; Books; Bridges; Clustering algorithms; Data mining; Entropy; Greedy algorithms; Merging; Mutual information; Natural languages; Scheduling

fLanguage

English

Publisher

IEEE

Conference_Title

Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), 1996

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607283

Filename

607283

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2267249