• DocumentCode
    2267249
  • Title
    Word clustering with parallel spoken language corpora
  • Author
    Wang, Ye-Yi ; Lafferty, John ; Waibel, Alex
  • Author_Institution
    Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    4
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    2364
  • Abstract
    We introduce a word clustering algorithm that uses a bilingual parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
  • Keywords
    language translation; natural languages; speech processing; statistical analysis; word processing; bilingual data; bilingual parallel corpus; machine translation tasks; monolingual data; mutual information clustering algorithms; parallel spoken language corpora; statistical translation model; word clustering algorithm; Books; Bridges; Clustering algorithms; Data mining; Entropy; Greedy algorithms; Merging; Mutual information; Natural languages; Scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607283
  • Filename
    607283