• DocumentCode
    311015
  • Title

    Wide context acoustic modeling in read vs. spontaneous speech

  • Author

    Finke, Michael ; Rogina, Ivica

  • Author_Institution
    Interactive Syst. Lab., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • Volume
    3
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    1743
  • Abstract
    Context-dependent acoustic models have been applied in speech recognition research for many years and have been shown to increase recognition accuracy significantly. The most common approach is to use triphones. Several speech recognition groups have started investigating the use of larger phonetic context windows when building acoustic models. We discuss some of the computational problems arising from wide context modeling (polyphonic modeling) and present methods to cope with these problems. A two-stage, decision-tree-based polyphonic clustering approach is described which implements a more flexible parameter tying scheme. The new clustering approach gave us significant improvements across all tasks (WSJ, SWB, and the Spontaneous Scheduling Task) and across all languages involved (German, Spanish, English). We report recognition results based on the JANUS speech recognition toolkit on two tasks comparing acoustic context phenomena in English read versus spontaneous speech. We used our WSJ 60K recognizer and the JANUS SWB 10K polyphonic recognizer.
  • Keywords
    acoustic signal processing; natural languages; speech processing; speech recognition; tree data structures; English; German; JANUS SWB 10K polyphonic recognizer; JANUS speech recognition toolkit; SWB; Spanish; Spontaneous Scheduling Task; WSJ 60 K recogniser; acoustic models; codebook; computational problems; context acoustic modeling; context dependent acoustic models; parameter tying scheme; phonetic context windows; polyphonic clustering; polyphonic modeling; read speech; recognition accuracy; recognition results; speech recognition research; spontaneous speech; triphones; two stage decision tree; Clustering methods; Context modeling; Decision trees; Dictionaries; Error analysis; Interactive systems; Laboratories; Natural languages; Speech recognition; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1997.598861
  • Filename
    598861