• DocumentCode
    3484954
  • Title

    Robust speech recognition using articulatory gestures in a Dynamic Bayesian Network framework

  • Author

    Mitra, Vikramjit ; Nam, Hosung ; Espy-Wilson, Carol Y.

  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    131
  • Lastpage
    136
  • Abstract
    Articulatory Phonology models speech as a spatio-temporal constellation of constricting events (e.g., raising the tongue tip or narrowing the lips), known as articulatory gestures. These gestures are associated with distinct organs (lips, tongue tip, tongue body, velum, and glottis) along the vocal tract. In this paper we present a Dynamic Bayesian Network-based speech recognition architecture that models the articulatory gestures as hidden variables and uses them for speech recognition. Using the proposed architecture we performed (a) word recognition experiments on the noisy data of Aurora-2 and (b) phone recognition experiments on the University of Wisconsin X-ray microbeam database. Our results indicate that using gestural information improves recognition performance compared with a system that uses acoustic information only.
  • Keywords
    belief networks; speech; speech recognition; acoustic information; articulatory gestures; articulatory phonology; dynamic Bayesian network framework; robust speech recognition; Acoustics; Hidden Markov models; Speech; Speech recognition; TV; Tongue; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type

    conf

  • DOI
    10.1109/ASRU.2011.6163918
  • Filename
    6163918