  • DocumentCode
    1135685
  • Title
    Improving Throat Microphone Speech Recognition by Joint Analysis of Throat and Acoustic Microphone Recordings
  • Author
    Erzin, Engin
  • Author_Institution
    Coll. of Eng., Koc Univ., Istanbul, Turkey
  • Volume
    17
  • Issue
    7
  • fYear
    2009
  • Firstpage
    1316
  • Lastpage
    1324
  • Abstract
    We present a new framework for joint analysis of throat and acoustic microphone (TAM) recordings to improve throat-microphone-only speech recognition. The proposed analysis framework aims to learn joint sub-phone patterns of throat and acoustic microphone recordings through a parallel-branch HMM structure. The joint sub-phone patterns define temporally correlated neighborhoods, in which a linear prediction filter estimates a spectrally rich acoustic feature vector from throat feature vectors. Multimodal speech recognition with throat and throat-driven acoustic features significantly improves throat-only speech recognition performance. Experimental evaluations on a parallel TAM database yield benchmark phoneme recognition rates of 46.81% and 60.69% for the throat-only and multimodal TAM speech recognition systems, respectively. The proposed throat-driven multimodal speech recognition system improves the phoneme recognition rate to 52.58%, a relative improvement of about 12.3% over the throat-only speech recognition benchmark system.
  • Keywords
    audio recording; filtering theory; hidden Markov models; microphones; speech recognition; HMM structure; feature vectors; joint analysis; parallel TAM database; phoneme recognition; throat microphone speech recognition; throat-acoustic microphone recordings; throat-driven acoustic features; Acoustic sensors; Bones; Microphones; Noise robustness; Speech analysis; Speech enhancement; Speech processing; Speech recognition; Vectors; Working environment noise; Joint processing of throat and acoustic microphone (TAM) recordings; robust speech recognition; throat microphone speech recognition
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2016733
  • Filename
    5165115