• DocumentCode
    3422314
  • Title
    Hierarchical integration of phonetic and lexical knowledge in phone posterior estimation
  • Author
    Ketabdar, Hamed; Bourlard, Hervé
  • Author_Institution
    IDIAP Res. Inst., Martigny
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4065
  • Lastpage
    4068
  • Abstract
    Phone posteriors have recently been used quite often (as additional features or as local scores) to improve state-of-the-art automatic speech recognition (ASR) systems. Usually, better phone posterior estimates yield better ASR performance. In this paper, we present some initial, yet promising, work towards hierarchically improving these phone posteriors by implicitly integrating phonetic and lexical knowledge. In the approach investigated here, phone posteriors estimated with a multilayer perceptron (MLP) and a short (9-frame) temporal context are used as input to a second MLP, which spans a longer temporal context (e.g., 19 frames of posteriors) and is trained to refine the phone posterior estimates. The rationale is that at the output of every MLP the information stream becomes simpler (converging to a sequence of binary posterior vectors), and can thus be further processed, using a simpler classifier, over a larger temporal window. These longer-term dependencies can be interpreted as phonetic, sub-lexical, and lexical knowledge. The resulting enhanced posteriors can then be used for phone and word recognition in the same way as regular phone posteriors, in hybrid HMM/ANN or Tandem systems. The proposed method has been tested on the TIMIT, OGI Numbers, and Conversational Telephone Speech (CTS) databases, in every case yielding consistent and significant improvements in both phone and word recognition rates.
  • Keywords
    hidden Markov models; multilayer perceptrons; speech recognition; ANN; Conversational Telephone Speech; HMM; OGI Numbers; TIMIT; automatic speech recognition; binary posterior vectors; hierarchical integration; lexical knowledge; multilayer perceptron; phone posterior estimation; phonetic knowledge; word recognition; Artificial neural networks; Automatic speech recognition; Databases; Hidden Markov models; Multilayer perceptrons; Neural networks; Speech recognition; State estimation; Testing; Yield estimation; Enhanced phone posteriors; Neural Networks; Phone posterior estimation; Phonetic and lexical knowledge; Temporal posterior context;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518547
  • Filename
    4518547
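The two-stage idea in the abstract can be illustrated with a small sketch: for each frame, first-stage phone posteriors from a longer window (19 frames in the abstract's example) are concatenated, and a second MLP maps that window to refined posteriors. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The names `stack_context`, `mlp_forward` and `sigmoid`, the edge padding by repeating boundary frames, the single sigmoid hidden layer, and the random untrained weights are all hypothetical choices for illustration. Training of the second MLP is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def stack_context(posteriors: List[Vector], context: int = 19) -> List[Vector]:
    """Concatenate `context` consecutive posterior vectors centred on each frame.

    Assumption (not from the paper): frames beyond the utterance edges are
    filled by repeating the first or last frame.
    """
    if context < 1 or context % 2 == 0:
        raise ValueError("context must be a positive odd number of frames")
    half = context // 2
    last = len(posteriors) - 1
    stacked: List[Vector] = []
    for t in range(len(posteriors)):
        window: Vector = []
        for offset in range(-half, half + 1):
            idx = min(max(t + offset, 0), last)  # clamp to a valid frame index
            window.extend(posteriors[idx])
        stacked.append(window)
    return stacked


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: outputs are non-negative and sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mlp_forward(x: Vector, w_hidden: List[Vector], b_hidden: Vector,
                w_out: List[Vector], b_out: Vector) -> Vector:
    """Hypothetical one-hidden-layer MLP (sigmoid hidden units, softmax
    outputs) mapping a stacked posterior window to refined phone posteriors."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    num_phones, num_frames, context, num_hidden = 3, 5, 19, 4
    # Toy first-stage posteriors: one distribution over phones per frame.
    first_stage = [softmax([rng.gauss(0.0, 1.0) for _ in range(num_phones)])
                   for _ in range(num_frames)]
    inputs = stack_context(first_stage, context)
    dim = context * num_phones  # 19 frames x 3 phones = 57 inputs per frame
    # Untrained random weights: the output only illustrates shapes, not accuracy.
    w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_hidden)]
    b_hidden = [0.0] * num_hidden
    w_out = [[rng.gauss(0.0, 0.1) for _ in range(num_hidden)] for _ in range(num_phones)]
    b_out = [0.0] * num_phones
    for t, x in enumerate(inputs):
        refined = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
        print(t, [round(p, 3) for p in refined])
```

Each refined vector is again a distribution over phones, so it could in principle be fed to the same downstream hybrid HMM/ANN or Tandem back end as the first-stage posteriors. The input dimension grows linearly with the window length (`context * num_phones`).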