Hierarchical integration of phonetic and lexical knowledge in phone posterior estimation

Author

Ketabdar, Hamed ; Bourlard, Hervé

Author_Institution

IDIAP Res. Inst., Martigny

fYear

2008

fDate

March 31 2008-April 4 2008

Firstpage

4065

Lastpage

4068

Abstract

Phone posteriors have recently been used quite often (as additional features or as local scores) to improve state-of-the-art automatic speech recognition (ASR) systems. Usually, better phone posterior estimates yield better ASR performance. In this paper we present some initial, yet promising, work towards hierarchically improving these phone posteriors by implicitly integrating phonetic and lexical knowledge. In the approach investigated here, phone posteriors estimated with a multilayer perceptron (MLP) over a short temporal context (9 frames) are used as input to a second MLP, which spans a longer temporal context (e.g. 19 frames of posteriors) and is trained to refine the phone posterior estimates. The rationale is that, at the output of every MLP, the information stream gets simpler (converging to a sequence of binary posterior vectors) and can thus be further processed (using a simpler classifier) by looking at a larger temporal window. Longer-term dependencies can be interpreted as phonetic, sub-lexical and lexical knowledge. The resulting enhanced posteriors can then be used for phone and word recognition, in the same way as regular phone posteriors, in hybrid HMM/ANN or Tandem systems. The proposed method has been tested on the TIMIT, OGI Numbers and Conversational Telephone Speech (CTS) databases, always resulting in consistent and significant improvements in both phone and word recognition rates.
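
The two-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weights are random and untrained, and the feature dimension (39), number of phone classes (40), hidden layer size (64), tanh hidden activation, and edge padding by frame repetition are all assumptions made for illustration. Only the context widths (9 frames of features, 19 frames of posteriors) come from the abstract.

```python
import numpy as np

def stack_context(frames, width):
    """Concatenate `width` consecutive frames centred on each frame.

    Edges are padded by repeating the first/last frame (an assumption;
    the abstract does not say how utterance boundaries are handled).
    """
    assert width % 2 == 1, "context width must be odd"
    half = width // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    n_frames = frames.shape[0]
    return np.stack([padded[t:t + width].reshape(-1) for t in range(n_frames)])

def softmax(z):
    """Row-wise softmax, so each output row is a posterior distribution."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP with a softmax output (tanh hidden units assumed)."""
    hidden = np.tanh(x @ w1 + b1)
    return softmax(hidden @ w2 + b2)

def random_mlp(rng, n_in, n_hidden, n_out):
    """Untrained placeholder weights, used only to demonstrate shapes."""
    return (rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

rng = np.random.default_rng(0)
n_frames, n_feat, n_phones = 50, 39, 40          # hypothetical sizes
features = rng.normal(size=(n_frames, n_feat))   # stand-in acoustic features

# Stage 1: acoustic features with a 9-frame context -> phone posteriors.
mlp1 = random_mlp(rng, 9 * n_feat, 64, n_phones)
posteriors = mlp_forward(stack_context(features, 9), *mlp1)

# Stage 2: 19 frames of stage-1 posteriors -> refined ("enhanced") posteriors.
mlp2 = random_mlp(rng, 19 * n_phones, 64, n_phones)
enhanced = mlp_forward(stack_context(posteriors, 19), *mlp2)
```

In a real system both MLPs would be trained against phone labels, the second one on the outputs of the first; the sketch only shows how the longer posterior context is assembled and passed through the second classifier.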

Keywords

hidden Markov models; multilayer perceptrons; speech recognition; ANN; Conversational Telephone Speech; HMM; OGI Numbers; TIMIT; automatic speech recognition; binary posterior vectors; hierarchical integration; lexical knowledge; multilayer perceptron; phone posterior estimation; phonetic knowledge; word recognition; Artificial neural networks; Automatic speech recognition; Databases; Hidden Markov models; Multilayer perceptrons; Neural networks; Speech recognition; State estimation; Testing; Yield estimation; Enhanced phone posteriors; Neural Networks; Phone posterior estimation; Phonetic and lexical knowledge; Temporal posterior context;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location

Las Vegas, NV

ISSN

1520-6149

Print_ISBN

978-1-4244-1483-3

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2008.4518547

Filename

4518547