DocumentCode :
3162024
Title :
Latent perceptual mapping with data-driven variable-length acoustic units for template-based speech recognition
Author :
Sundaram, Shiva ; Bellegarda, Jerome R.
Author_Institution :
Deutsche Telekom Labs., Berlin, Germany
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4125
Lastpage :
4128
Abstract :
In recent work, we introduced Latent Perceptual Mapping (LPM) [1], a new framework for acoustic modeling suitable for template-like speech recognition. The basic idea is to leverage a reduced dimensionality description of the observations to derive acoustic prototypes that are closely aligned with perceived acoustic events. Our initial work adopted a bag-of-frames strategy to represent relevant acoustic information within speech segments. In this paper, we extend this approach by better integrating temporal information into the LPM feature extraction. Specifically, we use variable-length units to represent acoustic events at the supra-frame level, in order to benefit from finer temporal alignments when deriving the acoustic prototypes. The outcome can be viewed as a generalization of both conventional template-based approaches and recently proposed sparse representation solutions. This extension is experimentally validated on a context-independent phoneme classification task using the TIMIT corpus.
Keywords :
sparse matrices; speech recognition; LPM feature extraction; TIMIT corpus; acoustic modeling; bag-of-frames strategy; context-independent phoneme classification task; data-driven variable-length acoustic units; latent perceptual mapping; perceived acoustic events; reduced dimensionality description; sparse representation solutions; speech segments; supraframe level; template-based speech recognition; temporal alignments; temporal information integration; Acoustics; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training; Vectors; data-driven speech units; dimensionality reduction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288826
Filename :
6288826