DocumentCode :
56320
Title :
Use of Micro-Modulation Features in Large Vocabulary Continuous Speech Recognition Tasks
Author :
Dimitriadis, Dimitrios ; Bocchieri, Enrico
Author_Institution :
IBM Res., Yorktown Heights, NY, USA
Volume :
23
Issue :
8
fYear :
2015
fDate :
Aug. 2015
Firstpage :
1348
Lastpage :
1357
Abstract :
Most state-of-the-art ASR systems take as input a single type of acoustic feature, typically one of the traditional schemes, i.e., MFCCs or PLPs. However, these features cannot model the rapid, intra-frame phenomena present in actual speech signals. Micro-modulation components, inspired by the AM-FM speech model, can capture these characteristics of speech and have previously been shown to yield significant performance improvements in small-vocabulary ASR tasks. Yet they have seen limited use in large-vocabulary ASR applications, where feature post-processing schemes are usually employed. To enable the successful application of these frequency measures in real-life tasks, we investigate their combination with traditional cepstral features when employing linear (e.g., HDA) and nonlinear (i.e., bottleneck neural net, BN) feature transforms. This feature combination is also investigated in the context of the hybrid DNN-HMM framework. The experimental results reveal that integrating micro-modulation and cepstral features using neural nets can greatly improve ASR performance relative to using the cepstral features alone. We apply this novel feature extraction approach to three tasks: a clean speech task (DARPA-WSJ), the Aurora-4 task, and a real-life, open-vocabulary mobile search task (Speak4it). Performance improves on all of them, with relative word error rate (WER) reductions ranging from 7% to 21% depending on the task, e.g., 18% for the Speak4it task and up to 21% for the WSJ task.
Keywords :
cepstral analysis; feature extraction; hidden Markov models; neural nets; speech recognition; AM-FM speech model; acoustic features; actual speech signals; cepstral features; feature extraction approach; hybrid DNN-HMM framework; intra-frame phenomena; large vocabulary continuous speech recognition tasks; micromodulation features; small-vocabulary ASR tasks; spoken speech; neural networks; speech; speech processing; transforms; robustness
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2430815
Filename :
7103311