Time-varying LP cepstral features for improved isolated word speech recognition

Author

Ang, Federico ; Tsutsui, Hiroshi ; Miyanaga, Yoshikazu

Author_Institution

ICN Laboratory, Hokkaido University, Sapporo 060-0814, Japan

fYear

2015

fDate

21-24 July 2015

Firstpage

302

Lastpage

306

Abstract

Isolated word speech recognition for small-vocabulary tasks has found great success with Mel-frequency cepstral coefficients as the speech feature of choice. Voice-controlled embedded systems, using word models as the basic units of speech, have found their way into a variety of commercial products. While the recognition rates of these products can be considered commercially acceptable in clean environments, channel noise and other external factors can still degrade recognition performance in practice. We propose the use of cepstral features derived from time-varying linear predictive coding, where the autoregressive model of the speech signal is represented by coefficients that are linear combinations of simple basis functions. Variations in the usage of the features, such as skipping adjacent features, averaging, and hybrid features, are investigated with the goal of improving the performance of a 142-word, isolated-word Japanese speech recognition task.
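The abstract's central idea, autoregressive coefficients expressed as linear combinations of basis functions, can be illustrated with a minimal sketch. This record does not give the paper's actual formulation, so everything below is an assumption made for illustration: the function name `tvlpc`, the power basis u_k[n] = (n/N)^k, the sign convention s[n] = -sum_i a_i[n] s[n-i] + e[n], and the plain least-squares fit over one frame. The conversion from these coefficients to cepstral features is not shown.

```python
import numpy as np

def tvlpc(frame, order=2, n_basis=2):
    """Least-squares time-varying LPC with a power (polynomial) basis.

    Assumed model (not taken from the paper):
        s[n] = -sum_i a_i[n] * s[n-i] + e[n]
        a_i[n] = sum_k c[i, k] * u_k[n],  u_k[n] = (n / N) ** k
    Returns the expansion coefficients c with shape (order, n_basis).
    """
    s = np.asarray(frame, dtype=float)
    N = len(s)
    n = np.arange(order, N)  # indices of the samples being predicted
    # Basis functions evaluated at each predicted sample: (N - order, n_basis)
    U = (n[:, None] / N) ** np.arange(n_basis)[None, :]
    # One regressor column per (lag i, basis k) pair: s[n - i] * u_k[n]
    X = np.hstack([s[n - i][:, None] * U for i in range(1, order + 1)])
    # Solve s[n] ~ -X c in the least-squares sense
    c, *_ = np.linalg.lstsq(X, -s[n], rcond=None)
    return c.reshape(order, n_basis)
```

With `n_basis=1` the basis reduces to a constant and the fit becomes an ordinary covariance-style LPC estimate for the frame; larger values let each coefficient drift within the frame.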

Keywords

Hidden Markov models; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; isolated word speech recognition; time-varying AR model

fLanguage

English

Publisher

IEEE

Conference_Title

Digital Signal Processing (DSP), 2015 IEEE International Conference on

Conference_Location

Singapore, Singapore

Type

conf

DOI

10.1109/ICDSP.2015.7251880

Filename

7251880