Regional Information Center for Science and Technology - KL divergence based feature switching in the linguistic search space for automatic speech recognition

DocumentCode :

1835576

Title :

KL divergence based feature switching in the linguistic search space for automatic speech recognition

Author :

Kumar, J.C. ; Janakiraman, Rajesh ; Murthy, Hema A.

Author_Institution :

Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India

fYear :

2010

fDate :

29-31 Jan. 2010

Firstpage :

Lastpage :

Abstract :

In this paper, we propose a novel way of using two different feature streams in a continuous speech recognition system. Conventionally, multiple feature streams are concatenated and HMMs are trained on the concatenated features to build triphone/syllable models. Instead of concatenating, we build separate subword HMMs for each feature stream during training. Also during training, the relevance of each feature stream to a particular sound is evaluated. During testing, hypotheses are generated by the language model, and a greedy Kullback-Leibler distance measure is used to determine the best feature at a particular instant for the given hypotheses. This approach has two important aspects: (a) each sound is recognized with a feature that is relevant to it, and (b) the feature dimension does not increase with the number of feature streams. To enable feature switching during recognition, a syllable-based, automatically annotated recognition framework is used. In this framework, the test speech signal is first segmented into syllables, and the syllable boundaries are incorporated in the language model. Experiments are performed on three databases, (a) the Tamil DDNews database, (b) the TIMIT database and (c) the NTIMIT database, using two features: MFCC (derived from the power spectrum of the speech signal) and MODGDF (derived from the phase spectrum of the speech signal). The results show that the word error rate (WER) is lower than that obtained with joint features by almost 1.5% for the TIMIT database, by almost 3.4% for the NTIMIT database, and by about 3.8% for the Tamil DDNews database.
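The abstract does not spell out how the Kullback-Leibler measure selects a feature stream, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each competing hypothesis is summarized in each stream by a single diagonal-covariance Gaussian, and that the preferred stream is the one in which the hypotheses are most separated by symmetric KL divergence. The function names, the stream labels and the toy numbers are all hypothetical.

```python
import math
from itertools import combinations


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) tuples."""
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)


def pick_stream(models_by_stream):
    """Return the stream whose hypothesis models are most separated.

    models_by_stream maps a stream name to a list of (mean, variance)
    tuples, one per competing hypothesis (an assumed representation).
    The score is the sum of pairwise symmetric KL divergences.
    """
    best_name, best_score = None, float("-inf")
    for name, models in models_by_stream.items():
        score = sum(symmetric_kl(p, q) for p, q in combinations(models, 2))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy example: two hypotheses, one-dimensional models per stream.
toy_models = {
    "mfcc": [([0.0], [1.0]), ([0.1], [1.0])],    # hypotheses nearly overlap
    "modgdf": [([0.0], [1.0]), ([3.0], [1.0])],  # hypotheses well separated
}
chosen = pick_stream(toy_models)
```

In this toy setup the second stream is chosen because its two hypothesis models lie further apart. A real system would score HMM state distributions over the syllable segment rather than single Gaussians.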

Keywords :

natural language processing; speech recognition; KL divergence; MFCC; MODGDF; NTIMIT database; Tamil DDNews database; automatic speech recognition; continuous speech recognition system; feature switching; greedy Kullback Leibler distance measure; language model; linguistic search space; multiple feature streams; phase spectrum; power spectrum; subword HMM; syllable based automatically annotated recognition framework; test speech signal; triphone/syllable models; Automatic speech recognition; Concatenated codes; Error analysis; Hidden Markov models; Mel frequency cepstral coefficient; Natural languages; Particle measurements; Spatial databases; Speech recognition; Testing;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Communications (NCC), 2010 National Conference on

Conference_Location :

Chennai

Print_ISBN :

978-1-4244-6383-1

Type :

conf

DOI :

10.1109/NCC.2010.5430186

Filename :

5430186

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1835576