Incorporating acoustic feature diversity into the linguistic search space for syllable based speech recognition

Author

Ramya, R. ; Hegde, Rajesh M. ; Murthy, Hema A.

Author_Institution

Indian Inst. of Technol. Madras, Chennai, India

fYear

2008

fDate

25-29 Aug. 2008

Firstpage

1

Lastpage

5

Abstract

Acoustic features derived from the short-time magnitude and phase spectra provide complementary information. In this paper, we discuss the significance of incorporating this diverse information into the linguistic search space for syllable-based speech recognition. The diversity of group delay acoustic features computed from the phase spectrum and MFCC computed from the magnitude spectrum is first illustrated in a lower dimensional feature space. Motivated by this diversity of information in the acoustic feature space, we derive syllable-feature pairs. The selection of syllable-feature pairs is based on isolated syllable recognition results, computed a priori using the two acoustic feature streams. During recognition, likelihoods are weighted according to the syllable-feature pair information using a weighted likelihood scheme. The syllable lattice is then rescored using these weighted syllable-feature pairs in the linguistic search space. This technique of appropriately weighting the relevant acoustic feature for each syllable during decoding in the linguistic search space yields a reduced word error rate (WER) in experiments conducted on the TIMIT and DBIL databases.
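The pairing and weighting steps described in the abstract can be illustrated with a small sketch. The abstract does not state the exact combination rule, so this assumes a log-linear interpolation of the two streams' log-likelihoods, with a hypothetical weight w = 0.7 given to each syllable's preferred stream, and it rescores a toy N-best list rather than a full syllable lattice. All function names, the weight, and the accuracy and likelihood figures are illustrative assumptions, not values from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the two feature streams named in the abstract.
MFCC = "mfcc"
GD = "group_delay"

# A hypothesis is a sequence of (syllable, log-likelihood under MFCC models,
# log-likelihood under group delay models) tuples. This layout is an assumption.
Hypothesis = List[Tuple[str, float, float]]


def select_feature_pairs(acc_mfcc: Dict[str, float],
                         acc_gd: Dict[str, float]) -> Dict[str, str]:
    """Pair each syllable with the stream that recognised it more accurately
    in a priori isolated-syllable tests. Ties go to MFCC (an arbitrary choice)."""
    pairs: Dict[str, str] = {}
    for syl in set(acc_mfcc) | set(acc_gd):
        better_gd = acc_gd.get(syl, 0.0) > acc_mfcc.get(syl, 0.0)
        pairs[syl] = GD if better_gd else MFCC
    return pairs


def weighted_log_likelihood(syl: str, ll_mfcc: float, ll_gd: float,
                            pairs: Dict[str, str], w: float = 0.7) -> float:
    """Weight the syllable's preferred stream by w and the other by 1 - w.
    Syllables with no pair information default to MFCC as the preferred stream."""
    if pairs.get(syl, MFCC) == GD:
        return w * ll_gd + (1.0 - w) * ll_mfcc
    return w * ll_mfcc + (1.0 - w) * ll_gd


def rescore(hypotheses: List[Hypothesis], pairs: Dict[str, str],
            w: float = 0.7) -> List[Hypothesis]:
    """Sort hypotheses best-first by their summed weighted acoustic score
    (a simplified stand-in for rescoring arcs in a syllable lattice)."""
    def score(hyp: Hypothesis) -> float:
        return sum(weighted_log_likelihood(s, lm, lg, pairs, w)
                   for s, lm, lg in hyp)
    return sorted(hypotheses, key=score, reverse=True)


if __name__ == "__main__":
    # Toy isolated-syllable accuracies for each stream (made-up numbers).
    pairs = select_feature_pairs({"ka": 0.82, "ma": 0.60},
                                 {"ka": 0.75, "ma": 0.71})
    hyp_a = [("ka", -10.0, -14.0), ("ma", -12.0, -9.0)]
    hyp_b = [("ka", -11.0, -10.0), ("ma", -11.0, -13.0)]
    print(pairs)
    print(rescore([hyp_b, hyp_a], pairs)[0])
```

In this toy case "ka" is paired with MFCC and "ma" with group delay, so hyp_a wins because each of its syllables scores well under its own preferred stream. A real decoder would also fold in language model scores and apply the weighting on lattice arcs rather than on whole hypotheses.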

Keywords

acoustic signal processing; decoding; error statistics; linguistics; maximum likelihood estimation; speech recognition; DBIL database; MFCC; TIMIT database; WER reduction; decoding process; group delay acoustic features diversity; isolated syllable recognition; linguistic search space; lower dimensional acoustic feature space; syllable based speech recognition; syllable lattice; syllable-feature pair; weighted likelihood scheme; word error rate reduction; Databases; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Title

Signal Processing Conference, 2008 16th European

Conference_Location

Lausanne

ISSN

2219-5491

Type

conf

Filename

7080523