DocumentCode
699991
Title
Incorporating acoustic feature diversity into the linguistic search space for syllable based speech recognition
Author
Ramya, R. ; Hegde, Rajesh M. ; Murthy, Hema A.
Author_Institution
Indian Inst. of Technol. Madras, Chennai, India
fYear
2008
fDate
25-29 Aug. 2008
Firstpage
1
Lastpage
5
Abstract
Acoustic features derived from the short-time magnitude and phase spectra provide complementary information. In this paper, we discuss the significance of incorporating this diverse information into the linguistic search space for syllable-based speech recognition. The diversity of group delay acoustic features computed from the phase spectrum, and of MFCC computed from the magnitude spectrum, is first illustrated in a lower-dimensional feature space. Motivated by this diversity of information in the acoustic feature space, we derive syllable-feature pairs. The selection of syllable-feature pairs is based on isolated syllable recognition results, computed a priori using the two acoustic feature streams. During the recognition process, likelihoods are weighted according to the syllable-feature pair information using a weighted likelihood scheme. The syllable lattice is then rescored using these weighted syllable-feature pairs in the linguistic search space. This technique of weighting the relevant acoustic feature for each syllable during decoding in the linguistic search space yields a reduced word error rate (WER) in experiments conducted on the TIMIT and DBIL databases.
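The pipeline outlined in the abstract (select a feature stream per syllable, weight the stream likelihoods, rescore the candidates) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the weighting formula, so a convex combination of per-stream log-likelihoods is assumed here, the syllable lattice is simplified to a plain list of candidate paths, and all function names, the syllable labels, and the default `weight` value are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed names for the two acoustic feature streams.
STREAMS = ("mfcc", "group_delay")

StreamScores = Dict[str, float]          # stream name -> score
Path = List[Tuple[str, StreamScores]]    # sequence of (syllable, per-stream log-likelihoods)


def select_syllable_feature_pairs(accuracy: Dict[str, StreamScores]) -> Dict[str, str]:
    """Pair each syllable with the stream that gave the higher isolated-syllable
    recognition accuracy (ties fall back to the first stream in STREAMS)."""
    return {syl: max(STREAMS, key=lambda s: acc[s]) for syl, acc in accuracy.items()}


def weighted_log_likelihood(syllable: str, loglik: StreamScores,
                            pairs: Dict[str, str], weight: float = 0.8) -> float:
    """Assumed weighting: convex combination favouring the paired stream.
    Syllables without a pair default to MFCC (also an assumption)."""
    preferred = pairs.get(syllable, "mfcc")
    other = next(s for s in STREAMS if s != preferred)
    return weight * loglik[preferred] + (1.0 - weight) * loglik[other]


def rescore_paths(paths: List[Path], pairs: Dict[str, str],
                  weight: float = 0.8) -> Tuple[List[str], float]:
    """Rescore each candidate path with the weighted scores and return the best
    syllable sequence together with its total score."""
    def total(path: Path) -> float:
        return sum(weighted_log_likelihood(s, ll, pairs, weight) for s, ll in path)

    best = max(paths, key=total)
    return [s for s, _ in best], total(best)


# Toy data, invented purely for illustration.
accuracy = {
    "ka": {"mfcc": 0.9, "group_delay": 0.7},
    "ma": {"mfcc": 0.6, "group_delay": 0.8},
}
pairs = select_syllable_feature_pairs(accuracy)
candidates = [
    [("ka", {"mfcc": -2.0, "group_delay": -6.0})],
    [("ma", {"mfcc": -1.0, "group_delay": -9.0})],
]
best_syllables, best_score = rescore_paths(candidates, pairs, weight=0.75)
```

In a real decoder the rescoring would run over the lattice arcs together with the language model scores; the flat list of paths above only serves to keep the sketch short.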
Keywords
acoustic signal processing; decoding; error statistics; linguistics; maximum likelihood estimation; speech recognition; DBIL database; MFCC; TIMIT database; WER reduction; decoding process; group delay acoustic features diversity; isolated syllable recognition; linguistic search space; lower dimensional acoustic feature space; syllable based speech recognition; syllable lattice; syllable-feature pair; weighted likelihood scheme; word error rate reduction; Databases; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2008 16th European
Conference_Location
Lausanne
ISSN
2219-5491
Type
conf
Filename
7080523