• DocumentCode
    699991
  • Title
    Incorporating acoustic feature diversity into the linguistic search space for syllable based speech recognition
  • Author
    Ramya, R. ; Hegde, Rajesh M. ; Murthy, Hema A.
  • Author_Institution
    Indian Inst. of Technol. Madras, Chennai, India
  • fYear
    2008
  • fDate
    25-29 Aug. 2008
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Acoustic features derived from the short-time magnitude and phase spectra provide complementary information. In this paper, we discuss the significance of incorporating this diverse information into the linguistic search space for syllable-based speech recognition. The diversity of group delay acoustic features computed from the phase spectrum, and MFCC computed from the magnitude spectrum, is first illustrated in a lower-dimensional feature space. Motivated by this diversity of information in the acoustic feature space, we derive syllable-feature pairs. The selection of syllable-feature pairs is based on isolated syllable recognition results, computed a priori using the two acoustic feature streams. During the recognition process, likelihoods are weighted according to the syllable-feature pair information using a weighted likelihood scheme. The syllable lattice is then rescored using these weighted syllable-feature pairs in the linguistic search space. Weighting the relevant acoustic feature for each syllable during decoding in the linguistic search space yields a reduced word error rate (WER) in experiments conducted on the TIMIT and DBIL databases.
  • Keywords
    acoustic signal processing; decoding; error statistics; linguistics; maximum likelihood estimation; speech recognition; DBIL database; MFCC; TIMIT database; WER reduction; decoding process; group delay acoustic features diversity; isolated syllable recognition; linguistic search space; lower dimensional acoustic feature space; syllable based speech recognition; syllable lattice; syllable-feature pair; weighted likelihood scheme; word error rate reduction; Databases; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2008 16th European
  • Conference_Location
    Lausanne
  • ISSN
    2219-5491
  • Type
    conf
  • Filename
    7080523