• DocumentCode
    2703152
  • Title
    Pronunciation Modeling for Spontaneous Speech Recognition using Latent Pronunciation Analysis (LPA) and Prior Knowledge
  • Author
    Che-Kuang Lin ; Lin-Shan Lee
  • Author_Institution
    Nat. Taiwan Univ., Taipei, Taiwan
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    In this paper, we propose a new framework for pronunciation modeling in which the search algorithm focuses primarily on the clearly pronounced portions of speech while de-emphasizing observations from the slurred portions. This is based on the prior analysis that pronunciation variation is related to the predictability and importance of the words in spoken utterances, both of which may be estimated to some extent. We define a set of pronunciation-related features and develop a latent pronunciation analysis (LPA) to estimate the "latent pronunciation states" in the speech. The LPA probabilities, the pronunciation-related features, and a further set of prior knowledge obtained from two distance measures between phonemes are integrated in an SVM classifier to produce a "pronunciation variation indicator" for each frame, on which Viterbi decoding is then based. Encouraging initial results on Mandarin spontaneous speech were obtained in preliminary experiments.
  • Keywords
    Viterbi decoding; feature extraction; probability; search problems; speech coding; speech recognition; support vector machines; Mandarin spontaneous speech; SVM; latent pronunciation analysis; prior knowledge; pronunciation modeling; pronunciation variation indicator; pronunciation-related features; search algorithm; spoken utterances; Algorithm design and analysis; Decoding; Hidden Markov models; Speech analysis; Speech processing; State estimation; Support vector machine classification; Viterbi algorithm; Distance metrics; Probabilistic Latent Semantic Analysis; Pronunciation variation; spontaneous speech
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007)
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2007.367002
  • Filename
    4218190