• DocumentCode
    2263265
  • Title

    A probabilistic framework for feature-based speech recognition

  • Author

    Glass, James ; Chang, Jane ; McCandless, Michael

  • Author_Institution
    Lab. for Comput. Sci., MIT, Cambridge, MA, USA
  • Volume
    4
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    2277
  • Abstract
    Most current speech recognizers use an observation space which is based on a temporal sequence of “frames” (e.g., Mel-cepstra). There is another class of recognizer which further processes these frames to produce a segment-based network, and represents each segment by fixed-dimensional “features”. In such feature-based recognizers, the observation space takes the form of a temporal network of feature vectors, so that a single segmentation of an utterance uses only a subset of all possible feature vectors. In this paper, we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion that is useful for a segment-based Viterbi or A* search. We report experimental results for the task of phonetic recognition on the TIMIT corpus, where we achieved context-independent and context-dependent (using diphones) results on the core test set of 64.1% and 69.5%, respectively.
  • Keywords
    Viterbi decoding; decoding; estimation theory; maximum likelihood estimation; probability; search problems; speech coding; speech recognition; vectors; Mel-cepstra; TIMIT corpus; context-dependent recognition; context-independent recognition; diphones; feature vectors; feature-based speech recognition; fixed-dimensional features; frame temporal sequence; maximum a-posteriori decoding strategy; normalization criterion; observation space; phonetic recognition; probabilistic framework; segment-based A* search; segment-based Viterbi search; segment-based network; temporal network; utterance segmentation; Acoustic testing; Computer science; Decoding; Feature extraction; Glass; Laboratories; Natural languages; Space technology; Speech recognition; Viterbi algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type

    conf

  • DOI
    10.1109/ICSLP.1996.607261
  • Filename
    607261