• DocumentCode
    417286
  • Title
    Parsing speech into articulatory events
  • Author
    Hacioglu, Kadri; Pellom, Bryan; Ward, Wayne
  • Author_Institution
    Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    In this paper, the states of the speech production process are defined by a number of categorical articulatory features. We describe a detector that outputs a stream (a sequence of classes) for each articulatory feature, given the Mel-frequency cepstral coefficient (MFCC) representation of the input speech. The detector consists of a bank of recurrent neural network (RNN) classifiers, a variable-depth lattice generator, and a Viterbi decoder. Banks of classifiers have previously been used for articulatory feature detection by many researchers. We extend that work first by creating variable-depth lattices for each feature and then by combining them into product lattices, which are rescored with the Viterbi algorithm. During rescoring, we incorporate language and duration constraints along with the class posterior probabilities provided by the RNN classifiers. We present results for the place and manner features on TIMIT data and compare them with a baseline system. We report performance improvements at both the frame and segment levels.
  • Keywords
    Viterbi decoding; cepstral analysis; feature extraction; pattern classification; recurrent neural nets; signal representation; speech processing; speech recognition; MFCC representation; Mel frequency cepstral coefficient; RNN classifiers; TIMIT data; Viterbi decoder; Viterbi rescoring; categorical articulatory features; duration constraints; language constraints; performance improvements; posterior probabilities; product lattices; recurrent neural network; speech production; variable depth lattice generator; variable depth lattices; Acoustic signal detection; Computer vision; Detectors; Event detection; Lattices; Recurrent neural networks; Viterbi algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326138
  • Filename
    1326138
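
The rescoring step described in the abstract combines per-frame class posteriors with language and duration constraints in a Viterbi search. The following is a minimal sketch of that idea for a single articulatory feature, not the authors' implementation: it searches over all classes at every frame instead of over variable-depth or product lattices, and the function name `viterbi_rescore`, the bigram table `log_trans`, the hard minimum duration `min_dur`, and the weight `lm_weight` are all assumptions introduced here for illustration. Posteriors are assumed to be strictly positive.

```python
import math

NEG_INF = float("-inf")


def viterbi_rescore(posteriors, log_trans, min_dur=1, lm_weight=1.0):
    """Return the best class index per frame.

    posteriors : T x K list of per-frame class posteriors (e.g. from a classifier).
    log_trans  : K x K list, log_trans[j][k] = log-prob of class k following j
                 (a hypothetical bigram "language" model over classes).
    min_dur    : minimum number of frames a class must last before a switch
                 (a hypothetical hard duration constraint).
    lm_weight  : scale applied to the transition log-probabilities.
    """
    T, K = len(posteriors), len(posteriors[0])
    last = min_dur - 1  # duration slot meaning "minimum duration reached"

    def log_post(t, k):
        p = posteriors[t][k]
        return math.log(p) if p > 0.0 else NEG_INF

    # score[k][d]: best log-score ending in class k after d + 1 frames in k
    # (d is capped at `last`).
    score = [[NEG_INF] * min_dur for _ in range(K)]
    for k in range(K):
        score[k][0] = log_post(0, k)
    backptrs = []

    for t in range(1, T):
        new = [[NEG_INF] * min_dur for _ in range(K)]
        bp = [[None] * min_dur for _ in range(K)]
        for k in range(K):
            emit = log_post(t, k)
            # Stay in class k: the duration counter advances (capped).
            for d in range(min_dur):
                nd = min(d + 1, last)
                cand = score[k][d] + emit
                if cand > new[k][nd]:
                    new[k][nd], bp[k][nd] = cand, (k, d)
            # Switch into k from j, allowed only once j has lasted min_dur frames.
            for j in range(K):
                if j == k:
                    continue
                cand = score[j][last] + lm_weight * log_trans[j][k] + emit
                if cand > new[k][0]:
                    new[k][0], bp[k][0] = cand, (j, last)
        score = new
        backptrs.append(bp)

    # Pick the best final state and trace back to frame 0.
    states = [(k, d) for k in range(K) for d in range(min_dur)]
    k, d = max(states, key=lambda s: score[s[0]][s[1]])
    path = [k]
    for bp in reversed(backptrs):
        k, d = bp[k][d]
        path.append(k)
    path.reverse()
    return path


if __name__ == "__main__":
    # Two classes, five frames; frame 2 is a brief excursion toward class 1.
    post = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
    uniform = [[math.log(0.5)] * 2 for _ in range(2)]
    print(viterbi_rescore(post, uniform, min_dur=1, lm_weight=0.0))
    print(viterbi_rescore(post, uniform, min_dur=2, lm_weight=0.0))
```

With the constraints switched off (`min_dur=1`, `lm_weight=0.0`), the search reduces to a per-frame argmax and follows the one-frame excursion; a minimum duration of two frames or a non-zero transition weight smooths it away. Expanding each class into `min_dur` duration slots is one simple way to impose a hard duration constraint; a soft duration model would replace it with per-duration log-probabilities.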