DocumentCode
417286
Title
Parsing speech into articulatory events
Author
Hacioglu, Kadri; Pellom, Bryan; Ward, Wayne
Author_Institution
Center for Spoken Language Research, University of Colorado, Boulder, CO, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
In this paper, the states in the speech production process are defined by a number of categorical articulatory features. We describe a detector that outputs a stream (a sequence of classes) for each articulatory feature, given the Mel frequency cepstral coefficient (MFCC) representation of the input speech. The detector consists of a bank of recurrent neural network (RNN) classifiers, a variable-depth lattice generator, and a Viterbi decoder. Banks of classifiers have previously been used for articulatory feature detection by many researchers. We extend this work, first by creating variable-depth lattices for each feature and then by combining them into product lattices that are rescored using the Viterbi algorithm. During rescoring, we incorporate language and duration constraints along with the class posterior probabilities provided by the RNN classifiers. We present results for the place and manner features on the TIMIT data and compare them with a baseline system. We report performance improvements at both the frame and segment levels.
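The abstract does not give the decoder's equations, so the following Python sketch only illustrates, under stated assumptions, the general idea of Viterbi decoding that combines frame-level class posteriors with a weighted class-transition ("language") score and a minimum-duration constraint. The function name `viterbi_min_duration`, the parameters `min_dur` and `lm_weight`, and the chain-of-duration-states construction are hypothetical choices, not the authors' implementation. In particular, it decodes a single feature stream directly from per-frame log posteriors rather than rescoring variable-depth or product lattices.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def viterbi_min_duration(
    log_post: Sequence[Sequence[float]],
    log_trans: Sequence[Sequence[float]],
    min_dur: int = 1,
    lm_weight: float = 1.0,
) -> List[int]:
    """Hypothetical sketch: best class sequence for one articulatory feature stream.

    log_post[t][c]  : log posterior of class c at frame t (e.g. from a classifier)
    log_trans[a][b] : log P(next segment class b | current segment class a), a != b
                      (diagonal entries are ignored; staying in a class costs nothing)
    min_dur         : every segment, including the first and last, lasts >= min_dur frames
    lm_weight       : assumed scale applied to the class-transition ("language") scores
    """
    T, C, D = len(log_post), len(log_post[0]), min_dur
    if D < 1 or T < D:
        raise ValueError("need min_dur >= 1 and at least min_dur frames")

    # score[c][d]: best log score ending in class c after d+1 frames in that class
    # (d is capped at D-1, and that last duration state may loop on itself).
    score = [[NEG_INF] * D for _ in range(C)]
    for c in range(C):
        score[c][0] = log_post[0][c]
    backptr: List[List[List[Tuple[int, int]]]] = []

    for t in range(1, T):
        new = [[NEG_INF] * D for _ in range(C)]
        bp = [[(-1, -1)] * D for _ in range(C)]
        for c in range(C):
            # Stay in class c and advance the duration counter d-1 -> d.
            for d in range(1, D):
                new[c][d], bp[c][d] = score[c][d - 1], (c, d - 1)
            # Self-loop once the minimum duration has been reached.
            if score[c][D - 1] > new[c][D - 1]:
                new[c][D - 1], bp[c][D - 1] = score[c][D - 1], (c, D - 1)
            # Start a new segment of class c after a completed segment of class p.
            for p in range(C):
                if p == c:
                    continue
                cand = score[p][D - 1] + lm_weight * log_trans[p][c]
                if cand > new[c][0]:
                    new[c][0], bp[c][0] = cand, (p, D - 1)
            # Add the frame's acoustic evidence (log posterior) to every state of c.
            for d in range(D):
                new[c][d] += log_post[t][c]
        score = new
        backptr.append(bp)

    # The final segment must also satisfy the minimum duration.
    best_c = max(range(C), key=lambda c: score[c][D - 1])
    if score[best_c][D - 1] == NEG_INF:
        raise ValueError("no path satisfies the duration constraint")

    # Trace back through the stored (class, duration-state) pointers.
    path = [best_c]
    c, d = best_c, D - 1
    for bp in reversed(backptr):
        c, d = bp[c][d]
        path.append(c)
    path.reverse()
    return path


# Usage: a one-frame "blip" of class 1 survives without a duration constraint
# but is smoothed away when every segment must last at least two frames.
post = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
log_post = [[math.log(p) for p in row] for row in post]
log_trans = [[0.0, 0.0], [0.0, 0.0]]  # with two classes, each change has probability 1
print(viterbi_min_duration(log_post, log_trans, min_dur=1))  # [0, 0, 1, 0, 0]
print(viterbi_min_duration(log_post, log_trans, min_dur=2))  # [0, 0, 0, 0, 0]
```

Expanding each class into a short chain of duration states is one common way to impose a minimum segment length inside a standard Viterbi search. How the original system weights and combines its language, duration, and posterior scores is not specified here.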
Keywords
Viterbi decoding; cepstral analysis; feature extraction; pattern classification; recurrent neural nets; signal representation; speech processing; speech recognition; MFCC representation; Mel frequency cepstral coefficient; RNN classifiers; TIMIT data; Viterbi decoder; Viterbi rescoring; categorical articulatory features; duration constraints; language constraints; performance improvements; posterior probabilities; product lattices; recurrent neural network; speech production; variable depth lattice generator; variable depth lattices; Acoustic signal detection; Computer vision; Detectors; Event detection; Feature extraction; Lattices; Mel frequency cepstral coefficient; Recurrent neural networks; Speech recognition; Viterbi algorithm;
fLanguage
English
Publisher
IEEE
Conference_Title
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326138
Filename
1326138