DocumentCode :
417286
Title :
Parsing speech into articulatory events
Author :
Hacioglu, Kadri ; Pellom, Bryan ; Ward, Wayne
Author_Institution :
Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
In this paper, the states in the speech production process are defined by a number of categorical articulatory features. We describe a detector that outputs a stream (sequence of classes) for each articulatory feature given the Mel-frequency cepstral coefficient (MFCC) representation of the input speech. The detector consists of a bank of recurrent neural network (RNN) classifiers, a variable-depth lattice generator, and a Viterbi decoder. A bank of classifiers has previously been used for articulatory feature detection by many researchers. We extend their work first by creating variable-depth lattices for each feature and then by combining them into product lattices for rescoring with the Viterbi algorithm. During rescoring, we incorporate language and duration constraints along with the posterior probabilities of classes provided by the RNN classifiers. We present our results for the place and manner features using TIMIT data and compare them to a baseline system. We report performance improvements at both the frame and segment levels.
Keywords :
Viterbi decoding; cepstral analysis; feature extraction; pattern classification; recurrent neural nets; signal representation; speech processing; speech recognition; MFCC representation; Mel frequency cepstral coefficient; RNN classifiers; TIMIT data; Viterbi decoder; Viterbi rescoring; categorical articulatory features; duration constraints; language constraints; performance improvements; posterior probabilities; product lattices; recurrent neural network; speech production; variable depth lattice generator; variable depth lattices; Acoustic signal detection; Computer vision; Detectors; Event detection; Feature extraction; Lattices; Mel frequency cepstral coefficient; Recurrent neural networks; Speech recognition; Viterbi algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326138
Filename :
1326138