Title :
Syllabification of conversational speech using Bidirectional Long-Short-Term Memory Neural Networks
Author :
Landsiedel, Christian ; Edlund, Jens ; Eyben, Florian ; Neiberg, Daniel ; Schuller, Björn
Author_Institution :
Dept. for Speech, Music & Hearing, R. Inst. of Technol., Stockholm, Sweden
Abstract :
Segmentation of speech signals is a crucial task in many types of speech analysis. We present a novel approach to syllable-level segmentation using a Bidirectional Long-Short-Term Memory Neural Network. The network estimates syllable nucleus positions by regressing perceptually motivated input features onto a smooth target function. Peak selection is then performed to obtain valid nucleus positions. Performance of the model is evaluated at the level of both syllables and the vowel segments that make up the syllable nuclei. The general applicability of the approach is illustrated by good results on two common databases, Switchboard and TIMIT, for both read and spontaneous speech, and by a favourable comparison with other published results.
Keywords :
recurrent neural nets; speech synthesis; TIMIT; bidirectional long-short-term memory neural networks; smooth target function; speech analysis; speech signal segmentation; spontaneous speech; syllabification; syllable nuclei; syllable nucleus positions; Artificial neural networks; Correlation; Rhythm; Speech; Speech recognition; Switches; Training; Dialogue Systems; Recurrent Neural Networks; Speech Analysis; Syllabification;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947543