Title :
Using visual information in automatic speech segmentation
Author :
Eren Akdemir; Tolga Ciloglu
Author_Institution :
Department of Electrical and Electronics Engineering, Orta Doğu Teknik Üniversitesi (Middle East Technical University), Turkey
fDate :
4/1/2008
Abstract :
In this study, the use of visual information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing systems: it is needed for training speech recognition systems and for obtaining appropriate data in speech synthesis systems, among other uses. The motions of the upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which contains simultaneous articulograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporating the visual parameters into the automatic segmentation system. The average error of the system with respect to manual segmentation is reduced by 10.1%. The results are examined in a boundary-class dependent manner, and the performance of the system on different boundary types is discussed. After analyzing the boundary-class dependent performance, the system performance is improved by 12.1% by using the visual feature vector only at selected boundary types.
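Illustrative sketch :
The abstract describes augmenting acoustic features with upper and lower lip motion before HMM-based segmentation. The following is a minimal sketch of that kind of feature composition, not the authors' code: the file name, sampling rates, frame shift, and the use of librosa and NumPy are illustrative assumptions.

# Hypothetical audio-visual feature composition: 13 MFCCs per frame
# plus two lip-position parameters, as an illustration of the idea only.
import numpy as np
import librosa

def audio_visual_features(wav_path, lip_upper, lip_lower, lip_rate_hz):
    """Return one observation matrix: 13 MFCCs + 2 lip positions per frame."""
    y, sr = librosa.load(wav_path, sr=16000)
    hop = 160  # 10 ms frame shift at 16 kHz (assumed)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop).T  # (T, 13)

    # Resample the articulograph lip trajectories to the MFCC frame times.
    frame_times = np.arange(mfcc.shape[0]) * hop / sr
    lip_times = np.arange(len(lip_upper)) / lip_rate_hz
    upper = np.interp(frame_times, lip_times, lip_upper)
    lower = np.interp(frame_times, lip_times, lip_lower)

    # Concatenate acoustic and visual streams into one feature vector per frame,
    # which could then be fed to an HMM trainer/aligner.
    return np.hstack([mfcc, upper[:, None], lower[:, None]])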
Keywords :
"Hidden Markov models","Mel frequency cepstral coefficient","Speech","Motion segmentation","Speech processing","Visualization","Speech recognition"
Conference_Titel :
2008 IEEE 16th Signal Processing, Communication and Applications Conference (SIU 2008)
Print_ISBN :
978-1-4244-1998-2
DOI :
10.1109/SIU.2008.4632641