DocumentCode :
3629096
Title :
Using visual information in automatic speech segmentation
Author :
Eren Akdemir;Tolga Ciloglu
Author_Institution :
Elektrik ve Elektronik Mühendisliği Bölümü, Orta Doğu Teknik Üniversitesi (Department of Electrical and Electronics Engineering, Middle East Technical University), Turkey
fYear :
2008
fDate :
4/1/2008
Firstpage :
1
Lastpage :
4
Abstract :
In this study, the use of visual information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing systems: it is needed, for example, for training speech recognition systems and for obtaining appropriately labeled data for speech synthesis systems. The motions of the upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which contains simultaneous articulograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporating the visual parameters into the automatic segmentation system. The average error of the system with respect to manual segmentation is decreased by 10.1%. The results are examined in a boundary-class dependent manner, and the performance of the system on different boundary types is discussed. After analyzing the boundary-class dependent performance, the system performance is improved by 12.1% by using the proposed feature vector only at selected boundary types.
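The abstract describes combining acoustic features with lip-motion parameters into a single feature vector for HMM-based segmentation. The sketch below is not the authors' code; it only illustrates, under assumed frame rates (39-dimensional MFCCs at a 10 ms hop, articulograph lip trajectories at 500 Hz) and with placeholder arrays, how visual trajectories could be resampled to the acoustic frame rate and appended to the acoustic vectors before alignment.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# build an audio-visual feature matrix by resampling upper/lower lip
# trajectories to the MFCC frame rate and concatenating them per frame.
import numpy as np

def lip_features(ema_upper, ema_lower, ema_rate=500.0, frame_hop=0.010, n_frames=None):
    """Resample lip trajectories to the acoustic frame rate and append
    first-order deltas, yielding one visual vector per speech frame."""
    t_ema = np.arange(len(ema_upper)) / ema_rate
    if n_frames is None:
        n_frames = int(t_ema[-1] / frame_hop) + 1
    t_frames = np.arange(n_frames) * frame_hop
    upper = np.interp(t_frames, t_ema, ema_upper)
    lower = np.interp(t_frames, t_ema, ema_lower)
    feats = np.stack([upper, lower], axis=1)
    deltas = np.gradient(feats, axis=0)       # simple delta approximation
    return np.hstack([feats, deltas])         # shape: (n_frames, 4)

# Placeholder data standing in for one utterance of the corpus
mfcc = np.random.randn(300, 39)               # 300 frames of 39-dim MFCCs
ema_up = np.random.randn(1500)                 # 3 s of upper-lip data at 500 Hz
ema_lo = np.random.randn(1500)                 # 3 s of lower-lip data at 500 Hz

visual = lip_features(ema_up, ema_lo, n_frames=mfcc.shape[0])
combined = np.hstack([mfcc, visual])           # audio-visual feature vectors
print(combined.shape)                          # (300, 43)
```

The combined frames would then be fed to a conventional HMM forced-alignment step; which boundary classes actually benefit from the visual columns is the question the paper's boundary-dependent analysis addresses.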
Keywords :
"Hidden Markov models","Mel frequency cepstral coefficient","Speech","Motion segmentation","Speech processing","Visualization","Speech recognition"
Publisher :
ieee
Conference_Titel :
Signal Processing, Communication and Applications Conference, 2008. SIU 2008. IEEE 16th
ISSN :
2165-0608
Print_ISBN :
978-1-4244-1998-2
Type :
conf
DOI :
10.1109/SIU.2008.4632641
Filename :
4632641