DocumentCode
3642962
Title
Definition of VAD reference using different HMM topologies and frame dropping strategy
Author
Damjan Vlaj;Marko Kos;Zdravko Kačič
Author_Institution
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
fYear
2011
fDate
6/1/2011 12:00:00 AM
Firstpage
1
Lastpage
4
Abstract
In this paper, the segmentation of the Aurora 2 database with three different types of models is presented. The segmentation is based on speech recognition results obtained from tests on the Aurora 2 database. Three types of tests are performed. In the first test the speech units are words (16-state HMMs), and in the second test the speech units are monophones (3-state HMMs). In both of these tests the silence unit is a 3-state hidden Markov model. In the third test the speech and silence units each consist of a single state. One state represents a duration of 10 ms. The best procedure for creating the VAD reference is determined from the speech recognition accuracy, the number of correctly recognized words, and the number of inserted words obtained with a frame dropping strategy. The best speech recognition accuracy is achieved with monophone speech units, owing to the smallest number of inserted words.
Keywords
"Speech","Hidden Markov models","Training","Noise","Databases","Automatic speech recognition"
Publisher
IEEE
Conference_Titel
Systems, Signals and Image Processing (IWSSIP), 2011 18th International Conference on
ISSN
2157-8672
Print_ISBN
978-1-4577-0074-3
Electronic_ISBN
2157-8702
Type
conf
Filename
5977387
Link To Document