Definition of VAD reference using different HMM topologies and frame dropping strategy

Author

Damjan Vlaj;Marko Kos;Zdravko Kačič

Author_Institution

Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia

fYear

2011

fDate

6/1/2011 12:00:00 AM

Firstpage

1

Lastpage

4

Abstract

In this paper, segmentation of the Aurora 2 database using three different types of models is presented. The segmentation is based on speech recognition results obtained in tests on the Aurora 2 database. Three types of tests are performed. In the first test the speech units are words (16-state HMMs), and in the second test the speech units are monophones (3-state HMMs). In both of these tests the silence unit is modeled by a 3-state hidden Markov model. In the third test the speech and silence units each consist of only one state, where one state represents a duration of 10 ms. The best procedure for creating the VAD reference is determined from the speech recognition accuracy, the number of correctly recognized words, and the number of inserted words obtained with a frame dropping strategy. The best speech recognition accuracy is achieved with monophone speech units, which is attributed to their producing the smallest number of inserted words.
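The abstract ties together three steps: turning a recognizer segmentation into a per-frame VAD reference, dropping non-speech frames, and scoring the result by correct words, insertions, and accuracy. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes 10 ms frames (matching the one-state-per-10-ms setting in the abstract), segments given as hypothetical (start_ms, end_ms, label) tuples, and a hypothetical silence label "sil". The scoring follows the common HTK HResults definitions (%Correct = H/N, Accuracy = (H - I)/N with H = N - D - S), which is an assumption about how the scores were computed. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple

FRAME_MS = 10  # assumption: one frame (one single-state HMM step) lasts 10 ms


def segments_to_vad(segments: Sequence[Tuple[int, int, str]],
                    n_frames: int,
                    silence_label: str = "sil") -> List[bool]:
    """Build a per-frame speech/non-speech reference from (start_ms, end_ms, label) segments.

    Hypothetical format: end_ms is exclusive; any label other than
    silence_label is treated as speech.
    """
    vad = [False] * n_frames
    for start_ms, end_ms, label in segments:
        if label == silence_label:
            continue
        first = start_ms // FRAME_MS
        last = min(n_frames, -(-end_ms // FRAME_MS))  # ceiling division
        for i in range(first, last):
            vad[i] = True
    return vad


def drop_frames(features: Sequence, vad: Sequence[bool]) -> list:
    """Frame dropping: keep only the frames the VAD reference marks as speech."""
    return [f for f, is_speech in zip(features, vad) if is_speech]


def word_scores(n_ref: int, deletions: int, substitutions: int,
                insertions: int) -> Tuple[float, float]:
    """Return (%Correct, %Accuracy) using HTK-style definitions (assumed)."""
    hits = n_ref - deletions - substitutions
    correct = 100.0 * hits / n_ref
    # Insertions lower accuracy but do not affect %Correct.
    accuracy = 100.0 * (hits - insertions) / n_ref
    return correct, accuracy
```

For example, an 80-frame utterance segmented as `[(0, 200, "sil"), (200, 650, "speech"), (650, 800, "sil")]` yields 45 speech frames (frames 20 to 64), and `drop_frames` keeps exactly those. With 100 reference words, 3 deletions, 5 substitutions and 7 insertions, `word_scores` gives 92.0 % correct but 85.0 % accuracy, which illustrates why a segmentation that produces fewer inserted words raises accuracy even when the number of correctly recognized words is unchanged.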

Keywords

"Speech","Hidden Markov models","Training","Noise","Databases","Automatic speech recognition"

Publisher

IEEE

Conference_Titel

Systems, Signals and Image Processing (IWSSIP), 2011 18th International Conference on

ISSN

2157-8672

Print_ISBN

978-1-4577-0074-3

Electronic_ISBN

2157-8702

Type

conf

Filename

5977387