Speech bandwidth classification for broadcast news domain using artificial neural network and Gaussian mixture models

Author

Marko Kos;Matej Grasic;Bojan Kotnik;Zdravko Kacic

Author_Institution

University of Maribor, Faculty of Electrical Engineering and Computer Science, Laboratory for Digital Signal Processing, Smetanova ul. 17, SI-2000 Maribor, Slovenia

fYear

2008

Firstpage

283

Lastpage

286

Abstract

In this paper we present research on speech bandwidth classification carried out on the Slovenian BNSI Broadcast News database. Speech recorded in a studio environment has a frequency bandwidth of 8 kHz, while speech recorded over a telephone channel has a bandwidth of 3.1 kHz. Speech bandwidth classification makes it possible to use separate speech models for automatic speech recognition (ASR), which helps to improve the overall recognition result. For the task of speech bandwidth classification we used two different model-based approaches: one based on an artificial neural network and the other based on Gaussian mixture models. Both approaches were tested and evaluated using the same front-end features, so that their results can be compared directly.
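The Gaussian mixture model approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional features, the model parameters in `MODELS`, and the function names are all hypothetical, chosen only to show the usual decision rule of scoring a segment against one GMM per bandwidth class and picking the class with the higher summed log-likelihood.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_segment(frames, models):
    """Return (best_label, scores), scoring each class by summed frame log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models (assumption): real models would be trained on the
# front-end features extracted from studio and telephone speech.
MODELS = {
    "wideband": [(0.6, [2.0, 1.0], [1.0, 1.0]), (0.4, [3.0, 2.0], [1.5, 1.0])],
    "narrowband": [(0.5, [-2.0, -1.0], [1.0, 1.0]), (0.5, [-3.0, 0.0], [1.0, 2.0])],
}

if __name__ == "__main__":
    label, scores = classify_segment([[2.1, 1.2], [2.8, 1.9]], MODELS)
    print(label, scores)
```

Summing frame log-likelihoods assumes the frames are conditionally independent given the class, which is a common simplification for segment-level decisions.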

Keywords

"Speech","Artificial neural networks","Bandwidth","Databases","Materials","Speech recognition","Acoustics"

Publisher

IEEE

Conference_Titel

Systems, Signals and Image Processing, 2008. IWSSIP 2008. 15th International Conference on

ISSN

2157-8672

Print_ISBN

978-80-227-2856-0

Electronic_ISBN

2157-8702

Type

conf

DOI

10.1109/IWSSIP.2008.4604422

Filename

4604422