Speech detection on broadcast audio

Author

Zubari, Unal ; Ozan, Ezgi Can ; Acar, Banu Oskay ; Ciloglu, Tolga ; Esen, Ersin ; Ates, Tugrul K. ; Onur, Duygu Oskay

Author_Institution

Video & Audio Process. Group, TUBITAK UZAY, Ankara, Turkey

fYear

2010

fDate

23-27 Aug. 2010

Firstpage

85

Lastpage

89

Abstract

Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Nonspeech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
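
The last two steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the segmentation step has already produced regions, it uses a mean-energy threshold for the ACTIVITY/NON-ACTIVITY decision (the abstract does not say how that decision is made), and it scores toy feature vectors against hand-set diagonal-covariance GMMs instead of GMMs trained on SFD, harmonicity and MFCC features. All function names and parameter values are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Combine components with log-sum-exp for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def is_active(samples, energy_threshold):
    """Assumed ACTIVITY rule: mean squared amplitude above a threshold."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy > energy_threshold


def classify_region(frames, speech_gmm, nonspeech_gmm):
    """Label a region by comparing summed frame log-likelihoods of two GMMs."""
    speech_score = sum(gmm_log_likelihood(f, *speech_gmm) for f in frames)
    nonspeech_score = sum(gmm_log_likelihood(f, *nonspeech_gmm) for f in frames)
    return "Speech" if speech_score > nonspeech_score else "Nonspeech"


def label_regions(regions, speech_gmm, nonspeech_gmm, energy_threshold=1e-3):
    """Apply the activity decision, then GMM classification, to each region.

    Each region is a (samples, frames) pair: raw audio samples and the
    feature vectors extracted from them (both supplied by the caller).
    """
    labels = []
    for samples, frames in regions:
        if not is_active(samples, energy_threshold):
            labels.append("NON-ACTIVITY")
        else:
            labels.append(classify_region(frames, speech_gmm, nonspeech_gmm))
    return labels


# Toy models as (weights, means, variances); the values are illustrative only.
speech_gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
nonspeech_gmm = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])

regions = [
    ([0.0] * 10, []),                                # silent region
    ([0.5, -0.5] * 5, [[0.2, 0.1], [0.9, 1.1]]),     # frames near the speech model
    ([0.5] * 10, [[5.1, 4.9]]),                      # frames near the nonspeech model
]
labels = label_regions(regions, speech_gmm, nonspeech_gmm)
```

Summing frame log-likelihoods treats the frames of a region as independent, which is a common simplification. A real system would estimate the GMM parameters from labelled training data instead of setting them by hand.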

Keywords

Gaussian processes; audio signal processing; broadcasting; cepstral analysis; decision theory; mixture models; speaker recognition; GMM based classification; Gaussian mixture model; MFCC; SFD; activity-nonactivity decision; audio data; broadcast audio; keyword spotter; mel frequency cepstral coefficients; multiband harmonicity feature; preprocessor module; spectral flow direction; speech boundary detection; speech boundary detector; speech recognition; Feature extraction; Harmonic analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Training

fLanguage

English

Publisher

IEEE

Conference_Title

Signal Processing Conference, 2010 18th European

Conference_Location

Aalborg

ISSN

2219-5491

Type

conf

Filename

7096641