DocumentCode
705368
Title
Speech detection on broadcast audio
Author
Zubari, Unal ; Ozan, Ezgi Can ; Acar, Banu Oskay ; Ciloglu, Tolga ; Esen, Ersin ; Ates, Tugrul K. ; Onur, Duygu Oskay
Author_Institution
Video & Audio Process. Group, TUBITAK UZAY, Ankara, Turkey
fYear
2010
fDate
23-27 Aug. 2010
Firstpage
85
Lastpage
89
Abstract
Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Nonspeech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
Keywords
Gaussian processes; audio signal processing; broadcasting; cepstral analysis; decision theory; mixture models; speaker recognition; GMM based classification; Gaussian mixture model; MFCC; SFD; activity-nonactivity decision; audio data; broadcast audio; keyword spotter; mel frequency cepstral coefficients; multiband harmonicity feature; preprocessor module; speaker recognition; spectral flow direction; speech boundary detection; speech boundary detector; speech recognition; Feature extraction; Harmonic analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Training;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2010 18th European
Conference_Location
Aalborg
ISSN
2219-5491
Type
conf
Filename
7096641