Title :
Detecting bandlimited audio in broadcast television shows
Author :
Fuhs, Mark C. ; Jin, Qin ; Schultz, Tanja
Author_Institution :
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA
Abstract :
For TV and radio shows containing narrowband speech, speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To apply such a model selectively, one must first accurately detect which audio segments are narrowband. This paper explores two bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments.
Keywords :
pattern classification; speech recognition; speech synthesis; splines (mathematics); Gaussian mixture model; TV shows; acoustically matched data; audio segments; bandlimited audio detection; bandwidth classification; broadcast television shows; misclassified narrowband segments; narrowband audio; narrowband speech; radio shows; speech-to-text accuracy; spline-based classifier; Acoustic signal detection; Acoustic testing; Bandwidth; Decoding; Narrowband; Speech; Spline; TV broadcasting; Telephony; Wideband; Speech processing
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960652