Robust soccer highlight generation with a novel dominant-speech feature extractor

Author

Wan, Kongwah ; Xu, Changsheng

Author_Institution

Inst. for Infocomm Res., Singapore

Volume

1

fYear

2004

fDate

30 June 2004

Firstpage

591

Abstract

We describe soccer highlight generation using only the audio stream of the video. A novel audio feature is used to detect the parts of the commentary that correspond to dominant and excited speech. It is computed by a twice-iterated composite Fourier transform (CFT) on short-time windows, wherein the magnitude spectrum of the first transform is the input to the second transform. Dominant speech portions are found to be robustly characterized by increased density in the peak profile. We verify the robustness of the CFT via large-scale empirical testing and explain its working based on a pulse train postulate of dominant speech signals. Our audio-only approach results in a compute-efficient system deployable on current-generation set-top boxes and digital video recording devices.
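The twice-iterated transform described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no window length, windowing function, or peak-picking rule, so the function names (dft_magnitude, cft, count_peaks), the naive DFT, and the relative threshold used to count peaks are all assumptions made for illustration only.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the naive O(n^2) discrete Fourier transform of x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def cft(frame):
    """Twice-iterated transform: the magnitude spectrum of the first
    transform is fed to a second transform, as the abstract describes."""
    return dft_magnitude(dft_magnitude(frame))


def count_peaks(values, rel_threshold=0.1):
    """Hypothetical peak-density measure (an assumption, not from the paper):
    count strict local maxima above rel_threshold * max(values), skipping
    the DC bin and the mirrored upper half of the output."""
    floor = rel_threshold * max(values)
    half = len(values) // 2
    return sum(
        1
        for k in range(1, half)
        if values[k] > floor
        and values[k] > values[k - 1]
        and values[k] > values[k + 1]
    )


# Idealized pulse train: one unit pulse every 8 samples in a 64-sample frame.
pulse_train = [1.0 if t % 8 == 0 else 0.0 for t in range(64)]
profile = cft(pulse_train)
peaks = count_peaks(profile)
```

For an ideal pulse train the magnitude spectrum is itself a pulse train, so the second transform again yields regularly spaced peaks. This is consistent with the pulse train postulate mentioned in the abstract, although the sketch makes no claim about how real commentary audio behaves.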

Keywords

Fourier transforms; audio signal processing; feature extraction; iterative methods; pattern recognition; speech processing; audio feature; audio stream; digital video recording devices; dominant-speech feature extraction; excited speech; iterated composite Fourier transform; magnitude spectrum; pulse train postulate; set-top-boxes; short-time windows; soccer highlight generation; Acoustic noise; Computer architecture; Large-scale systems; Robustness; Speech enhancement; Streaming media; Testing; US Department of Transportation

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on

Conference_Location

Taipei

Print_ISBN

0-7803-8603-5

Type

conf

DOI

10.1109/ICME.2004.1394261

Filename

1394261