Title :
Audio content analysis for online audiovisual data segmentation and classification
Author :
Zhang, Tong ; Kuo, C.-C. Jay
Author_Institution :
Dept. of Electr. Eng. Syst., Univ. of Southern California, Los Angeles, CA, USA
fDate :
5/1/2001 12:00:00 AM
Abstract :
While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and classified into basic types, including speech, music, song, environmental sound, speech with music background, environmental sound with music background, and silence. Simple audio features, namely the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak tracks, are extracted to ensure the feasibility of real-time processing. A heuristic rule-based procedure is proposed to segment and classify audio signals; it is built upon morphological and statistical analysis of the time-varying functions of these audio features. Experimental results show that the proposed scheme achieves an accuracy rate of more than 90% in audio classification.
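As an illustration of two of the features named in the abstract, the following minimal Python sketch computes a short-time energy function and an average zero-crossing rate frame by frame. It is not the authors' implementation: the function names, the frame length (256 samples), and the hop size (128 samples) are assumptions chosen only for the example, and the fundamental frequency and spectral peak tracks are not covered.

```python
from typing import List, Sequence, Tuple


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def feature_curves(signal: Sequence[float],
                   frame_len: int = 256,   # assumed value, not from the paper
                   hop: int = 128          # assumed value, not from the paper
                   ) -> Tuple[List[float], List[float]]:
    """Return the per-frame energy and zero-crossing-rate curves of a signal."""
    frames = split_frames(signal, frame_len, hop)
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return energy, zcr


if __name__ == "__main__":
    # A rapidly alternating signal crosses zero at every sample;
    # a constant signal never does.
    alternating = [1.0, -1.0] * 4
    constant = [0.5] * 8
    print(feature_curves(alternating, frame_len=4, hop=2))
    print(feature_curves(constant, frame_len=4, hop=2))
```

Both curves are time-varying functions of the kind the abstract refers to. Both features need only a single pass over the samples of each frame, which is in keeping with the abstract's emphasis on real-time feasibility.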
Keywords :
audio signal processing; content-based retrieval; feature extraction; mathematical morphology; real-time systems; signal classification; statistical analysis; TV programs; accuracy rate; audio content analysis; audio features extraction; audio signal classification; audio signals; average zero-crossing rate; content parsing; energy function; environmental sound; fundamental frequency; heuristic rule-based procedure; morphological analysis; movies; music; online audiovisual data classification; online audiovisual data segmentation; real-time processing; song; spectral peak tracks; speech; statistical analysis; time-varying functions; visual cues; Frequency; Indexing; Information retrieval; Layout; Motion pictures; Multiple signal classification; Music; Speech; Statistical analysis; TV;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on