Speech and music classification in audio documents

Author

Pinquier, Julien ; Senac, Christine

Author_Institution

Régine André-Obrecht, IRIT, France

fYear

2002

fDate

13-17 May 2002

Abstract

To index the soundtrack of multimedia documents efficiently, it is necessary to extract elementary, homogeneous acoustic segments. In this paper, we explore such a prior partitioning, which consists in detecting the two basic components: speech and music. The originality of this work is that music and speech are not treated as two competing classes; instead, two classification systems are defined independently, a speech/non-speech one and a music/non-music one. This approach makes it possible to better characterize and discriminate each component: in particular, two different feature spaces are needed, as are two pairs of Gaussian mixture models. Moreover, the acoustic signal is divided into four types of segments: speech, music, speech-music and other. The experiments are performed on the soundtracks of audio-video documents (films, TV sport broadcasts). The results demonstrate the value of this approach, called the Differentiated Modeling Approach.
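The way two independent binary decisions yield four segment types can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the log-likelihood inputs, and the simple "higher likelihood wins" decision rule are all assumptions made for illustration. The abstract only states that each system relies on its own feature space and its own pair of Gaussian mixture models.

```python
from typing import List, Tuple

# Hypothetical input: for each segment, a pair of log-likelihoods
# (class model, non-class model), assumed to come from one pair of GMMs.
ScorePair = Tuple[float, float]


def binary_decision(loglik_class: float, loglik_non_class: float) -> bool:
    """Assumed rule: accept the class when its model scores at least as high."""
    return loglik_class >= loglik_non_class


def label_segment(is_speech: bool, is_music: bool) -> str:
    """Combine the two independent decisions into one of four segment types."""
    if is_speech and is_music:
        return "speech-music"
    if is_speech:
        return "speech"
    if is_music:
        return "music"
    return "other"


def classify_segments(
    speech_scores: List[ScorePair], music_scores: List[ScorePair]
) -> List[str]:
    """Label each segment from its speech/non-speech and music/non-music scores."""
    if len(speech_scores) != len(music_scores):
        raise ValueError("both systems must score the same segments")
    labels = []
    for (s_cls, s_non), (m_cls, m_non) in zip(speech_scores, music_scores):
        labels.append(
            label_segment(binary_decision(s_cls, s_non), binary_decision(m_cls, m_non))
        )
    return labels


if __name__ == "__main__":
    # Made-up scores for four segments, for illustration only.
    speech = [(-10.0, -12.0), (-15.0, -11.0), (-8.0, -9.0), (-20.0, -5.0)]
    music = [(-9.0, -8.0), (-7.0, -9.0), (-6.0, -7.0), (-30.0, -4.0)]
    print(classify_segments(speech, music))
```

Because the two detectors never compete with each other, a segment can be accepted by both (speech-music) or by neither (other), which a single speech-versus-music classifier could not express.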

Keywords

Colored noise; Speech; Speech enhancement

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location

Orlando, FL, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.2002.5745593

Filename

5745593

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2882622