DocumentCode
857247
Title
Content analysis for audio classification and segmentation
Author
Lu, Lie ; Zhang, Hong-Jiang ; Jiang, Hao
Author_Institution
Microsoft Res. Asia, Beijing, China
Volume
10
Issue
7
fYear
2002
fDate
10/1/2002 12:00:00 AM
Firstpage
504
Lastpage
516
Abstract
We present our study of audio content analysis for classification and segmentation, in which an audio stream is segmented according to audio type or speaker identity. We propose a robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence. Audio classification is processed in two steps, which makes it suitable for different applications. The first step of the classification is speech and nonspeech discrimination. In this step, a novel algorithm based on K-nearest-neighbor (KNN) and linear spectral pairs-vector quantization (LSP-VQ) is developed. The second step further divides the nonspeech class into music, environment sounds, and silence with a rule-based classification scheme. A set of new features such as the noise frame ratio and band periodicity are introduced and discussed in detail. We also develop an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis. Without a priori knowledge, this algorithm can support open-set speakers, online speaker modeling, and real-time segmentation. Experimental results indicate that the proposed algorithms can produce very satisfactory results.
Keywords
audio signal processing; correlation methods; noise; signal classification; spectral analysis; speech processing; vector quantisation; K-nearest-neighbor based algorithm; LSP correlation analysis; LSP-VQ; audio classification; audio content analysis; audio stream segmentation; audio type; band periodicity; environment sound; linear spectral pairs-vector quantization; music; noise frame ratio; nonspeech discrimination; online speaker modeling; open-set speaker; quasi-GMM; real time segmentation; robust approach; rule-based classification; silence; speaker identity; speech discrimination; unsupervised speaker segmentation algorithm; Acoustic noise; Algorithm design and analysis; Loudspeakers; Music; Quantization; Robustness; Signal to noise ratio; Speech; Streaming media; Working environment noise;
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/TSA.2002.804546
Filename
1045282