DocumentCode
2212307
Title
Scalable environmental sounds analysis
Author
Biatov, Konstantin
Author_Institution
Fraunhofer IAIS, St. Augustin, Germany
fYear
2009
fDate
28-30 Sept. 2009
Firstpage
1
Lastpage
6
Abstract
This paper describes a method for analyzing environmental audio events. The audio events are modeled using a common universal codebook based on the bag-of-frames (BOF) approach. The frame-level features extracted from all audio files are grouped into clusters using the k-means algorithm. Each audio file is then modeled by the normalized distribution of its frames over the cluster bins, so that each file is described by a single vector. The audio data are represented as a feature-file matrix, similar to the term-document representation in latent semantic indexing (LSI). LSI is applied to the feature-file matrix to represent the data in a latent semantic space. The primary file description is then converted into a vector of similarities to anchor reference data, for which the training data are used. Each component of this vector is a probabilistic similarity between the target file and the anchor reference file corresponding to that component. LSI is applied once more to the new feature-file matrix, mapping the data to a latent semantic space within the anchor reference space. The nearest-neighbor (NN) algorithm is used for audio recognition and audio retrieval. The described data representation improves the results of audio retrieval and recognition.
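The first stages named in the abstract (a shared k-means codebook, one normalized cluster-bin histogram per file, and nearest-neighbor recognition) can be illustrated with a minimal sketch. This is not the paper's implementation: the 2-D toy frames, the labels "birds" and "traffic", the helper names, the strided centroid initialization, and the Euclidean distance are all assumptions made for illustration. The LSI projections and the anchor-reference similarity step are left out because the abstract does not specify them in enough detail.

```python
import math


def nearest(vec, points):
    """Index of the point closest to vec (Euclidean distance, assumed here)."""
    return min(range(len(points)), key=lambda i: math.dist(vec, points[i]))


def kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means over pooled frame features; returns k centroids.

    Centroids start from evenly strided frames (an assumption chosen so the
    toy example is reproducible, not something the abstract states).
    """
    centroids = [list(frames[i * len(frames) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for frame in frames:
            groups[nearest(frame, centroids)].append(frame)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def bof_histogram(file_frames, centroids):
    """Normalized count of a file's frames per cluster bin (one vector per file)."""
    counts = [0] * len(centroids)
    for frame in file_frames:
        counts[nearest(frame, centroids)] += 1
    return [c / len(file_frames) for c in counts]


def classify(query_hist, train_hists, train_labels):
    """1-nearest-neighbor recognition over the file-level histograms."""
    return train_labels[nearest(query_hist, train_hists)]


# Hypothetical toy data: each file is a list of 2-D frame feature vectors.
train_files = {
    "birds": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "traffic": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}
labels = list(train_files)

# Common codebook: cluster the frames pooled from all training files.
pooled = [frame for frames in train_files.values() for frame in frames]
codebook = kmeans(pooled, k=2)

# One histogram vector per training file.
train_hists = [bof_histogram(train_files[name], codebook) for name in labels]

# A query file whose frames fall mostly near the "birds" cluster.
query_frames = [(0.5, 0.5), (1.0, 1.0), (10.0, 10.0)]
query_hist = bof_histogram(query_frames, codebook)
predicted = classify(query_hist, train_hists, labels)
```

Stacking the per-file histograms as columns would give the feature-file matrix to which the abstract applies LSI; a truncated singular value decomposition is the usual way to compute such a projection, though the record does not say which variant the paper uses.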
Keywords
audio coding; matrix algebra; probability; audio recognition; audio retrieval; bag-of-frames; codebook; environmental audio event analysis; feature-file matrix; k-means algorithm; latent semantic indexing; nearest-neighbor algorithm; scalable environmental sound analysis; Acoustic testing; Birds; Hidden Markov models; Indexing; Information retrieval; Large scale integration; Matrix converters; Neural networks; Production facilities; Spatial databases; Latent Semantic Indexing; anchor reference space; audio events recognition; audio events retrieval; common codebook; environmental sounds;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing and Communication Systems, 2009. ICSPCS 2009. 3rd International Conference on
Conference_Location
Omaha, NE
Print_ISBN
978-1-4244-4473-1
Electronic_ISBN
978-1-4244-4474-8
Type
conf
DOI
10.1109/ICSPCS.2009.5306423
Filename
5306423