Title :
Softening quantization in bag-of-audio-words
Author :
Pancoast, Stephanie ; Akbacak, Murat
Author_Institution :
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
Abstract :
The audio component of multimedia data can be crucial for multimedia content analysis. The bag-of-audio-words (BoAW) approach is one of the most frequently used methods to represent audio content in multimedia event detection and related tasks. The method, however, has drawn numerous criticisms, amongst which is the loss of information in the “vector quantization” step, which generates word-like units. In this work, we address this issue by employing a soft quantization representation in which the distance to the nearest codeword is incorporated into the model, rather than using only the nearest codeword's index, as is the case with hard quantization. We explore two techniques for soft quantization and apply them to BoAW for multimedia event detection. We find that the best setup yields a 13% improvement in mean average precision, improving performance for 27 of the 30 video events.
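The contrast between hard and soft quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two soft quantization techniques the authors explore, so the Gaussian-kernel weighting below is only one common form of soft assignment, chosen here as an assumption. The function names, the `sigma` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def _sq_dists(frames, codebook):
    # Squared Euclidean distance from every frame to every codeword,
    # shape (n_frames, n_codewords).
    diff = frames[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2)

def hard_histogram(frames, codebook):
    # Hard quantization: each frame votes only for its nearest codeword.
    nearest = _sq_dists(frames, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def soft_histogram(frames, codebook, sigma=1.0):
    # Soft quantization (assumed Gaussian-kernel variant): each frame
    # spreads one unit of weight over all codewords, with closer
    # codewords receiving more weight. sigma is a hypothetical
    # smoothing parameter, not a value taken from the paper.
    d2 = _sq_dists(frames, codebook)
    # Subtract the per-frame minimum before exponentiating to avoid
    # underflow; this does not change the normalized weights.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    hist = w.sum(axis=0)
    return hist / hist.sum()

if __name__ == "__main__":
    # Toy 2-D "audio features" and a two-word codebook (made-up data).
    codebook = np.array([[0.0, 0.0], [10.0, 0.0]])
    frames = np.array([[1.0, 0.0], [4.0, 0.0], [9.0, 0.0]])
    print("hard:", hard_histogram(frames, codebook))
    print("soft:", soft_histogram(frames, codebook, sigma=5.0))
```

In this toy case the frame at (4, 0) lies almost midway between the two codewords. Hard quantization assigns it entirely to the first codeword, whereas the soft histogram splits its weight, which is the kind of distance information that the abstract says is lost in hard quantization.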
Keywords :
audio coding; multimedia systems; statistical analysis; vector quantisation; video coding; video retrieval; BoAW approach; audio component; audio content representation; bag-of-audio-words; hard quantization; information loss; mean average precision; multimedia content analysis; multimedia data; multimedia event detection; nearest codeword distance; nearest codeword index; soft quantization representation; vector quantization; video events; word-like units; Encoding; Event detection; Histograms; Mel frequency cepstral coefficient; Multimedia communication; Quantization (signal); Vectors; Bag-of-audio-words; multimedia event detection; soft quantization;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853821