DocumentCode :
1653751
Title :
N-gram extension for bag-of-audio-words
Author :
Pancoast, Stephanie ; Akbacak, Murat
Author_Institution :
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
fYear :
2013
Firstpage :
778
Lastpage :
782
Abstract :
Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each “word” is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
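The N-gram representation described in the abstract can be illustrated with a minimal sketch. It assumes a codebook of cluster centroids has already been learned (for example with k-means), and that each audio frame is mapped to its nearest centroid. The abstract does not specify the features, clustering algorithm, or normalization, so the function names, the Euclidean distance, and the relative-frequency normalization below are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codebook centroid closest to `frame`.

    Assumption: squared Euclidean distance; the paper may use another metric.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def audio_word_ngrams(frames, codebook, n=2):
    """Build a normalized histogram of audio-word N-grams for one document.

    frames   -- sequence of feature vectors, in temporal order
    codebook -- sequence of centroids (hypothetical, precomputed)
    n        -- N-gram order; n=1 gives the plain bag-of-audio-words
    """
    # Quantize each frame to an audio word (a codebook index).
    words = [nearest_codeword(f, codebook) for f in frames]
    # Slide a window of length n over the word sequence.
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    # Relative frequencies; an empty dict if the document is shorter than n.
    return {g: c / total for g, c in counts.items()} if total else {}


# Toy example: a 2-word codebook and four 2-dimensional frames.
codebook = [(0.0, 0.0), (1.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.0, 0.2)]
unigrams = audio_word_ngrams(frames, codebook, n=1)
bigrams = audio_word_ngrams(frames, codebook, n=2)
```

With a codebook of K words there are up to K^N distinct N-grams, so the size of the representation depends strongly on the codebook size, which is the dependence the abstract says the authors explore.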
Keywords :
audio systems; codes; multimedia communication; pattern clustering; probability; TRECVID 2011 dataset; TRECVID 2012 dataset; audio word N-gram representation extension; average probability; bag-of-audio-word generation; clustering algorithm; codebook; document representation; multimedia event detection; natural language; Event detection; Histograms; Multimedia communication; NIST; Natural languages; Vectors; Videos; Bag-of-audio-words; N-gram models; multimedia event detection
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6637754
Filename :
6637754