Title :
Semantic indexing of multimedia using audio, text and visual cues
Author :
Iyengar, G. ; Nock, H. ; Neti, C. ; Franz, M.
Author_Institution :
IBM TJ Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
We describe methods for automatically labeling high-level semantic concepts in documentary-style videos. The emphasis of this paper is on audio processing and on fusing information from multiple modalities. This is initial work towards a trainable system that acquires a collection of generic "intermediate" semantic concepts across modalities (such as audio, video, text) and combines information from these modalities to automatically label a "high-level" concept. Initial results suggest that multi-modal fusion achieves a 12.5% relative improvement over the best unimodal model.
Keywords :
audio databases; audio signal processing; database indexing; multimedia databases; video databases; video signal processing; audio cues; audio processing; automatic labeling; digital video libraries; documentary style videos; high-level semantic concepts; multi-modal fusion; semantic multimedia indexing; text cues; trainable system; unimodal model; visual cues; Content based retrieval; Data mining; Indexing; Information retrieval; Labeling; Multimedia systems; Rockets; Software libraries; Speech processing; Videos;
Conference_Title :
Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7803-7304-9
DOI :
10.1109/ICME.2002.1035607
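
Illustrative_Sketch :
The abstract does not say how the modalities are fused, so the following is only a minimal sketch of one common option, a weighted late fusion of per-modality confidence scores, together with the usual definition of "relative improvement". The function names, the weights, and every number below are hypothetical and are not taken from the paper.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality confidence scores (late fusion).

    Assumption: each modality (e.g. audio, text, visual) has already
    produced a score for the same high-level concept.
    """
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[m] * scores[m] for m in scores) / total


def relative_improvement(fused: float, best_unimodal: float) -> float:
    """Relative gain of the fused metric over the best single-modality metric."""
    if best_unimodal == 0:
        raise ValueError("best_unimodal must be non-zero")
    return (fused - best_unimodal) / best_unimodal


if __name__ == "__main__":
    # Hypothetical scores and equal weights, for illustration only.
    example_scores = {"audio": 0.2, "text": 0.6, "visual": 0.4}
    example_weights = {"audio": 1.0, "text": 1.0, "visual": 1.0}
    print(fuse_scores(example_scores, example_weights))
```

Under this definition, a 12.5% relative improvement means the fused model's metric is 1.125 times the best unimodal model's metric; the absolute values are not given in this record.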