Title :
Multi-modal and Cross-Modal for Lecture Videos Retrieval
Author :
Nhu Van Nguyen ; Coustaty, M. ; Ogier, Jean-Marc
Author_Institution :
L3I, Univ. of La Rochelle, La Rochelle, France
Abstract :
This paper studies multi-modal and cross-modal retrieval of lecture videos using document analysis techniques. A lecture video is represented by a set of subjects, and each subject is represented by a bag of mixed words (visual words and textual words), with the words coming from speech recognition and OCR engines. Our work relies on two assumptions: 1) a video may contain multiple subjects, and 2) multiple modalities exist in the same lecture video document. We propose a combination of techniques from image document analysis and text mining. Visual words and textual words are extracted from images of lecture slides using text detection and graphics localization computed on the sequences captured with a camera. Assuming that a subject in the video is composed of a set of slides, the lecture slides are clustered into groups representing the possible subjects using the extracted mixed words. Multi-modal and cross-modal lecture video retrieval is then performed with the Bag of Subjects model. We discuss the proposed indexing and retrieval approach for lecture videos and report a quantitative evaluation on lecture videos from our university. The results show that using Bag of Subjects for lecture video retrieval improves retrieval accuracy.
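The pipeline described in the abstract (slides as bags of mixed words, slides grouped into subjects, videos retrieved through their subjects) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or the scoring rule, so the greedy sequential clustering, the cosine similarity, the max-over-subjects score, the `threshold` value, the `v:` prefix for visual word identifiers, and all function names below are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_slides(slides, threshold=0.3):
    """Group consecutive slides into subjects (assumed greedy strategy).

    Each slide is a list of mixed words: textual tokens plus visual word
    ids written here as "v:<id>". A slide joins the current subject when
    it is similar enough to it; otherwise it starts a new subject.
    """
    subjects = []
    for words in slides:
        bag = Counter(words)
        if subjects and cosine(subjects[-1], bag) >= threshold:
            subjects[-1].update(bag)  # merge the slide into the subject
        else:
            subjects.append(bag)      # open a new subject
    return subjects


def score_video(query_words, subjects):
    """Score a video by its best-matching subject (assumed scoring rule)."""
    query = Counter(query_words)
    return max((cosine(query, s) for s in subjects), default=0.0)


def rank_videos(query_words, index):
    """Return video names sorted by decreasing score for the query."""
    return sorted(index, key=lambda name: score_video(query_words, index[name]),
                  reverse=True)


# Toy data, invented for illustration: two videos and their slides.
video_a = [["fourier", "transform", "v:3"],
           ["fourier", "series", "v:3"],
           ["matrix", "eigenvalue", "v:9"]]
video_b = [["sorting", "algorithm", "v:5"],
           ["sorting", "quicksort", "v:5"]]

index = {"video_a": cluster_slides(video_a),
         "video_b": cluster_slides(video_b)}
```

Because textual and visual words share one bag, a query may use either modality or both: `rank_videos(["fourier"], index)` uses a textual word only, while `rank_videos(["v:5"], index)` uses a visual word only.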
Keywords :
computer graphics; data mining; document image processing; feature extraction; indexing; text detection; video cameras; video retrieval; video signal processing; OCR engines; bag of mixed words; bag of subjects model; bag of visual words; bag of textual words; camera; cross-modal lecture video retrieval; document analysis techniques; graphics localization; image document analysis; indexing approach; lecture slide extraction; lecture video document; multimodal lecture video retrieval; speech recognition; text detection; text mining; Accuracy; Indexing; Semantics; Speech; Videos; Visualization
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
DOI :
10.1109/ICPR.2014.461