Regional Information Center for Science and Technology - Multi-modal and Cross-Modal for Lecture Videos Retrieval

DocumentCode :

178310

Title :

Multi-modal and Cross-Modal for Lecture Videos Retrieval

Author :

Nhu Van Nguyen ; Coustaty, M. ; Ogier, Jean-Marc

Author_Institution :

L3I, Univ. of La Rochelle, La Rochelle, France

fYear :

2014

fDate :

24-28 Aug. 2014

Firstpage :

2667

Lastpage :

2672

Abstract :

This paper studies the problem of multi-modal and cross-modal lecture video retrieval using document analysis techniques. Here, a lecture video is represented by a set of subjects, and each subject is represented by a Bag of mixed words (visual words and textual words) derived from speech recognition and OCR engines. Our work relies on two assumptions: 1) a video may contain multiple subjects, and 2) multiple modalities coexist in the same lecture video document. We propose a combination of technologies from image document analysis and text mining. Visual words and textual words in images of lecture slides are extracted using text detection and graphics localization computed on the sequences captured by a camera. Assuming that a subject in the video is composed of a set of slides, the lecture slides are clustered, using the extracted mixed words, into groups representing possible subjects. Multi-modal and cross-modal lecture video retrieval are then performed with the Bag of Subjects model. We discuss the proposed indexing and retrieval approach for lecture videos and report a quantitative evaluation on lecture videos from our university. The results show that using the Bag of Subjects model for lecture video retrieval improves retrieval accuracy.
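The abstract describes the indexing pipeline only at a high level. The following is a minimal sketch, under stated assumptions, of how a Bag of mixed words, slide-to-subject clustering, and Bag of Subjects retrieval could fit together. It is not the authors' implementation. The helper names (`mixed_bag`, `cluster_slides`, `rank_videos`) and the threshold value are hypothetical. Greedy cosine-threshold clustering and scoring a video by its best-matching subject are swapped-in techniques chosen for illustration. OCR, speech recognition, and visual-word quantization are assumed to have already produced the tokens.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bags of words."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mixed_bag(textual_words, visual_words) -> Counter:
    """Hypothetical Bag of mixed words: prefixing tokens lets textual
    words and visual-word IDs share one vocabulary space."""
    bag = Counter("t:" + w.lower() for w in textual_words)
    bag.update("v:" + str(v) for v in visual_words)
    return bag


def cluster_slides(slides, threshold=0.3):
    """Assumed greedy single-pass clustering (not the paper's algorithm):
    a slide joins its most similar subject if the similarity reaches the
    threshold; otherwise it starts a new subject."""
    subjects = []
    for slide in slides:
        best_idx, best_sim = None, 0.0
        for idx, subject in enumerate(subjects):
            sim = cosine(slide, subject)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            subjects[best_idx] += slide  # merge the slide's words into the subject
        else:
            subjects.append(Counter(slide))
    return subjects


def rank_videos(query: Counter, index: dict) -> list:
    """Score each video (a bag of subjects) by its best-matching subject,
    an assumed max-pooling rule, and sort by descending score."""
    scores = {
        video_id: max((cosine(query, s) for s in subjects), default=0.0)
        for video_id, subjects in index.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: each slide is (OCR/ASR words, visual-word IDs).
video_a = [
    mixed_bag(["hough", "transform", "lines"], [3, 7]),
    mixed_bag(["hough", "accumulator", "lines"], [3, 7]),
    mixed_bag(["ocr", "binarization"], [12]),
]
video_b = [
    mixed_bag(["neural", "network", "training"], [40]),
    mixed_bag(["gradient", "descent", "training"], [40, 41]),
]
index = {"video_a": cluster_slides(video_a), "video_b": cluster_slides(video_b)}

# A text-only query; a visual-only query would be mixed_bag([], [7]).
print(rank_videos(mixed_bag(["hough", "lines"], []), index))
```

Because every subject mixes both token types, a query in one modality can still reach a subject through the words it shares with the other. In this sketch, a visual-only query such as `mixed_bag([], [7])` also ranks `video_a` first, which loosely mirrors the cross-modal setting described in the abstract.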

Keywords :

computer graphics; data mining; document image processing; feature extraction; indexing; text detection; video cameras; video retrieval; video signal processing; OCR engines; bag of mixed words; bag of subjects model; bag of visual words; bag of textual words; camera; cross-modal lecture video retrieval; document analysis techniques; graphics localization; image document analysis; indexing approach; lecture slide extraction; lecture video document; multimodal lecture video retrieval; speech recognition; text detection; text mining; Accuracy; Indexing; Semantics; Speech; Videos; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition (ICPR), 2014 22nd International Conference on

Conference_Location :

Stockholm

ISSN :

1051-4651

Type :

conf

DOI :

10.1109/ICPR.2014.461

Filename :

6977173

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178310