Title :
Fusion of Word Spotting and Spatial Information for Figure Caption Retrieval in Historical Document Images
Author :
Khurshid, Khurram ; Faure, Claudie ; Vincent, Nicole
Author_Institution :
Lab. CRIP5-SIP, Univ. Paris Descartes, Paris, France
Abstract :
We present a method for figure caption detection that fuses several information sources. The evaluation is performed on documents gathered from the collection of the historical medical digital library Medic@. A method based on perceptual grouping simultaneously segments the vertical and horizontal text lines in a page. Spatial relationships between the text lines and the graphics are considered to select a set of caption line candidates. A feature-based word-spotting method is proposed to retrieve the occurrences of word images similar to a given query. Word spotting is applied to detect the label of the captions, a word like 'Fig', 'FIG', 'Figure', ... followed by the figure number. Combining spatial information and word recognition greatly improves the detection of caption lines. Our initial experiments process more than 300 pages from three different books.
Keywords :
computer graphics; digital libraries; document image processing; image retrieval; image segmentation; text analysis; word processing; figure caption retrieval; historical document image; historical medical digital library; spatial information; vertical text line; word spotting; Biomedical imaging; Books; Data mining; Graphics; Indexing; Information analysis; Information retrieval; Software libraries; dynamic time warping; edit distance; spatial perception
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.161