Regional Information Center for Science and Technology - A Two-Stage Approach for Word Spotting in Graphical Documents

DocumentCode :

3486450

Title :

A Two-Stage Approach for Word Spotting in Graphical Documents

Author :

Tarafdar, Arundhati ; Pal, Umapada ; Roy, Partha Pratim ; Ragot, N. ; Ramel, Jean-Yves

Author_Institution :

CVPR Unit, Indian Statistical Institute, Kolkata, India

fYear :

2013

fDate :

25-28 Aug. 2013

Firstpage :

319

Lastpage :

323

Abstract :

Multi-oriented characters, characters connected to graphical lines, and text and symbols intersecting graphical lines or curves are very common in graphical documents. As a result, word spotting in graphical documents is still a challenging task, which this paper tries to solve (partially). The proposed approach proceeds in two stages. In the first stage, isolated components are recognized using rotation-invariant features and an SVM classifier. Characters that have a good recognition score and match a character in the query string are selected for initial spotting. Because of the structural complexity of graphical documents and the presence of touching components, some query characters may be missed during initial spotting in some documents. In that case, regions where the missing characters may be located (candidate regions) are defined based on the position, size and orientation of the recognized characters in the input document image. In the second stage, the Scale Invariant Feature Transform (SIFT) is used to find those missing characters in the candidate regions for possible spotting. Finally, spotting is validated using the position, size, orientation and inter-character gap information of the recognized components. Experimental results demonstrate that the method is efficient at locating a query word in multi-oriented and/or touching graphical documents.

Keywords :

document image processing; image classification; image retrieval; support vector machines; SIFT; SVM classifier; graphical documents; graphical lines; input document image; multi-oriented characters; query characters; query string; rotation invariant features; scale invariant feature transform; structural complexity; two-stage approach; word spotting; Character recognition; Feature extraction; Shape; Support vector machines; Text analysis; Text recognition; Document Image Analysis; Graphical documents; Information Retrieval; SIFT; Word Spotting;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 12th International Conference on Document Analysis and Recognition (ICDAR)

Conference_Location :

Washington, DC

ISSN :

1520-5363

Type :

conf

DOI :

10.1109/ICDAR.2013.71

Filename :

6628636

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3486450