Title :
Multi-oriented Text Recognition in Graphical Documents Using HMM
Author :
Roy, Partha Pratim ; Roy, Sandip ; Pal, Umapada
Author_Institution :
CVPR Unit, Indian Statistical Institute, Kolkata, India
Abstract :
Text lines in graphical documents (e.g., maps and engineering drawings) and artistic documents are often written along curved lines to label different locations or symbols. For optical character recognition of such documents, individual text lines need to be extracted and recognized. Due to the presence of multi-oriented characters in such non-structured layouts, word recognition is a challenging task. In this paper, we present an approach to scale- and orientation-invariant recognition of text words in graphical documents using Hidden Markov Models (HMMs). First, a line extraction method based on the foreground and background information of the text components is applied to segment text lines. To utilize the background information effectively, the water reservoir concept is used. For recognition of curved text lines, a sliding-window path is estimated, and features extracted from the sliding window are fed to the HMM system for recognition. A frame-wise feature based on the local gradient histogram (LGH) is used in the HMM. The approach is evaluated on a dataset of graphical words, and encouraging results are obtained.
Keywords :
document image processing; feature extraction; hidden Markov models; image segmentation; optical character recognition; HMM; LGH; artistic documents; background information; curve lines; curved text line recognition; foreground information; frame-wise feature; graphical documents; individual text line extraction; individual text line recognition; local gradient histogram; multioriented characters; multioriented text recognition; nonstructured layout; orientation invariant text word recognition; scale invariant text word recognition; sliding window estimation; text line segmentation; water reservoir concept; Character recognition; Graphics; Histograms; Text recognition
Conference_Title :
2014 11th IAPR International Workshop on Document Analysis Systems (DAS)
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
DOI :
10.1109/DAS.2014.27