DocumentCode :
3695087
Title :
A direct approach for word and character segmentation in run-length compressed documents with an application to word spotting
Author :
Mohammed Javed;P. Nagabhushan;B.B. Chaudhuri
Author_Institution :
Department of Studies in Computer Science, University of Mysore, 570006, India
fYear :
2015
Firstpage :
216
Lastpage :
220
Abstract :
Segmentation of a text document into lines, words and characters is an important objective in applications such as OCR and related analytics. However, documents today are usually compressed for archival and transmission efficiency. Segmenting text in compressed documents normally requires decompression, which consumes additional computing resources. Against this backdrop, the paper proposes a method for text segmentation directly in run-length compressed, printed English text documents. Line segmentation is done using the projection profile technique. Further segmentation into words and characters is accomplished by tracing the white runs along the base region of the text line. During this process, a run-based region growing technique is applied in the spatial neighborhood of the white runs to trace the vertical space between characters. After the character spaces in the entire text line have been detected, each space is classified as a word space or a character space by computing the average character space. Subsequently, based on the spatial positions of the detected words and characters, their respective compressed segments are extracted. The proposed algorithm is tested on 1083 compressed text lines and achieves F-measures of 97.93% and 92.86% for word and character segmentation, respectively. Finally, an application to word spotting is also presented.
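As a rough illustration of two steps named in the abstract (projection-profile line segmentation and the average-space decision), here is a minimal Python sketch. It is not the authors' implementation: the row encoding (alternating run lengths, starting with a white run), the function names, and the rule "a gap wider than the mean gap is a word space" are all assumptions made for this example, and the run-based region growing step is omitted.

```python
from statistics import mean


def row_black_count(runs):
    """Black pixels in one row. Assumes alternating run lengths, white run first."""
    return sum(runs[1::2])


def segment_lines(rows):
    """Return (start, end) row ranges, end exclusive, whose projection profile is non-zero."""
    lines, start = [], None
    for i, runs in enumerate(rows):
        has_ink = row_black_count(runs) > 0
        if has_ink and start is None:
            start = i                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, i))       # a text line ends at a blank row
            start = None
    if start is not None:
        lines.append((start, len(rows)))
    return lines


def classify_gaps(gap_widths):
    """Label a gap 'word' if it is wider than the mean gap, else 'char' (assumed rule)."""
    if not gap_widths:
        return []
    threshold = mean(gap_widths)
    return ["word" if w > threshold else "char" for w in gap_widths]


# Hypothetical data: six rows of width 10, and seven gap widths from one text line.
rows = [[10], [2, 3, 5], [1, 4, 1, 2, 2], [10], [3, 4, 3], [10]]
text_lines = segment_lines(rows)
gap_labels = classify_gaps([2, 2, 9, 2, 3, 10, 2])
```

The profile is computed by summing black run lengths only, so no row is ever expanded back into pixels, which is the point of working in the compressed domain.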
Keywords :
"Optical character recognition software","Image coding","Adaptation models"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333755
Filename :
7333755