DocumentCode :
135089
Title :
Text line identification in Tagore's manuscript
Author :
Adak, Chandranath ; Chaudhuri, Bidyut B.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Kalyani, Kalyani, India
fYear :
2014
fDate :
Feb. 28 2014-March 2 2014
Firstpage :
210
Lastpage :
213
Abstract :
In this paper, a text line identification method is proposed. The text lines of a printed document are easy to segment because the lines are uniformly straight and the gaps between them are sufficient. In handwritten documents, however, the lines are nonuniform and the interline gaps are variable. We take Rabindranath Tagore's manuscript as our test case because it is one of the most difficult manuscripts and contains doodles. Our method consists of a preprocessing stage to clean the document image. Then we separate the doodles from the manuscript to obtain the textual region. After that, we identify the text lines in the manuscript. For text line identification, we use window examination, black run-length smearing, a horizontal histogram, and connected component analysis.
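Two of the techniques named in the abstract, black run-length smearing and the horizontal histogram, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smear_rows`, `horizontal_histogram`, `find_line_bands`), the `max_gap` and `min_ink` parameters, and the tiny binary page are all assumptions made for illustration, and window examination, doodle separation, and connected component analysis are left out. The image is assumed to be already binarized, with 1 for ink and 0 for background.

```python
def smear_rows(image, max_gap):
    """Horizontal run-length smearing (illustrative).

    Fills every white run of length <= max_gap that lies between two
    ink pixels on the same row. 1 = ink, 0 = background.
    """
    smeared = []
    for row in image:
        out = list(row)
        last_black = None  # column of the most recent ink pixel on this row
        for x, pixel in enumerate(row):
            if pixel == 1:
                gap = 0 if last_black is None else x - last_black - 1
                if 0 < gap <= max_gap:
                    for k in range(last_black + 1, x):
                        out[k] = 1
                last_black = x
        smeared.append(out)
    return smeared


def horizontal_histogram(image):
    """Number of ink pixels in each row of the image."""
    return [sum(row) for row in image]


def find_line_bands(histogram, min_ink=1):
    """Return (top_row, bottom_row) pairs for maximal runs of rows
    whose ink count is at least min_ink."""
    bands = []
    start = None
    for y, count in enumerate(histogram):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(histogram) - 1))
    return bands


# Hypothetical 7x8 binary page with two short "text lines".
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

smeared = smear_rows(page, max_gap=2)
bands = find_line_bands(horizontal_histogram(smeared))
print(bands)
```

Smearing along rows merges neighbouring characters and words into longer blobs, which is what makes a later connected component pass practical; it does not change which rows are empty, so the row bands come from the histogram alone. A histogram of this kind only separates lines cleanly when some rows between them are free of ink, which is the condition the abstract says handwritten manuscripts often violate.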
Keywords :
document image processing; handwritten character recognition; image segmentation; optical character recognition; text analysis; Rabindranath Tagore manuscript; black run-length smearing; connected component analysis; document image cleaning; doodle separation; handwritten documents; horizontal histogram; nonuniform line; preprocessing stage; printed document; text line identification method; text line segmentation; textual region; uniform line straightness; variable interline gaps; window examination; Character recognition; Frequency modulation; Handwriting recognition; Histograms; Image analysis; Optical filters; Text analysis; document image analysis; doodle; handwritten document; manuscript processing; text line identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Students' Technology Symposium (TechSym), 2014 IEEE
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4799-2607-7
Type :
conf
DOI :
10.1109/TechSym.2014.6808048
Filename :
6808048