DocumentCode :
2974515
Title :
Separation of text from non-text doodles of poet Rabindranath Tagore's manuscripts
Author :
Chaudhuri, Bidyut B. ; Saraf, Arpit ; Kumari, Akanksha ; Borah, Sagarika ; Goyal, Ankur
Author_Institution :
C.V.P.R. Unit, Indian Stat. Inst., Kolkata, India
fYear :
2012
fDate :
21-22 Nov. 2012
Firstpage :
1
Lastpage :
5
Abstract :
The growing popularity of Internet facilities has provided a convenient and fast way to mine a warehouse of both historical and contemporary handwritten documents, which has led to continuous research and development in the field of information retrieval algorithms. In such handwritten documents, graphics and images are combined with text and often overlap one another. This paper presents a technique for separating textual data from non-textual information. The technique is based on previously published work and is applied to the manuscripts of the poet Rabindranath Tagore. The approach generates connected components as the basic primitives and tries to classify each one as text or non-text by comparing the total number of pixels in the component with the number of its boundary pixels. A window is then generated, and further separation is done on the basis of the stroke width computed for each window. The paper also contains a brief review of some previously published work.
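The connected-component step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8-connectivity used for labelling, the 4-neighbourhood boundary test, the 0.6 threshold, and the direction of the decision (a high boundary-to-total ratio is treated as a thin, text-like stroke) are all assumptions made for illustration. The stroke-width windowing stage is not covered.

```python
from collections import deque

def connected_components(img):
    """Return 8-connected foreground components as lists of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps

def boundary_ratio(img, comp):
    """Fraction of a component's pixels that touch background or the image edge."""
    h, w = len(img), len(img[0])

    def is_boundary(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not img[ny][nx]:
                return True
        return False

    return sum(1 for y, x in comp if is_boundary(y, x)) / len(comp)

def classify(img, threshold=0.6):
    """Label each component 'text' or 'non-text' (threshold is a hypothetical value)."""
    return [("text" if boundary_ratio(img, comp) >= threshold else "non-text", comp)
            for comp in connected_components(img)]

# Toy binary image: a one-pixel-wide stroke and a filled 7x7 blob.
img = [[0] * 12 for _ in range(9)]
for x in range(6):
    img[0][x] = 1
for y in range(2, 9):
    for x in range(4, 11):
        img[y][x] = 1

labels = [label for label, _ in classify(img)]
```

In this toy image every pixel of the thin stroke is a boundary pixel, while only the outer ring of the filled blob is, so the two components fall on opposite sides of the assumed threshold. Real manuscript scans would need binarisation first and a threshold tuned on data.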
Keywords :
Internet; data warehouses; handwritten character recognition; history; information retrieval; text analysis; text detection; Rabindranath Tagore manuscripts; contemporary handwritten document warehouse; historical handwritten document warehouse; information retrieval algorithm; nontext doodles; nontextual information; poet; research and development; text graphics; text images; textual data separation; Accuracy; Algorithm design and analysis; Computers; Feature extraction; Graphics; Image analysis; Labeling; Connected Components; Non text Doodles; Rabindranath Tagore; Stroke Width; Text; pixels
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing and Communication Systems (NCCCS), 2012 National Conference on
Conference_Location :
Durgapur
Print_ISBN :
978-1-4673-1952-2
Type :
conf
DOI :
10.1109/NCCCS.2012.6413000
Filename :
6413000