DocumentCode :
584858
Title :
A novel approach to text line and word segmentation on odia printed documents
Author :
Senapati, D. ; Rout, Sritam ; Nayak, M.
Author_Institution :
Dept. of Comput. Sci. & Applic., Utkal Univ., Bhubaneswar, India
fYear :
2012
fDate :
26-28 July 2012
Firstpage :
1
Lastpage :
6
Abstract :
Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR systems are available for various languages and scripts, such as English, Chinese and Arabic, but no OCR system is commercially available for Odia script. We have taken a step toward developing an OCR system for the Odia language. OCR is popular for its potential applications in banks, library automation, post offices, defense organizations and language processing. Line and word segmentation are important steps of an OCR system, because the accuracy of word and character recognition depends directly on the correctness of text-line and word segmentation. In this paper we propose a robust method for segmenting the individual text lines of an Odia printed document image. Each segmented text line is the input to a word segmentation method, which produces the segmented words. The proposed method uses both foreground and background information and is based on the intensities of pixels in the document. We have tested our method on scanned Odia documents as well as some multi-script documents and obtained encouraging results.
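The abstract does not spell out the algorithm. A common intensity-based baseline for this task is projection-profile analysis: count ink (foreground) pixels per row to find text lines, then count ink per column inside each line and split words at sufficiently wide background gaps. The sketch below illustrates that generic technique only; it is not the authors' method. The function names (`binarize`, `runs`, `segment_lines`, `segment_words`), the fixed threshold of 128, the dark-text-on-light-paper assumption and the `min_gap` values are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]      # grayscale rows, 0 = black ... 255 = white
Binary = List[List[int]]    # 1 = ink (foreground), 0 = background
Span = Tuple[int, int]      # half-open [start, end) index range


def binarize(gray: Gray, threshold: int = 128) -> Binary:
    """Mark pixels darker than `threshold` as ink (assumes dark text on light paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Find spans where the profile is non-zero, then merge neighbouring spans
    whose background gap is narrower than `min_gap` entries."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # entering a foreground run
        elif value == 0 and start is not None:
            spans.append((start, i))       # leaving a foreground run
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)   # gap too narrow: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(binary: Binary, min_gap: int = 1) -> List[Span]:
    """Text lines = row ranges whose horizontal projection (ink per row) is non-zero."""
    return runs([sum(row) for row in binary], min_gap)


def segment_words(binary: Binary, line: Span, min_gap: int = 2) -> List[Span]:
    """Words = column ranges within one line, split at background gaps >= min_gap."""
    top, bottom = line
    width = len(binary[0]) if binary else 0
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile, min_gap)


# Tiny synthetic 5x10 "page" for illustration only (not data from the paper).
SAMPLE_PAGE: Gray = [
    [255] * 10,
    [0, 0, 255, 255, 255, 0, 255, 0, 255, 255],
    [0, 0, 255, 255, 255, 0, 255, 0, 255, 255],
    [255] * 10,
    [255, 0, 0, 0, 255, 255, 255, 255, 255, 255],
]

if __name__ == "__main__":
    ink = binarize(SAMPLE_PAGE)
    for line in segment_lines(ink):
        print(line, segment_words(ink, line))
```

Running it prints `(1, 3) [(0, 2), (5, 8)]` and `(4, 5) [(1, 4)]`: two text lines, where the one-column background gap at column 6 is absorbed into the second word because it is narrower than `min_gap`. On real Odia pages, vowel signs placed above or below the main letter body can produce thin, separate row runs, so a practical system would need an extra rule to attach them to the nearest line; this sketch does not handle that case.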
Keywords :
document image processing; handwritten character recognition; image segmentation; natural language processing; optical character recognition; text analysis; Arabic script; Chinese language; English language; OCR system; Odia language; Odia printed document image file; Odia script; character recognition; handwritten text; machine-encoded text; multiscript document; optical character system; pixel intensity; printed text; text line segmentation; text-line incorrectness; typewritten text; word recognition; word segmentation; Image resolution; Image segmentation; Integrated optics; Optical character recognition software; Optical imaging; Robustness; US Department of Defense;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing Communication & Networking Technologies (ICCCNT), 2012 Third International Conference on
Conference_Location :
Coimbatore
Type :
conf
DOI :
10.1109/ICCCNT.2012.6396063
Filename :
6396063