Title :
Development of a page segmentation technique for Bangla documents printed in italic style
Author :
Singh, Praveen Kumar ; Mahanta, Sajal ; Malakar, Samir ; Sarkar, Rituparna ; Nasipuri, Mita
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
Abstract :
Optical Character Recognition (OCR) is one of the essential prerequisites of electronic document analysis systems. Segmentation is the preliminary step of OCR, which has long been an active area of research. In this paper, we present a hierarchical system for the segmentation of Bangla script documents printed in two different styles, viz. italic and bold italic, with varying fonts and sizes. First, the text lines are segmented from the document pages. Next, the words are segmented from the extracted text lines. Finally, the characters are segmented from the extracted word images by using a trapezoidal fuzzy membership function to detect the Matra region. The proposed technique is tested on 16 document pages consisting of 1456 words. The average success rates of the technique for text line, word and character segmentation are found to be 99.91%, 98.63% and 89.41%, respectively.
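The abstract names a trapezoidal fuzzy membership function but gives neither its parameters nor the feature it is evaluated on. As a minimal sketch of the standard textbook definition only, the function below assumes four breakpoints `a <= b <= c <= d`; the name `trapezoidal_membership` and the breakpoint values in the usage lines are hypothetical and are not taken from the paper.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Standard trapezoidal fuzzy membership value of x, in [0, 1].

    Breakpoints (assumed, not from the paper): membership rises from 0 at a
    to 1 at b, stays at 1 up to c, then falls back to 0 at d.
    """
    if not (a <= b <= c <= d):
        raise ValueError("breakpoints must satisfy a <= b <= c <= d")
    if x < a or x > d:
        return 0.0          # outside the support
    if b <= x <= c:
        return 1.0          # on the plateau (also covers a == b or c == d)
    if x < b:
        return (x - a) / (b - a)   # rising edge, here a < b is guaranteed
    return (d - x) / (d - c)       # falling edge, here c < d is guaranteed


# Hypothetical breakpoints chosen only to illustrate the shape.
print(trapezoidal_membership(0.5, 0.0, 1.0, 2.0, 3.0))  # 0.5 (rising edge)
print(trapezoidal_membership(1.5, 0.0, 1.0, 2.0, 3.0))  # 1.0 (plateau)
```

In a segmentation setting, such a function could score how strongly a pixel row belongs to a candidate region, but which quantity the authors feed into it is not stated in this record.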
Keywords :
document image processing; optical character recognition; Bangla documents; Bangla script document; OCR; bold italic style; character segmentation; document pages; electronic document analysis systems; hierarchical system; page segmentation; trapezoidal fuzzy membership function; Histograms; Bangla script; Fuzzy membership function
Conference_Title :
Business and Information Management (ICBIM), 2014 2nd International Conference on
Conference_Location :
Durgapur
Print_ISBN :
978-1-4799-3263-4
DOI :
10.1109/ICBIM.2014.6970950