• DocumentCode
    169550
  • Title
    Development of a page segmentation technique for Bangla documents printed in italic style
  • Author
    Singh, Praveen Kumar ; Mahanta, Sajal ; Malakar, Samir ; Sarkar, Rituparna ; Nasipuri, Mita
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
  • fYear
    2014
  • fDate
    9-11 Jan. 2014
  • Firstpage
    120
  • Lastpage
    125
  • Abstract
    Optical Character Recognition (OCR) is an essential component of electronic document analysis systems. Segmentation, the preliminary step of OCR, has long been an active area of research. In this paper, we present a hierarchical system for segmenting Bangla script documents printed in two styles, italic and bold italic, with varying fonts and sizes. First, the text lines are segmented from the document pages. Next, the words are segmented from the extracted text lines. Finally, the characters are segmented from the extracted word images using a trapezoidal fuzzy membership function, which is used to detect the Matra region. The proposed technique is tested on 16 document pages consisting of 1456 words. The average success rates of the technique for text line, word and character segmentation are 99.91%, 98.63% and 89.41%, respectively.
  • Keywords
    document image processing; optical character recognition; Bangla documents; Bangla script document; OCR; bold italic style; character segmentation; document pages; electronic document analysis systems; hierarchical system; page segmentation; trapezoidal fuzzy membership function; Histograms; Bangla script; Fuzzy membership function
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business and Information Management (ICBIM), 2014 2nd International Conference on
  • Conference_Location
    Durgapur
  • Print_ISBN
    978-1-4799-3263-4
  • Type
    conf
  • DOI
    10.1109/ICBIM.2014.6970950
  • Filename
    6970950
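
  • Example
    The abstract mentions a trapezoidal fuzzy membership function used to detect the Matra region. The sketch below shows only the general form of such a function, plus one hypothetical way it might score the rows of a binary word image by ink density. The helper name matra_row_scores, the parameters a, b, c, d and their default values are illustrative assumptions and are not taken from the paper.

```python
def trapezoidal_membership(x, a, b, c, d):
    """General trapezoidal fuzzy membership.

    a and d are the feet (membership 0), b and c the shoulders
    (membership 1), with a <= b <= c <= d.
    """
    if not (a <= b <= c <= d):
        raise ValueError("expected a <= b <= c <= d")
    if b <= x <= c:          # plateau
        return 1.0
    if x <= a or x >= d:     # outside the trapezoid
        return 0.0
    if x < b:                # rising edge
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # falling edge


def matra_row_scores(word, a=0.4, b=0.7, c=1.0, d=1.0):
    """Hypothetical use: score each row of a binary word image (1 = ink)
    by the fraction of ink pixels in that row. The thresholds are
    assumed values, not the paper's."""
    return [trapezoidal_membership(sum(row) / len(row), a, b, c, d)
            for row in word]


if __name__ == "__main__":
    word = [
        [0, 0, 0, 0],
        [1, 1, 1, 1],   # a fully inked row, as a headline stroke would be
        [0, 1, 0, 0],
    ]
    print(matra_row_scores(word))
```

    A graded score, rather than a hard threshold, is one plausible reason to use a fuzzy function here: in slanted text a headline stroke may be spread over several rows, each only partly inked.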