• DocumentCode
    135059
  • Title
    An approach for printed document labeling
  • Author
    Adak, Chandranath
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Kalyani, Kalyani, India
  • fYear
    2014
  • fDate
    1-2 Feb. 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    A document image contains text and non-text regions, and it may be printed, handwritten, or a hybrid of both. In this paper we deal with printed documents, where the textual region consists of printed characters and the non-text regions are mainly photo images. We propose a model that labels the different components of a printed document image, i.e. it identifies headings, subheadings, captions, articles and photos. Our method begins with a preprocessing stage in which fuzzy c-means clustering segments the document image into the printed (object) region and the background. The Hough transform is then used to find white-line dividers in the object region, and grid structure examination is used to extract the non-text portion. After that, we use a horizontal histogram to find text lines and then label the different components. Our method gives promising results on printed documents in different scripts.
  • Keywords
    Hough transforms; document image processing; fuzzy set theory; pattern clustering; text analysis; Hough transformation; document image; fuzzy c-means clustering; grid structure examination; horizontal histogram; nontext portion; object region; preprocessing stage; printed characters; printed document image; printed document labeling; textual region; white-line dividers; Histograms; Image analysis; Image segmentation; Labeling; Optical character recognition software; Text analysis; Transforms; Document Image Analysis; Document Labeling; Fuzzy C-Means Clustering; Hough Transform; Optical Character Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automation, Control, Energy and Systems (ACES), 2014 First International Conference on
  • Conference_Location
    Hooghly
  • Print_ISBN
    978-1-4799-3893-3
  • Type
    conf
  • DOI
    10.1109/ACES.2014.6808032
  • Filename
    6808032
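
  • Illustration
    The abstract mentions using a horizontal histogram to find text lines. The sketch below shows one common way such a step can work; it is not taken from the paper. It assumes the page has already been binarized into a list of rows with 1 for ink and 0 for background, and the function name `find_text_lines` and the `min_ink` threshold are hypothetical choices made for this example.

```python
from typing import List, Tuple


def find_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start_row, end_row) pairs for each run of inked rows.

    `binary` is an assumed input format: one list per pixel row, 1 = ink.
    `min_ink` is a hypothetical threshold on the ink count per row.
    """
    # Horizontal histogram: number of ink pixels in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None  # first row of the run currently being tracked, if any
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # a new run of inked rows begins
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        # The last run extends to the bottom edge of the image.
        lines.append((start, len(profile) - 1))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 0, 0],
        [0, 1, 0],
    ]
    print(find_text_lines(page))
```

    Rows whose ink count falls below the threshold act as gaps between lines. On a real scan the threshold would probably need tuning to tolerate noise, and the height of each detected run could serve as one cue for telling headings from body text, though the paper's actual labeling rules are not given in this record.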