• DocumentCode
    2016464
  • Title
    On Segmentation of Documents in Complex Scripts
  • Author
    Kumar, K. S. Sesh ; Kumar, Sukesh ; Jawahar, C.V.
  • Author_Institution
    Int. Inst. of Inf. Technol., Hyderabad
  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    1243
  • Lastpage
    1247
  • Abstract
    Document image segmentation algorithms primarily aim at separating text and graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that successful algorithms for Latin scripts may not be very effective for Indic and complex scripts. We explain this based on the differences in the spatial distribution of symbols in the scripts. We argue that the visual information used for segmentation needs to be enhanced with other information, such as script models, for accurate results.
  • Keywords
    computer graphics; document image processing; image segmentation; natural language processing; text analysis; Indic scripts; complex layouts; complex scripts; document image segmentation algorithms; document segmentation; graphics; non-Latin scripts; spatial distribution; text; Algorithm design and analysis; Data mining; Failure analysis; Graphics; Image segmentation; Image texture analysis; Information technology; Layout; Shape measurement; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4377114
  • Filename
    4377114