• DocumentCode
    3340653
  • Title

    Multi-Oriented English Text Line Extraction Using Background and Foreground Information

  • Author

    Roy, Partha Pratim ; Pal, Umapada ; Llados, Josep ; Kimura, Fumitaka

  • Author_Institution
    Comput. Vision Center, Univ. Autonoma De Barcelona, Barcelona
  • fYear
    2008
  • fDate
    16-19 Sept. 2008
  • Firstpage
    315
  • Lastpage
    322
  • Abstract
    In graphical documents (e.g., maps and engineering drawings), artistic documents, and similar printed materials, text lines are often not parallel to each other; they are multi-oriented and curved in nature. For OCR of such documents, individual text lines must first be extracted. Extracting individual text lines from multi-oriented and/or curved text documents is a difficult problem. In this paper, we propose a novel method to extract individual text lines from such document pages, based on the foreground and background information of the characters of the text. To capture background information, the water reservoir concept is used. In the proposed scheme, individual components are first detected and grouped into 3-character clusters using their inter-component distance, size, and positional information. Using a graph concept, the initial 3-character clusters are merged into larger cluster groups. Using inter-character background information, the orientations of the extreme characters of a larger cluster are determined, and based on these orientations, two candidate regions are formed from the cluster. Finally, with the help of these candidate regions, individual lines are extracted. Experiments gave encouraging results.
  • Keywords
    document image processing; feature extraction; optical character recognition; text analysis; artistic documents; background information; curved text document; document pages; foreground information; graphical documents; multioriented English text line extraction; Character recognition; Data mining; Image segmentation; Information analysis; Optical character recognition software; Pattern analysis; Pattern recognition; Reservoirs; Text analysis; Water resources
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
  • Conference_Location
    Nara
  • Print_ISBN
    978-0-7695-3337-7
  • Type

    conf

  • DOI
    10.1109/DAS.2008.83
  • Filename
    4669976
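
The grouping and merging steps mentioned in the abstract (3-character clusters built from inter-component distance, size, and position, then merged through a graph) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the component representation `(cx, cy, height)`, the function names `three_char_clusters` and `merge_clusters`, and the thresholds `max_gap` and `max_size_ratio` are all hypothetical, and the water reservoir analysis and candidate-region steps are not covered.

```python
from itertools import combinations
from math import hypot


def three_char_clusters(comps, max_gap=2.0, max_size_ratio=1.5):
    """Group components into 3-character clusters (illustrative heuristic).

    comps: list of (cx, cy, height) tuples, one per connected component.
    Returns a set of sorted index triples.
    """
    def compatible(i, j):
        (xi, yi, hi), (xj, yj, hj) = comps[i], comps[j]
        # Assumed rule: similar size and a gap of at most max_gap heights.
        similar = max(hi, hj) <= max_size_ratio * min(hi, hj)
        close = hypot(xi - xj, yi - yj) <= max_gap * max(hi, hj)
        return similar and close

    clusters = set()
    for mid in range(len(comps)):
        nbrs = [k for k in range(len(comps)) if k != mid and compatible(mid, k)]
        for a, b in combinations(nbrs, 2):
            (xa, ya, _), (xm, ym, _), (xb, yb, _) = comps[a], comps[mid], comps[b]
            # Keep the triple only if the neighbours lie on opposite sides
            # of the middle component (negative dot product).
            if (xa - xm) * (xb - xm) + (ya - ym) * (yb - ym) < 0:
                clusters.add(tuple(sorted((a, mid, b))))
    return clusters


def merge_clusters(n, clusters):
    """Merge triples that share components, using union-find on a graph
    whose nodes are the n component indices."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c in clusters:
        parent[find(b)] = find(a)
        parent[find(c)] = find(a)

    groups = {}
    for i in {i for t in clusters for i in t}:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

For example, four evenly spaced components on one line and three on a distant line would yield overlapping triples on the first line that merge into a single group, while the second line stays a separate group.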