• DocumentCode
    1992776
  • Title
    A model-based line detection algorithm in documents
  • Author
    Zheng, Yefeng; Li, Huiping; Doermann, David
  • Author_Institution
    Lab. for Language & Media Process., Maryland Univ., College Park, MD, USA
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    44
  • Abstract
    In this paper we present a novel model-based approach to detecting severely broken parallel lines in noisy textual documents. These lines must be detected and removed so that the text can be segmented and recognized. We use the directional single-connected chain, a vectorization-based algorithm, to extract the line segments. We then instantiate a parallel line model with three parameters: the skew angle, the vertical line gap, and the vertical translation. A coarse-to-fine approach is used to improve the estimation accuracy. The model lets us incorporate high-level contextual information to enhance detection results even when lines are severely broken. Our experimental results show that the method detects 94% of the lines in our database of 168 noisy Arabic document images.
  • Keywords
    image enhancement; image processing; image recognition; image segmentation; optical character recognition; text analysis; coarse-to-fine approach; directional single-connected chain; estimation accuracy; line segment extraction; model-based line detection algorithm; noisy Arabic document images; noisy textual documents; parallel line model; severely broken parallel lines; skew angle; text recognition; text segmentation; vectorization based algorithm; vertical line gap; vertical translation; Concurrent computing; Context modeling; Data mining; Detection algorithms; Educational institutions; Electronic mail; Image segmentation; Laboratories; Optical character recognition software; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227625
  • Filename
    1227625
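
The abstract describes a parallel line model with three parameters: the skew angle, the vertical line gap, and the vertical translation. The block below is a minimal sketch of how such a model could be represented, not the authors' implementation. The class name `ParallelLineModel`, its field names, and the use of a vertical residual for matching are all assumptions made for illustration; the record gives no equations.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ParallelLineModel:
    """Hypothetical three-parameter model of equally spaced parallel lines."""

    skew_deg: float  # skew angle shared by all lines, in degrees
    gap: float       # vertical gap between adjacent lines, in pixels
    offset: float    # vertical translation: y of line 0 at x = 0, in pixels

    def line_y(self, k: int, x: float) -> float:
        """Return the y-coordinate of the k-th model line at position x."""
        slope = math.tan(math.radians(self.skew_deg))
        return self.offset + k * self.gap + slope * x

    def nearest_line(self, x: float, y: float) -> Tuple[int, float]:
        """Return the index of the model line closest to (x, y), measured
        vertically, together with the signed vertical residual."""
        slope = math.tan(math.radians(self.skew_deg))
        intercept = y - slope * x  # remove the skew to get the y-intercept
        k = round((intercept - self.offset) / self.gap)
        return k, intercept - (self.offset + k * self.gap)
```

Under these assumptions, a broken line segment extracted from the image could be assigned to a model line by calling `nearest_line` on its midpoint and accepting the match when the residual is small relative to `gap`. That is one way the model could carry contextual information across breaks.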