• DocumentCode
    3142265
  • Title
    A statistically based, highly accurate text-line segmentation method
  • Author
    Liang, Jisheng ; Phillips, Ihsin T. ; Haralick, Robert M.
  • Author_Institution
    Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    551
  • Lastpage
    554
  • Abstract
    This paper describes a text-line identification and segmentation technique that is probability based, where all probabilities are estimated from an extensive training set of various kinds of measurements of distances between the terminal and non-terminal entities with which the algorithm works. The off-line probabilities estimated in the training then drive all decisions in the on-line segmentation algorithm. On the UW-III database of some 1600 scanned document image pages, having some 105020 text lines, the algorithm identifies and segments 104773 correctly, an accuracy of 99.76%.
  • Keywords
    document image processing; image segmentation; probability; statistical analysis; visual databases; UW-III database; document image scanning; probability; statistical analysis; text-line identification; text-line segmentation method; training set; Computer science; Electric variables measurement; Image databases; Image segmentation; Labeling; Partitioning algorithms; Probability; Tellurium;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791847
  • Filename
    791847