• DocumentCode
    3024003
  • Title
    Distinguishing mathematics notation from English text using computational geometry
  • Author
    Drake, Derek M.; Baird, Henry S.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
  • fYear
    2005
  • fDate
    Aug. 31 2005-Sept. 1 2005
  • Firstpage
    1270
  • Abstract
    A trainable method for distinguishing between mathematics notation and natural language (here, English) in images of textlines, using computational geometry methods only, with no assistance from symbol recognition, is described. The input to our method is a "neighbor graph" extracted from a bilevel image of an isolated textline by the method of Kise et al. (1998): this is a pruned form of Delaunay triangulation of the set of locations of black connected components. Our method first attempts to classify each vertex and, separately, each edge of the neighbor graph as belonging to math or English; then these results are combined to yield a classification of the entire textline. All three classifiers are automatically trainable. Features for the vertex and edge classifiers were selected semi-manually from a large number in a process driven by training data: this stage is potentially fully automatable. In experiments on images scanned from books and images generated synthetically, this methodology converged in three iterations to a textline classifier with an error rate of less than one percent.
  • Keywords
    character recognition; computational geometry; computational linguistics; natural languages; Delaunay triangulation; English text; bilevel image; mathematics notation; natural language; neighbor graph; symbol recognition; textlines; Displays; Error analysis; Image recognition; Mathematics; Optical character recognition software; Text analysis; Text recognition; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • Conference_Location
    Seoul, South Korea
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.89
  • Filename
    1575746
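
The abstract describes the input as a pruned Delaunay triangulation over the locations of black connected components. As a rough illustration only, the sketch below builds such a graph for a handful of points. It is not the method of Kise et al. or of the paper: the function names, the use of one point per component, the brute-force empty-circumcircle test, and the single `max_length` pruning threshold are all assumptions chosen to keep the example short and dependency-free.

```python
from itertools import combinations
from math import hypot


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay edges: keep a triangle if no other point lies inside
    its circumcircle. Suitable only for small inputs; a point set that is
    entirely collinear yields no edges."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        ):
            # i < j < k, so each edge is stored with its smaller index first.
            edges.update({(i, j), (i, k), (j, k)})
    return edges


def prune_by_length(points, edges, max_length):
    """Assumed stand-in for pruning: drop edges longer than max_length."""
    return {
        (i, j)
        for i, j in edges
        if hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= max_length
    }


# Hypothetical component locations: four symbols on a baseline plus one raised
# symbol (for example, a superscript) between the last two.
points = [(0, 0), (10, 0), (20, 0), (30, 0), (25, 8)]
neighbor_graph = prune_by_length(points, delaunay_edges(points), max_length=12)
```

In this toy input the long edges from the raised point to the two leftmost symbols are removed, leaving links only between adjacent symbols. A production implementation would more likely use a library triangulation routine and the pruning criteria from the cited method.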