• DocumentCode
    2631480
  • Title
    European language determination from image
  • Author
    Nakayama, Takehiro; Spitz, A. Lawrence
  • Author_Institution
    Fuji Xerox Palo Alto Lab., CA, USA
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    159
  • Lastpage
    162
  • Abstract
    The authors have developed a technique for determining the language from an image of text. This work is restricted to a small subset of European languages, but uses techniques which should be applicable across many more languages. The method first makes generalizations about images of characters, then performs gross classification of the isolated characters and agglomerates these class identities into spatially isolated (word) tokens. Analysis of corpora in English, French and German yields training data for a language classifier designed to codify the spatial relationships of the connected components which compose the letter-forms. Linear discriminant analysis provides classification criteria on which the test data are evaluated. The resulting process takes in images of text and produces a language classification based on image representations and generalizations about relative token shape frequency in the target languages.
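    The pipeline described in the abstract (gross character classification, agglomeration into word tokens, token shape frequencies, linear discriminant) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' method: it works on text characters rather than connected components in an image, the shape classes (A, x, g, i, j, plus a hypothetical U for accented letters) are simplified guesses, and the classifier is a linear discriminant with a pooled diagonal covariance rather than full linear discriminant analysis. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, simplified shape classes. The paper classifies connected
# components in an image; this sketch maps each text character to the
# class its glyph would roughly fall into.
ASCENDERS = set("bdfhkltß")
DESCENDERS = set("gpqy")


def shape_code(ch: str) -> str:
    """Map one character to a coarse shape class ('' for non-letters)."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"  # tall: capitals, digits, ascenders
    if ch in DESCENDERS:
        return "g"  # x-height body with a descender
    if ch in "ij":
        return ch  # dotted characters kept as their own classes
    if ch.isalpha():
        # plain x-height letter vs. accented letter (hypothetical class 'U')
        return "x" if ch.isascii() else "U"
    return ""  # punctuation is ignored


def shape_tokens(text: str) -> list[str]:
    """Agglomerate the shape codes of each whitespace-separated word."""
    tokens = []
    for word in text.split():
        token = "".join(shape_code(ch) for ch in word)
        if token:
            tokens.append(token)
    return tokens


def build_vocab(docs: list[str], size: int) -> list[str]:
    """The `size` most frequent shape tokens across all training docs."""
    counts = Counter(t for doc in docs for t in shape_tokens(doc))
    return [tok for tok, _ in counts.most_common(size)]


def token_profile(text: str, vocab: list[str]) -> list[float]:
    """Relative frequency of each vocabulary token among all tokens."""
    counts = Counter(shape_tokens(text))
    total = sum(counts.values()) or 1
    return [counts[tok] / total for tok in vocab]


def fit_diag_lda(data: dict[str, list[list[float]]], eps: float = 1e-6):
    """Class means, pooled per-feature variance, and log priors."""
    n_total = sum(len(rows) for rows in data.values())
    dim = len(next(iter(data.values()))[0])
    means = {k: [sum(col) / len(rows) for col in zip(*rows)]
             for k, rows in data.items()}
    var = [0.0] * dim
    for k, rows in data.items():
        for row in rows:
            for j in range(dim):
                var[j] += (row[j] - means[k][j]) ** 2
    dof = max(n_total - len(data), 1)
    var = [v / dof + eps for v in var]  # eps avoids division by zero
    priors = {k: math.log(len(rows) / n_total) for k, rows in data.items()}
    return means, var, priors


def predict(model, x: list[float]) -> str:
    """Pick the class with the largest linear discriminant score."""
    means, var, priors = model

    def score(k: str) -> float:
        mu = means[k]
        # Gaussian log-likelihood with shared diagonal covariance,
        # dropping the term that is the same for every class.
        return priors[k] + sum((x[j] * mu[j] - mu[j] ** 2 / 2) / var[j]
                               for j in range(len(x)))

    return max(means, key=score)
```

To use it, one would build a vocabulary with `build_vocab` over all training documents, convert each document with `token_profile`, pass a mapping such as `{"en": [...], "fr": [...], "de": [...]}` of profiles to `fit_diag_lda`, and call `predict` on the profile of a new text. The diagonal covariance keeps the sketch dependency-free; a full-covariance LDA (e.g. from a numerical library) would be closer to the technique the abstract names.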
  • Keywords
    character recognition; image classification; linguistics; natural languages; English; European languages; French; German; class identities; classification criteria; corpora; gross classification; image representations; isolated characters; language classifier; language determination; linear discriminant analysis; spatial relationships; spatially isolated tokens; token shape frequency; training data; word tokens; Character recognition; Frequency; Image representation; Laboratories; Linear discriminant analysis; Natural languages; Optical character recognition software; Shape; Testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1993, Proceedings of the Second International Conference on
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1993.395759
  • Filename
    395759