DocumentCode
2631480
Title
European language determination from image
Author
Nakayama, Takehiro ; Spitz, A. Lawrence
Author_Institution
Fuji Xerox Palo Alto Lab., CA, USA
fYear
1993
fDate
20-22 Oct 1993
Firstpage
159
Lastpage
162
Abstract
The authors have developed a technique for determining the language from an image of text. This work is restricted to a small subset of European languages, but uses techniques which should be applicable across many more languages. The method first makes generalizations about images of characters, then performs gross classification of the isolated characters and agglomerates these class identities into spatially isolated (word) tokens. Analysis of corpora in English, French and German yields training data for a language classifier designed to codify the spatial relationships of the connected components which compose the letter-forms. Linear discriminant analysis provides classification criteria on which the test data are evaluated. The resulting process takes in images of text and produces a language classification based on image representations and generalizations about relative token shape frequency in the target languages.
Keywords
character recognition; image classification; linguistics; natural languages; English; European languages; French; German; class identities; classification criteria; corpora; gross classification; image representations; isolated characters; language classifier; language determination; linear discriminant analysis; spatial relationships; spatially isolated tokens; token shape frequency; training data; word tokens; Character recognition; Frequency; Image representation; Laboratories; Linear discriminant analysis; Natural languages; Optical character recognition software; Shape; Testing; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Proceedings of the Second International Conference on Document Analysis and Recognition, 1993
Conference_Location
Tsukuba Science City
Print_ISBN
0-8186-4960-7
Type
conf
DOI
10.1109/ICDAR.1993.395759
Filename
395759
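Illustrative sketch
The pipeline summarized in the abstract (coarse character classes, word-level shape tokens, relative token frequencies per language) can be illustrated roughly as follows. This is a minimal sketch under several assumptions, not the authors' implementation: it works on plain text as a stand-in for connected-component analysis of an image, the four shape classes and all function names are hypothetical, and a simple cosine-similarity comparison is swapped in where the paper uses linear discriminant analysis.

```python
from collections import Counter
import math

# Hypothetical coarse shape classes (an assumption, not the paper's code set).
ASCENDERS = set("bdfhklt")   # letters rising above the x-height
DESCENDERS = set("gpqy")     # letters dropping below the baseline
DOTTED = set("ij")           # letters with a detached dot


def shape_code(ch):
    """Map one character to a coarse shape class, or '' if it is ignored."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "g"
    if ch in DOTTED:
        return "i"
    if ch.isalpha():
        return "x"
    return ""  # punctuation and other symbols are dropped in this sketch


def word_shape_tokens(text):
    """Turn whitespace-separated words into word shape tokens."""
    tokens = []
    for word in text.split():
        token = "".join(shape_code(ch) for ch in word)
        if token:
            tokens.append(token)
    return tokens


def profile(text):
    """Relative frequency of each word shape token in a text."""
    counts = Counter(word_shape_tokens(text))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(tok, 0.0) for tok, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def classify(text, language_profiles):
    """Pick the language whose training profile is most similar to the text.

    Stand-in for the linear discriminant analysis named in the abstract.
    """
    observed = profile(text)
    return max(language_profiles,
               key=lambda lang: cosine(observed, language_profiles[lang]))
```

In use, one would build `language_profiles` by calling `profile` on a training corpus per language and then pass unseen text to `classify`. With realistic corpora, a discriminant-based classifier as described in the abstract would likely separate the languages better than this similarity measure.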