Title :
Automatic script identification from document images using cluster-based templates
Author :
Hochberg, Judith ; Kelly, Patrick ; Thomas, Timothy ; Kerns, Lila
Author_Institution :
Los Alamos Nat. Lab., NM, USA
fDate :
2/1/1997
Abstract :
We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
Keywords :
optical character recognition; automatic script identification; cluster-based templates; document images; textual symbol clustering; typeset document images; Character recognition; Image analysis; Indexing; Laboratories; Natural languages; Optical character recognition software; Postal services; Shape; Text analysis; Typesetting
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
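Illustrative_Sketch :
The two steps named in the abstract (clustering training symbols into per-script templates, then comparing symbols from a new image to those templates) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the abstract does not say how symbols are extracted or normalized, which clustering algorithm is used, or how matches are scored. The greedy threshold clustering, the mean-absolute-difference distance, the function names `build_templates` and `identify_script`, and the toy 4-pixel symbols are all assumptions made for this example.

```python
def distance(a, b):
    """Mean absolute difference between two equal-length pixel vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols, threshold=0.25):
    """Greedy clustering (an assumed stand-in for the paper's clustering step).

    Each symbol joins the nearest existing cluster whose centroid lies within
    `threshold`; otherwise it starts a new cluster. The centroids are returned
    as the templates for one script.
    """
    clusters = []  # each entry: (per-pixel running sum, member count)
    for sym in symbols:
        best_i, best_d = None, threshold
        for i, (total, n) in enumerate(clusters):
            d = distance(sym, [t / n for t in total])
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is None:
            clusters.append(([float(v) for v in sym], 1))
        else:
            total, n = clusters[best_i]
            clusters[best_i] = ([t + v for t, v in zip(total, sym)], n + 1)
    return [[t / n for t in total] for total, n in clusters]


def identify_script(symbols, templates_by_script):
    """Pick the script whose templates best match the symbols on average.

    For every symbol, take the distance to its closest template in each
    script; the script with the lowest mean distance wins (assumed scoring).
    """
    scores = {}
    for script, templates in templates_by_script.items():
        best_matches = [min(distance(s, t) for t in templates) for s in symbols]
        scores[script] = sum(best_matches) / len(symbols)
    return min(scores, key=scores.get)


# Toy training data: hypothetical 2x2 binary "symbols", flattened to 4 pixels.
training = {
    "script_A": [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
    "script_B": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
templates_by_script = {
    name: build_templates(syms) for name, syms in training.items()
}

# Symbols taken from a hypothetical new document image.
new_image_symbols = [[1, 1, 0, 0], [1, 0, 0, 0]]
print(identify_script(new_image_symbols, templates_by_script))
```

A real system would work on size-normalized symbol images segmented from the page and would likely need many templates per script; the sketch only shows how per-script templates and a best-match vote fit together.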