DocumentCode
1174446
Title
Texture for script identification
Author
Busch, Andrew ; Boles, Wageeh W. ; Sridharan, Sridha
Author_Institution
Sch. of Microelectronic Eng., Griffith Univ., Nathan, Qld., Australia
Volume
27
Issue
11
fYear
2005
Firstpage
1720
Lastpage
1732
Abstract
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
Keywords
document image processing; image texture; text analysis; visual databases; document image; script database; script identification; texture features; visual texture; Character recognition; Image analysis; Image databases; Indexing; Optical character recognition software; Sorting; Spatial databases; Text analysis; Training data; Visual databases; Script identification; classification and association rules; clustering; document analysis; texture; wavelets and fractals; Algorithms; Artificial Intelligence; Automatic Data Processing; Documentation; Handwriting; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Models, Statistical; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reading; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Subtraction Technique
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2005.227
Filename
1512053