DocumentCode
2633182
Title
Extended character defect model for recognition of text from maps
Author
Pezeshk, Aria ; Tutwiler, Richard L.
Author_Institution
Appl. Res. Lab., Pennsylvania State Univ., State College, PA, USA
fYear
2010
fDate
23-25 May 2010
Firstpage
85
Lastpage
88
Abstract
Topographic maps contain a small amount of text compared to other forms of printed documents. Furthermore, the text and graphical components typically intersect with one another, making the extraction of text a very difficult task. Creating training sets of a suitable size from the actual characters in maps would therefore require the laborious processing of many maps with similar features and the manual extraction of character samples. This paper extends the types of defects represented by Baird's document image degradation model in order to create pseudo-randomly generated training sets that closely mimic the various artifacts and defects encountered in characters extracted from maps. Two Hidden Markov Models are then trained and used to recognize the text. Tests performed on extracted street labels show an improvement in character recognition rate from 88.4% when only the original Baird model is used to 93.2% when the extended defect model is used for training.
Keywords
cartography; document image processing; hidden Markov models; learning (artificial intelligence); text analysis; document image degradation model; extended character defect model; pseudo randomly generated training sets; text recognition; topographic maps; Artificial neural networks; Character recognition; Data mining; Degradation; Feature extraction; Graphics; Image recognition; Optical character recognition software
fLanguage
English
Publisher
ieee
Conference_Titel
Image Analysis & Interpretation (SSIAI), 2010 IEEE Southwest Symposium on
Conference_Location
Austin, TX
Print_ISBN
978-1-4244-7801-9
Type
conf
DOI
10.1109/SSIAI.2010.5483913
Filename
5483913