DocumentCode :
2199324
Title :
A Full-Text Search System for Images of Hand-Written Cursive Documents
Author :
Imura, Hajime ; Tanaka, Yuzuru
Author_Institution :
Dept. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
fYear :
2010
fDate :
16-18 Nov. 2010
Firstpage :
640
Lastpage :
645
Abstract :
We propose a full-text search technique for image-scanned documents that does not recognize individual characters. The system is as fast as a full-text search of machine-readable documents. Such a system is important when working with historical handwritten manuscripts. The proposed method works independently of differences in language and font because it uses a new pseudo-coding scheme based on the statistical features of character shapes. We evaluated our method in recall-precision curves for n-gram-based query strings in Japanese manuscripts and word-based query strings in English manuscripts using two types of image features and two different pseudo-coding schemes. Results demonstrate that the precision reached over 50% at a recall point of 80% for 3-gram queries in the Japanese manuscripts. Results also indicate that our pseudo-code is suitable for applications that use machine-learning techniques. The combination of an HMM-based filtering method and our pseudo-code can significantly improve performance in terms of retrieval precision.
Keywords :
document image processing; feature extraction; handwritten character recognition; hidden Markov models; image retrieval; learning (artificial intelligence); natural languages; statistical analysis; text analysis; word processing; English manuscript; HMM-based filtering; Japanese manuscript; character shape; full text search system; hand-written cursive document image; handwritten manuscript; image features; image scanned document; machine learning technique; machine readable document; n gram-based query string; pseudocoding scheme; recall precision curve; statistical feature; word-based query string; Full-text Search; Performance Evaluation; Word Spotting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-8353-2
Type :
conf
DOI :
10.1109/ICFHR.2010.105
Filename :
5693636