Title :
A novel scheme for retrieval of handwritten textual annotations for information just in time (iJIT)
Author :
Basu, Subhadip ; Konishi, Kouske ; Furukawa, Naohiro ; Ikeda, Hisashi
Author_Institution :
Comput. Sci. & Eng. Dept., Jadavpur Univ., Kolkata
Abstract :
We have designed a novel query retrieval scheme for the information just in time (iJIT) system that retrieves handwritten annotations from digital documents based on typed or handwritten queries. The two key components of the developed query retrieval system (QRS) are the character recognition engine and the query retrieval engine. The character recognition engine uses the open source Tesseract 2.01 Optical Character Recognition (OCR) engine, released under the Apache License 2.0, and is trained with handwritten samples from different users. It receives real-time data generated by a digital pen and produces segmented recognition results. The query retrieval engine resolves index and query requests from users for information update or retrieval. In the case of a handwritten query, the query retrieval engine interacts with the recognition engine to create or update the inverted index table with recognized word labels and their annotation indices. In the case of a typed text query, the inverted index table is searched directly to retrieve the best-matching annotation indices using a q-gram based approximate string matching technique. Finally, an HMM-Viterbi algorithm is implemented to find the optimum recognized character sequence in each word using a fuzzy character confusion matrix.
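The typed-query path described in the abstract (an inverted index from recognized word labels to annotation indices, searched with q-gram based approximate matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which q-gram similarity measure is used, so the Dice coefficient over padded bigrams is an assumption here, and the names `qgrams`, `qgram_similarity`, `InvertedIndex`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def qgrams(word, q=2):
    """Return the list of q-grams of a lowercased word, padded at both ends."""
    padded = "#" * (q - 1) + word.lower() + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def qgram_similarity(a, b, q=2):
    """Dice coefficient over q-gram multisets (assumed measure, not from the paper)."""
    ca, cb = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    overlap = sum((ca & cb).values())          # q-grams shared by both words
    total = sum(ca.values()) + sum(cb.values())
    return 2.0 * overlap / total if total else 0.0


class InvertedIndex:
    """Maps recognized word labels to the annotation indices that contain them."""

    def __init__(self):
        self.table = defaultdict(set)

    def add(self, word_label, annotation_id):
        """Index path: register a recognized word label for an annotation."""
        self.table[word_label.lower()].add(annotation_id)

    def search(self, query, q=2, threshold=0.5):
        """Query path: return (score, label, annotation ids), best match first."""
        scored = []
        for label, ids in self.table.items():
            score = qgram_similarity(query, label, q)
            if score >= threshold:
                scored.append((score, label, sorted(ids)))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return scored
```

With an index holding the labels "meeting" (annotation 1), "meetinq" (annotation 2, a simulated recognition error), and "budget" (annotation 3), `search("meeting")` ranks the exact label first and still returns the misrecognized one, while "budget" falls below the threshold. Approximate matching of this kind is what lets a typed query tolerate character-level recognition errors in the indexed labels.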
Keywords :
document image processing; fuzzy set theory; handwritten character recognition; image retrieval; image segmentation; optical character recognition; Apache License 2.0; character recognition engine; digital document; fuzzy membership function; handwritten textual annotation retrieval; information just-in-time; open source optical character recognition; query retrieval engine; Character recognition; Computer science; Content based retrieval; Engines; Handwriting recognition; Hidden Markov models; Indexing; Information analysis; Information retrieval; Optical character recognition software
Conference_Title :
TENCON 2008 - 2008 IEEE Region 10 Conference
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4244-2408-5
Electronic_ISBN :
978-1-4244-2409-2
DOI :
10.1109/TENCON.2008.4766776