Title :
Features for word spotting in historical manuscripts
Author :
Rath, Toni M. ; Manmatha, R.
Author_Institution :
Center for Intelligent & Inf. Retrieval, Massachusetts Univ., Amherst, MA, USA
Abstract :
For the transition from traditional to digital libraries, the large number of existing handwritten manuscripts poses a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting" clusters, an index that links words to the locations where they occur can be built automatically. Due to the noise in historical documents, selecting the right features for matching words is crucial. We analyzed a range of features suitable for matching words using dynamic time warping (DTW), which aligns and compares sets of features extracted from two images. Each feature's individual performance was measured on a test set. With an average precision of 72%, a combination of features outperforms competing techniques in speed and precision.
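As an illustration of the DTW matching step mentioned in the abstract, the following is a minimal textbook sketch, not the authors' implementation. It assumes each word image has already been reduced to a sequence of per-column feature vectors, and it uses squared Euclidean distance as the local cost. The function name `dtw_distance`, the local cost, and the absence of any warping-path constraint or length normalization are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences.

    a: array of shape (n, d), b: array of shape (m, d), where each row is
    the feature vector of one image column (hypothetical representation).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Local cost: squared Euclidean distance between every pair of columns
    # (an assumed choice of local distance).
    local = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

    # acc[i, j] holds the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
                acc[i - 1, j - 1],  # advance in both
            )
    return acc[n, m]
```

Under these assumptions, two renderings of the same word that differ only in horizontal stretching receive a low cost, because a repeated column in one sequence can be aligned with a single column in the other.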
Keywords :
document image processing; handwritten character recognition; history; image matching; DTW; automatic handwriting recognizer; dynamic time warping; handwritten manuscript; historical manuscripts; traditional-digital library transition; word matching; word spotting technique; Character recognition; Costs; Degradation; Handwriting recognition; Image analysis; Image segmentation; Indexing; Information retrieval; Optical character recognition software; Software libraries;
Conference_Titel :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
DOI :
10.1109/ICDAR.2003.1227662