Title :
Searching Off-line Arabic Documents
Author :
Chan, Jim ; Ziftci, Celal ; Forsyth, David
Author_Institution :
University of Illinois, Urbana
Abstract :
Currently, an abundance of historical manuscripts, journals, and scientific notes remain largely inaccessible in library archives. Manual transcription and publication of such documents is unlikely, and automatic transcription accurate enough to support a traditional text search is difficult. In this work we describe a lexicon-free system for performing text queries on off-line printed and handwritten Arabic documents. Our segmentation-based approach uses gHMMs with a bigram letter transition model, and KPCA/LDA for letter discrimination. The segmentation stage is integrated with inference. We show that our method is robust to varying letter forms, ligatures, and overlaps. Additionally, we find that ignoring letters beyond the adjoining neighbors has little effect on inference and localization, which leads to a significant performance increase over standard dynamic programming. Finally, we discuss an extension to perform batch searches of large word lists for indexing purposes.
Keywords :
Computer vision; Dynamic programming; Handwriting recognition; Indexing; Learning systems; Libraries; Linear discriminant analysis; Robustness; Training data; Writing;
Conference_Title :
Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Print_ISBN :
0-7695-2597-0
DOI :
10.1109/CVPR.2006.269