DocumentCode :
2085702
Title :
Searching Off-line Arabic Documents
Author :
Chan, Jim ; Ziftci, Celal ; Forsyth, David
Author_Institution :
University of Illinois, Urbana
Volume :
2
fYear :
2006
fDate :
2006
Firstpage :
1455
Lastpage :
1462
Abstract :
Currently, an abundance of historical manuscripts, journals, and scientific notes remain largely inaccessible in library archives. Manual transcription and publication of such documents is unlikely, and automatic transcription with high enough accuracy to support a traditional text search is difficult. In this work we describe a lexicon-free system for performing text queries on off-line printed and handwritten Arabic documents. Our segmentation-based approach utilizes gHMMs with a bigram letter transition model, and KPCA/LDA for letter discrimination. The segmentation stage is integrated with inference. We show that our method is robust to varying letter forms, ligatures, and overlaps. Additionally, we find that ignoring letters beyond the adjoining neighbors has little effect on inference and localization, which leads to a significant performance increase over standard dynamic programming. Finally, we discuss an extension to perform batch searches of large word lists for indexing purposes.
Keywords :
Computer vision; Dynamic programming; Handwriting recognition; Indexing; Learning systems; Libraries; Linear discriminant analysis; Robustness; Training data; Writing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
ISSN :
1063-6919
Print_ISBN :
0-7695-2597-0
Type :
conf
DOI :
10.1109/CVPR.2006.269
Filename :
1640928