Title :
OCR-independent and segmentation-free word-spotting in handwritten Arabic Archive documents
Author :
Aouadi, Nabil ; Kacem, Adel
Author_Institution :
LaTICE, Res. Lab. of Technol. of Inf. & Commun. & Electr. Eng., Tunis, Tunisia
Abstract :
In this paper, a word-spotting approach is presented that can help in reading handwritten Arabic archive documents. Because of the low quality of these documents, the proposed approach is segmentation-free and independent of OCR, and relies on a global transformation of word images. It is a learning-based approach that employs the Generalized Hough Transform (GHT). It detects words, described by their models, in document images by finding the model's position in the image. With the GHT, the problem of finding the model's position is transformed into the problem of finding the transformation parameters that map the model into the image. Parameters such as the Hough threshold and the distance between voting points are tuned to improve the location and recognition of words. We tested our system on registers from the 19th century onwards, held in the National Archives of Tunisia. In our first experiments, an average of 94% of words were correctly spotted.
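The voting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, translation-only variant of the GHT that skips the usual edge-orientation indexing of the R-table, and it works on small binary images given as lists of 0/1 rows. The function names `build_model` and `spot`, the `threshold` parameter (fraction of model points that must agree), and the `step` parameter (an assumed reading of "distance between voting points" as a sampling stride) are all hypothetical.

```python
from collections import Counter


def build_model(template, step=1):
    """Return displacement vectors from sampled foreground pixels to the reference point.

    `template` is a list of rows of 0/1 values. `step` keeps every `step`-th
    foreground pixel (assumption: stands in for the distance between voting points).
    The reference point is taken to be the template centre (also an assumption).
    """
    points = [(y, x) for y, row in enumerate(template)
              for x, v in enumerate(row) if v][::step]
    ref_y, ref_x = len(template) // 2, len(template[0]) // 2
    return [(ref_y - y, ref_x - x) for y, x in points]


def spot(image, displacements, threshold=0.8):
    """Vote for reference-point positions and accept the peak if it is strong enough.

    Every foreground pixel of `image` casts one vote per displacement vector.
    The peak is accepted when its vote count, divided by the number of model
    points, reaches `threshold` (assumption: a stand-in for the Hough threshold).
    """
    height, width = len(image), len(image[0])
    votes = Counter()
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if not v:
                continue
            for dy, dx in displacements:
                cy, cx = y + dy, x + dx
                if 0 <= cy < height and 0 <= cx < width:
                    votes[(cy, cx)] += 1
    if not votes or not displacements:
        return None, 0.0
    peak, count = votes.most_common(1)[0]
    score = count / len(displacements)
    return (peak if score >= threshold else None), score
```

In this sketch the search for a word's position becomes a search for the translation that collects the most votes in the accumulator; lowering `threshold` tolerates degraded strokes at the cost of more false detections, and raising `step` reduces the number of votes cast per pixel.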
Keywords :
Hough transforms; document image processing; handwritten character recognition; optical character recognition; GHT technique; Hough threshold parameter; OCR; distance parameter; generalized Hough transform; handwritten Arabic archive documents; segmentation-free word-spotting approach; word image transformation; word location; word recognition; Dictionaries; Image segmentation; Optical character recognition software; Registers; Shape; Training; Transforms; Clustering; Generalized Hough Transform; Handwritten Recognition; Historical document; Word-spotting;
Conference_Titel :
Electrical Engineering and Software Applications (ICEESA), 2013 International Conference on
Conference_Location :
Hammamet
Print_ISBN :
978-1-4673-6302-0
DOI :
10.1109/ICEESA.2013.6578363