Title of article :
Arabic word descriptor for handwritten word indexing and lexicon reduction
Author/Authors :
Chherawala, Youssouf and Cheriet, Mohamed
Issue Information :
Journal with serial number, year 2014
Pages :
10
From page :
3477
To page :
3486
Abstract :
Word recognition systems use a lexicon to guide the recognition process in order to improve the recognition rate. However, as the lexicon grows, the computation time increases. In this paper, we present the Arabic word descriptor (AWD) for Arabic word shape indexing and lexicon reduction in handwritten documents. It is formed in two stages. First, the structural descriptor (SD) is computed for each connected component (CC) of the word image. It describes the CC shape using the bag-of-words model, where each visual word represents a different local shape structure, extracted from the image with filters of different patterns and scales. Then, the AWD is formed by sorting and normalizing the SDs. This emphasizes the symbolic features of Arabic words, such as subwords and diacritics, without performing layout segmentation. In the context of lexicon reduction, the AWD is used to index a reference database. Given a query image, the reduced lexicon is obtained from the labels of the first entries in the indexed database. This framework has been tested on Arabic word databases. It has a low computational overhead, while providing a compact descriptor, with state-of-the-art results for lexicon reduction on the Ibn Sina and IFN/ENIT databases.
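The retrieval step described in the abstract (ranking an indexed reference database against a query and keeping the labels of the first entries) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reduce_lexicon`, the Euclidean distance, the cutoff `k`, and the toy descriptors are all assumptions made for illustration, and the computation of the AWD itself is not shown.

```python
import math


def reduce_lexicon(query, database, k):
    """Return the distinct labels of the k database entries nearest to the query.

    database is a list of (descriptor, label) pairs whose descriptors are
    equal-length lists of floats. Euclidean distance is an assumption here;
    the abstract does not state which metric is used.
    """
    ranked = sorted(database, key=lambda entry: math.dist(query, entry[0]))
    reduced = []
    for _, label in ranked[:k]:
        if label not in reduced:
            reduced.append(label)
    return reduced


# Toy reference database with made-up 3-dimensional descriptors.
reference = [
    ([0.9, 0.1, 0.0], "word_a"),
    ([0.8, 0.2, 0.0], "word_a"),
    ([0.1, 0.9, 0.0], "word_b"),
    ([0.0, 0.1, 0.9], "word_c"),
]

print(reduce_lexicon([0.85, 0.15, 0.0], reference, k=3))
```

A larger `k` keeps more candidate labels in the reduced lexicon, so the correct word is more likely to survive, but the recognizer that follows has more entries to consider.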
Keywords :
Lexicon reduction , IFN/ENIT , Ibn Sina database , Arabic handwritten documents , Holistic representation , Shape indexing , Arabic word descriptor
Journal title :
PATTERN RECOGNITION
Serial Year :
2014
Record number :
1736614