Title of article :
KHATT: An open Arabic offline handwritten text database
Author/Authors :
Mahmoud, Sabri A.; Ahmad, Irfan; Al-Khatib, Wasfi G.; Alshayeb, Mohammad; Tanvir Parvez, Mohammad; Märgner, Volker; Fink, Gernot A.
Issue Information :
Journal with serial issue number, year 2014
Pages :
17
From page :
1096
To page :
1112
Abstract :
A comprehensive Arabic handwritten text database is an essential resource for research on Arabic handwritten text recognition, all the more so because no such database has been available. In this paper, we report our comprehensive Arabic offline handwritten text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and to segment paragraphs into lines were developed. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers worldwide for research on various handwriting-related problems such as text recognition, writer identification and verification, forms analysis, pre-processing, and segmentation. Several international research groups and researchers have acquired the database for use in their research so far.
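The random 70% / 15% / 15% division of forms described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `split_forms`, the integer form identifiers 1 to 1000, and the fixed seed are all assumptions made for illustration. Because the abstract states that each of the 1000 forms comes from a distinct writer, splitting at the form level also keeps writers separate across the three sets.

```python
import random


def split_forms(form_ids, train=0.70, test=0.15, seed=0):
    """Randomly partition form IDs into training, testing, and verification sets.

    Hypothetical helper: the names, the seed, and the ID scheme are assumptions,
    not part of the KHATT distribution.
    """
    ids = list(form_ids)
    # A seeded generator makes the shuffle reproducible between runs.
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train)
    n_test = round(len(ids) * test)
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        # Whatever remains (about 15%) forms the verification set.
        "verification": ids[n_train + n_test:],
    }


# Assumed ID scheme: forms numbered 1..1000, one per writer.
splits = split_forms(range(1, 1001))
sizes = {name: len(members) for name, members in splits.items()}
```

With 1000 forms this yields sets of 700, 150, and 150 forms that do not overlap.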
Keywords :
Arabic handwritten text database , document analysis , Form processing , Arabic OCR
Journal title :
PATTERN RECOGNITION
Serial Year :
2014
Record number :
1736024