DocumentCode :
3136279
Title :
Layout Analysis for Arabic Historical Document Images Using Machine Learning
Author :
Bukhari, Syed Saqib ; Breuel, Thomas M. ; Asi, Abedelkadir ; El-Sana, Jihad
Author_Institution :
Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany
fYear :
2012
fDate :
18-20 Sept. 2012
Firstpage :
639
Lastpage :
644
Abstract :
Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are subsequently generated. A multilayer perceptron classifier is used to classify connected components into the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-note layout formats, achieving a segmentation accuracy of about 95%.
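The abstract does not specify how the voting scheme works. As a rough illustration only, the sketch below assumes one plausible variant: each connected component is relabelled by majority vote over its own label and those of its k nearest components, measured between centroids. The function name `refine_labels`, the parameter `k`, the centroid representation, and the tie rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import hypot


def refine_labels(centroids, labels, k=3):
    """Relabel each component by majority vote over itself and its k nearest
    neighbours (assumed scheme, not the authors' published method)."""
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by Euclidean distance to component i.
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        # Votes come from the initial labels, so the result does not depend
        # on the order in which components are visited.
        votes = Counter([labels[i]] + [labels[j] for j in others[:k]])
        best_label, best_count = votes.most_common(1)[0]
        # On a tie, keep the classifier's original label (assumed rule).
        refined.append(labels[i] if votes[labels[i]] == best_count else best_label)
    return refined


# Two spatial clusters, each containing one component that the initial
# classifier is assumed to have mislabelled.
centroids = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
labels = ["main", "main", "side", "main", "side", "side", "side", "main"]
print(refine_labels(centroids, labels, k=3))
```

In this toy input the isolated "side" label inside the first cluster and the isolated "main" label inside the second are both outvoted by their neighbours, which is the kind of local smoothing a refinement step of this sort would aim for.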
Keywords :
document image processing; feature extraction; image classification; image segmentation; learning (artificial intelligence); natural languages; text analysis; Arabic historical document images; block segmentation; complex layout format; complex side-notes layout formats; connected components classification; connected-component level; discriminative feature extraction; document image understanding system; machine learning; manuscripts; multilayer perceptron classifier; page layout analysis; page margins; pixel level analysis; robust feature vectors generation; state-of-the-art segmentation approach; text class; text segments; voting scheme; Accuracy; Context; Feature extraction; Image segmentation; Layout; Shape; Training; historical manuscripts; layout analysis; machine learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
Type :
conf
DOI :
10.1109/ICFHR.2012.227
Filename :
6424468