DocumentCode :
2142900
Title :
Novel Data Representation for Text Extraction from Multispectral Historical Document Images
Author :
Hedjam, Rachid ; Cheriet, Mohamed
Author_Institution :
Synchromedia Lab. for Multimedia Commun. in Telepresence, Montreal, QC, Canada
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
172
Lastpage :
176
Abstract :
The extraction and analysis of useful information from old document images is very important for cultural heritage preservation. In more advanced approaches, whose goal is to separate the foreground (generally text) from the background, image restoration and pattern classification techniques are used. Most of these methods classify pixels based on their gray-scale values. In this paper, we propose to perform foreground pattern extraction using regions-of-interest (ROI) analysis and a maximum likelihood classifier designed for multispectral document images. Our main contribution is a new feature vector, embedded in a simple statistical classification method, that improves discrimination between patterns. The results, which are promising, are compared with the state of the art.
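The abstract names a maximum likelihood classifier trained through ROI analysis but does not give its exact form. A common form, assumed here, is a Gaussian ML classifier: each class (for example text and background) is modeled by a multivariate normal fitted to the spectral vectors of pixels sampled from its ROIs, and every pixel is assigned to the class with the highest log-likelihood. The sketch below uses raw per-band intensities as the feature vector. It does not reproduce the paper's proposed feature vector, which is not defined in this record. The function names (`fit_classes`, `classify`, etc.), the equal class priors, the ridge term `reg`, and the toy pixel values are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A fitted class model: (mean vector, lower Cholesky factor of the covariance).
ClassModel = Tuple[Vector, Matrix]


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == m (m must be positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def fit_gaussian(samples: List[Vector], reg: float = 1e-6) -> ClassModel:
    """Fit a multivariate normal to ROI pixel vectors (ML estimates)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n  # ML estimate: divide by n
    for a in range(d):
        cov[a][a] += reg  # hypothetical ridge term keeps the matrix positive definite
    return mu, cholesky(cov)


def log_likelihood(x: Vector, model: ClassModel) -> float:
    """Gaussian log-density of pixel vector x under one class model."""
    mu, low = model
    d = len(x)
    # Forward substitution solves low @ y = x - mu, so |y|^2 is the Mahalanobis distance.
    y = [0.0] * d
    for i in range(d):
        y[i] = (x[i] - mu[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    mahalanobis = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (mahalanobis + log_det + d * math.log(2.0 * math.pi))


def fit_classes(rois: Dict[str, List[Vector]]) -> Dict[str, ClassModel]:
    """One Gaussian per class, trained on pixels sampled from that class's ROIs."""
    return {label: fit_gaussian(pixels) for label, pixels in rois.items()}


def classify(x: Vector, models: Dict[str, ClassModel]) -> str:
    """Assign x to the class with maximum likelihood (equal priors assumed)."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


if __name__ == "__main__":
    # Toy 3-band pixels (made-up values): dark ink vs. bright parchment.
    rois = {
        "text": [[30, 40, 35], [35, 42, 30], [28, 38, 40], [33, 45, 33], [25, 41, 37]],
        "background": [[200, 190, 180], [205, 185, 178], [198, 195, 185],
                       [210, 192, 176], [195, 188, 183]],
    }
    models = fit_classes(rois)
    print(classify([32, 41, 36], models), classify([202, 190, 181], models))  # prints: text background
```

Applying `classify` to every pixel vector of a multispectral image yields a label map, from which the text layer can be extracted. A full covariance per class lets correlations between spectral bands contribute to the decision, which is the usual motivation for using multispectral data instead of a single gray-scale value.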
Keywords :
document image processing; feature extraction; image classification; maximum likelihood estimation; statistical analysis; cultural heritage preservation; data representation; feature vector; foreground pattern extraction; maximum likelihood classifier; multispectral historical document image; regions-of-interest analysis; statistical classification method; text extraction; Cultural differences; Data mining; Feature extraction; Imaging; Materials; Vectors; Degraded document images; Document image binarization; Feature vector; Historical document images; Multispectral imaging; Pattern persistence; Tensor-based energy;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.43
Filename :
6065298