DocumentCode
2142900
Title
Novel Data Representation for Text Extraction from Multispectral Historical Document Images
Author
Hedjam, Rachid ; Cheriet, Mohamed
Author_Institution
Synchromedia Lab. for Multimedia Commun. in Telepresence, Montreal, QC, Canada
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
172
Lastpage
176
Abstract
The extraction and analysis of useful information from old document images is very important for cultural heritage preservation. In advanced research, where the goal is to separate the foreground (in general, text) from the background, image restoration and pattern classification techniques are used. Most of these methods classify pixels based on their gray-scale values. In this paper, we propose to perform foreground pattern extraction using regions-of-interest (ROI) analysis and a maximum likelihood classifier designed for multispectral document images. As a contribution, we propose a new feature vector that improves discrimination between patterns and is embedded in a simple statistical classification method. The results, which are promising, are compared with the state of the art.
Keywords
document image processing; feature extraction; image classification; maximum likelihood estimation; statistical analysis; cultural heritage preservation; data representation; feature vector; foreground pattern extraction; maximum likelihood classifier; multispectral historical document image; regions-of-interest analysis; statistical classification method; text extraction; Cultural differences; Data mining; Feature extraction; Imaging; Materials; Vectors; Degraded document images; Document image binarization; Feature vector; Historical document images; Multispectral imaging; Pattern persistence; Tensor-based energy;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.43
Filename
6065298