DocumentCode
1632782
Title
Enhanced Text Extraction from Arabic Degraded Document Images Using EM Algorithm
Author
Boussellaa, Wafa ; Bougacha, Aymen ; Zahour, Abderrazak ; El Abed, Haikal ; Alimi, Adel
Author_Institution
ENIS, Univ. of Sfax, Sfax, Tunisia
fYear
2009
Firstpage
743
Lastpage
747
Abstract
This paper presents an enhanced algorithm for extracting text from degraded document images, based on probabilistic models. The observed document image is modeled as a mixture of Gaussian densities that represent its foreground and background components. The EM algorithm is used to iteratively estimate and refine the parameters of the mixture. The initial parameters for the EM algorithm are obtained with the k-means clustering method. After parameter estimation, the document image is partitioned into text and background classes by means of a maximum likelihood (ML) approach. The performance of the proposed approach is evaluated on a variety of degraded documents from the collections of the National Library of Tunisia.
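The pipeline summarized in the abstract (k-means initialization, EM refinement of a Gaussian mixture, ML labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the only feature is the grayscale pixel intensity, that two mixture components suffice, and that text is darker than the background. The function names (`kmeans_1d`, `fit_gmm_em`, `extract_text`) and all default parameters are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_1d(x, k=2, iters=20):
    """Plain k-means on scalar intensities; returns centers and labels."""
    centers = np.linspace(x.min(), x.max(), k)
    labels = np.zeros(x.size, dtype=int)
    for _ in range(iters):
        # Assign each pixel to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return centers, labels

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density (broadcasts over its arguments)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_em(x, k=2, iters=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM, initialized by k-means."""
    x = np.asarray(x, dtype=float).ravel()
    means, labels = kmeans_1d(x, k)
    weights = np.array([np.mean(labels == j) for j in range(k)])
    variances = np.array([
        x[labels == j].var() if np.any(labels == j) else x.var()
        for j in range(k)
    ])
    variances = np.maximum(variances, min_var)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel.
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: re-estimate weights, means, and variances.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, min_var)
    return weights, means, variances

def extract_text(image):
    """Return a boolean mask that is True where a pixel is labeled as text."""
    x = np.asarray(image, dtype=float).ravel()
    _, means, variances = fit_gmm_em(x)
    # ML decision: pick the class with the highest class-conditional
    # likelihood p(x | class); the mixture weights are not used here.
    likelihood = gaussian_pdf(x[:, None], means, variances)
    labels = np.argmax(likelihood, axis=1)
    # Assumption: the component with the lower mean intensity is the text.
    text_class = np.argmin(means)
    return (labels == text_class).reshape(np.shape(image))
```

Calling `extract_text(image)` on a 2-D grayscale array returns a mask of the same shape. Multiplying the densities by the estimated mixture weights before the `argmax` would turn the ML decision into a MAP decision, which is a different rule from the one named in the abstract.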
Keywords
Gaussian processes; document image processing; expectation-maximisation algorithm; image representation; image segmentation; maximum likelihood detection; natural language processing; parameter estimation; probability; text analysis; Arabic degraded document image; EM algorithm; Gaussian mixture; ML algorithm; National library of Tunisia; enhanced text extraction algorithm; image representation; k-means clustering method; parameter estimation; probabilistic model; Algorithm design and analysis; Clustering algorithms; Clustering methods; Degradation; Image enhancement; Image segmentation; Maximum likelihood estimation; Parameter estimation; Partitioning algorithms; Text analysis; Arabic degraded document image; Maximum likelihood algorithm(ML); expectation-maximisation algorithm (EM); k-means clustering; segmentation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location
Barcelona
ISSN
1520-5363
Print_ISBN
978-1-4244-4500-4
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2009.220
Filename
5277497