Title :
A new hybrid binarization method based on Kmeans
Author :
Soua, Mahmoud ; Kachouri, Rostom ; Akil, Mohamed
Author_Institution :
LIGM, ESIEE Paris, Noisy-Le-Grand, France
Abstract :
Document binarization is a fundamental processing step toward Optical Character Recognition (OCR). It aims to separate the foreground text from the document background. In this article, we propose a novel binarization technique that combines local and global approaches using the Kmeans clustering algorithm. The proposed Hybrid Binarization based on Kmeans (HBK) performs robust binarization of scanned documents. Through several experiments, we demonstrate that the HBK method improves binarization quality while minimizing the amount of distortion. Moreover, it outperforms several well-known state-of-the-art methods in the OCR evaluation.
Keywords :
document image processing; learning (artificial intelligence); optical character recognition; pattern clustering; HBK method; Kmeans clustering algorithm; OCR evaluation; binarization quality; binarization technique; distortion amount; document binarization; hybrid binarization method; scanned documents; Character recognition; Clustering algorithms; Distortion measurement; Histograms; Optical character recognition software; Optical distortion; Robustness; Kmeans; OCR; binarization
Conference_Title :
Communications, Control and Signal Processing (ISCCSP), 2014 6th International Symposium on
Conference_Location :
Athens
DOI :
10.1109/ISCCSP.2014.6877830