Title :
Hierarchical clustering model for pixel-based classification of document images
Author :
Rémi Vieux;Jean-Philippe Domenger
Author_Institution :
Univ. Bordeaux, LaBRI, UMR5800, F-33400 Talence, France
Abstract :
We propose a method for learning to classify pixels in document images, for example to separate text from illustrations or to assign pixels to other predefined classes. We extract texture information using a bank of Gabor filters and learn a hierarchical clustering model that can be used as a K-Nearest Neighbours (KNN) classifier. The model has several advantages over other local document image classification methods that make it well suited to real industrial applications: it does not rely on the accuracy of preprocessing steps such as binarisation or segmentation, it can be trained efficiently from zone-level annotations, and it seamlessly supports multi-class classification. We demonstrate the performance of the method on a public dataset containing complex documents from magazines and technical journals.
Keywords :
"Training","Image segmentation","Accuracy","Vectors","Data models","Optical character recognition software","Graphics"
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Print_ISBN :
978-1-4673-2216-4
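Illustrative sketch :
The pipeline outlined in the abstract (Gabor texture features per pixel, a hierarchical clustering of the training vectors, KNN voting) might be sketched roughly as follows. This is not the authors' implementation: the abstract does not specify the filter parameters or the clustering algorithm, so the complex Gabor kernels, the recursive 2-means tree, and every name and default below (`gabor_features`, `build_tree`, `classify`, `leaf_size`, the wavelengths) are assumptions chosen only for illustration.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel of shape (size, size); size is assumed odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.exp(2j * np.pi * xr / wavelength)
    return kernel - kernel.mean()  # zero mean: flat regions give no response

def gabor_features(image, wavelengths=(4.0, 8.0), n_orient=4, size=15):
    """One response magnitude per filter and pixel, shape (h * w, n_filters)."""
    h, w = image.shape
    half = size // 2
    spectrum = np.fft.fft2(image)
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            kern = gabor_kernel(size, lam, np.pi * k / n_orient, 0.5 * lam)
            # Circular convolution via FFT; image borders wrap around.
            resp = np.fft.ifft2(spectrum * np.fft.fft2(kern, s=(h, w)))
            # Undo the shift introduced by the kernel's top-left origin.
            resp = np.roll(resp, (-half, -half), axis=(0, 1))
            feats.append(np.abs(resp))
    return np.stack(feats, axis=-1).reshape(h * w, -1)

def build_tree(X, y, leaf_size=50, rng=None, depth=0, max_depth=12):
    """Recursive 2-means tree (assumed stand-in for the clustering model)."""
    rng = np.random.default_rng(0) if rng is None else rng
    leaf = {"leaf": True, "X": X, "y": y}
    if len(X) <= leaf_size or depth >= max_depth or len(np.unique(y)) == 1:
        return leaf
    centers = X[rng.choice(len(X), 2, replace=False)].copy()
    for _ in range(10):  # a few Lloyd iterations
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        if assign.min() == assign.max():  # degenerate split, stop here
            return leaf
        for c in range(2):
            centers[c] = X[assign == c].mean(0)
    children = [build_tree(X[assign == c], y[assign == c], leaf_size,
                           rng, depth + 1, max_depth) for c in range(2)]
    return {"leaf": False, "centers": centers, "children": children}

def classify(tree, x, k=5):
    """Descend to the closest leaf, then vote among its k nearest samples."""
    node = tree
    while not node["leaf"]:
        d = ((node["centers"] - x) ** 2).sum(1)
        node = node["children"][int(d.argmin())]
    d = ((node["X"] - x) ** 2).sum(1)
    nearest = node["y"][np.argsort(d)[:k]]
    return int(np.bincount(nearest).argmax())
```

In this sketch, zone-level annotation simply means that every pixel inherits the integer label of the zone it falls in, so `y` can be filled from rectangular zones without any pixel-accurate ground truth. Searching only the leaf reached by the descent makes the KNN vote approximate, in exchange for not scanning the whole training set.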