Title :
Visual appearance based document classification methods: Performance evaluation and benchmarking
Author :
Syed Saqib Bukhari;Andreas Dengel
Author_Institution :
German Research Center for Artificial Intelligence (DFKI), Germany
Abstract :
Most traditional document image classification techniques concentrate on document segmentation and OCR analysis, despite the many complexities and limitations involved. Recently, many document image classification problems have been solved simply by adapting standard computer vision approaches for natural image retrieval and classification; these are referred to as visual appearance based document classification techniques. These approaches have reported better results than the traditional approaches on proprietary datasets. However, they have not yet been compared with each other and, despite their potential, they have not been evaluated on distorted camera-captured documents, which is one of the challenging requirements in our present commercial document analysis projects. In this paper, we present simple and effective descriptions of different visual appearance based document image classification techniques. We compare their performance on various standard, publicly available datasets that differ in the degree of image degradation and content variation. We also demonstrate their advantages and limitations. Additionally, we make our implementations of these methods publicly available to the research community for use and further testing in other domains.
Keywords :
"Optical character recognition software","Distortion","Algorithm design and analysis","Image segmentation","Programming","Chlorine","Classification algorithms"
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
DOI :
10.1109/ICDAR.2015.7333908