Title :
Business form classification using strings
Author :
Ting, Antoine ; Leung, Maylor K. H.
Author_Institution :
Sch. of Appl. Sci., Nanyang Technol. Inst., Singapore
Abstract :
Business forms are “linear” documents that can be described accurately by a one-dimensional data structure. This paper proposes a novel approach to form identification using strings, which can serve as a basis for extension to other “linear” documents such as logos or line drawings. A set of known blank forms is stored in a database, and each incoming form is automatically matched to one of them; forms that are not in the database can also be detected. Matching relies on a simple, distinctive “signature” for each document: a string describing the elements present on the form, including the location and size of lines, corners and blocks of text, quantised as discrete symbols. A specially adapted, efficient string edit distance calculation is then applied for matching, and unregistered forms are detected by examining the elements left unmatched between two strings. This string format extends the conventional one-dimensional representation of strings to a richer “one-and-a-half dimensional” structure and requires no training.
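The pipeline in the abstract (quantise form elements into symbols, order them into a string, match by edit distance, reject forms with too many unmatched elements) can be illustrated with a small sketch. It is not the authors' “specially adapted” edit distance, whose details are not given here; it is a plain weighted Levenshtein alignment over symbol tuples. Everything below is a hypothetical assumption for illustration: the `Element` type, the element kinds (`"H"`, `"V"`, `"C"`, `"T"`), the 16-level quantisation, reading-order sorting, the substitution cost of 0.25 per quantisation step, and the `max_unmatched_ratio` rejection threshold. Each symbol keeps 2-D attributes (position, size) inside a 1-D sequence, which is one way to read the “one-and-a-half dimensional” idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical symbol: (kind, quantised x, quantised y, quantised size).
# Assumed kinds: 'H'/'V' horizontal/vertical line, 'C' corner, 'T' text block.
Symbol = Tuple[str, int, int, int]

INDEL = 1.0  # assumed cost of leaving one element unmatched


@dataclass
class Element:
    kind: str
    x: float     # left position in pixels
    y: float     # top position in pixels
    size: float  # line length or text-block width in pixels


def quantise(elements: Sequence[Element], page_w: float, page_h: float,
             levels: int = 16) -> List[Symbol]:
    """Map raw elements to discrete symbols, then sort into reading order
    (top-to-bottom, left-to-right) so the page becomes a 1-D string."""
    longest = max(page_w, page_h)  # assumption: size normalised by longer side
    symbols = []
    for e in elements:
        qx = min(int(e.x / page_w * levels), levels - 1)
        qy = min(int(e.y / page_h * levels), levels - 1)
        qs = min(int(e.size / longest * levels), levels - 1)
        symbols.append((e.kind, qx, qy, qs))
    symbols.sort(key=lambda s: (s[2], s[1]))
    return symbols


def sub_cost(a: Symbol, b: Symbol) -> float:
    """Different kinds never match; same kinds pay for quantised displacement.
    Capped at 2 * INDEL, i.e. no worse than deleting one and inserting the other."""
    if a[0] != b[0]:
        return 2 * INDEL
    d = abs(a[1] - b[1]) + abs(a[2] - b[2]) + abs(a[3] - b[3])
    return min(0.25 * d, 2 * INDEL)


def align(a: Sequence[Symbol], b: Sequence[Symbol]) -> Tuple[float, int]:
    """Weighted edit distance plus the number of unmatched elements."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * INDEL
    for j in range(1, m + 1):
        D[0][j] = j * INDEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + INDEL,
                          D[i][j - 1] + INDEL,
                          D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    # Backtrack: a diagonal step only counts as a match if it is cheaper
    # than an insert/delete pair; every other step leaves an element unmatched.
    i, j, unmatched = n, m, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            c = sub_cost(a[i - 1], b[j - 1])
            if c < 2 * INDEL and D[i][j] == D[i - 1][j - 1] + c:
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + INDEL:
            i -= 1
        else:
            j -= 1
        unmatched += 1
    return D[n][m], unmatched


def classify(query: Sequence[Symbol], database: Dict[str, Sequence[Symbol]],
             max_unmatched_ratio: float = 0.3) -> Optional[str]:
    """Return the closest registered form, or None if the query looks unregistered."""
    best_name, best_cost, best_unmatched = None, float("inf"), 0
    for name, ref in database.items():
        cost, unmatched = align(query, ref)
        if cost < best_cost:
            best_name, best_cost, best_unmatched = name, cost, unmatched
    if best_name is None:
        return None
    total = len(query) + len(database[best_name])
    if total and best_unmatched / total > max_unmatched_ratio:
        return None  # too many unmatched elements: treat as a new form
    return best_name
```

Because every symbol is compared only with symbols of the same kind and nearby quantised position, a slightly shifted text block costs a fraction of an insertion, while a form with a genuinely different layout accumulates unmatched elements and is rejected rather than forced onto the nearest entry.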
Keywords :
document image processing; image classification; image segmentation; visual databases; business form classification; discrete symbols; form identification; line drawings; linear documents; logos; one-dimensional data structure; string edit distance calculation; strings; Application software; Data structures; Dynamic programming; Neural networks; Pattern matching; Pattern recognition; Shape; Software systems; Spatial databases; Trademarks
Conference_Title :
Proceedings of the 13th International Conference on Pattern Recognition, 1996
Conference_Location :
Vienna
Print_ISBN :
0-8186-7282-X
DOI :
10.1109/ICPR.1996.546911