DocumentCode
2417375
Title
Business form classification using strings
Author
Ting, Antoine ; Leung, Maylor K. H.
Author_Institution
Sch. of Appl. Sci., Nanyang Technol. Inst., Singapore
Volume
2
fYear
1996
fDate
25-29 Aug 1996
Firstpage
690
Abstract
Business forms are “linear” documents which can be accurately described by a one-dimensional data structure. This paper proposes a novel approach for form identification using strings. This application can be used as a basis for extension to other “linear” documents such as logos or line drawings. A set of known blank forms is stored in a database and incoming forms are automatically matched to one of these. In addition, forms which are not in the database can also be detected. A novel and simple method is used for matching by considering a distinctive “signature” for each document. This takes the shape of a string which describes the elements present on the form. Included are the location and size of lines, corners and blocks of text, quantised as discrete symbols. A specially adapted and efficient string edit distance calculation is then applied for matching. Unregistered forms can be detected by examining the unmatched elements between two strings. This novel string format makes it possible to extend the conventional one-dimensional representation possibilities of strings to a richer “one-and-a-half dimensional” structure and requires no training.
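The matching pipeline outlined in the abstract (quantise form elements into symbols, build a signature string, compare by edit distance, reject forms with too many unmatched elements) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not specify the "specially adapted" distance, so a plain Levenshtein distance is substituted, and the `Element` fields, the four-level quantisation, the top-to-bottom ordering, and the `max_ratio` rejection threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One form element (hypothetical layout; coordinates normalised to 0..1)."""
    kind: str    # "line", "corner", or "text"
    x: float
    y: float
    size: float


def quantise(e, levels=4):
    """Map an element to a discrete symbol: (kind initial, x bin, y bin, size bin)."""
    def to_bin(v):
        return min(int(v * levels), levels - 1)
    return (e.kind[0], to_bin(e.x), to_bin(e.y), to_bin(e.size))


def signature(elements, levels=4):
    """Build the signature as a sequence of symbols.

    Ordering top-to-bottom, then left-to-right, is an assumption.
    """
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return [quantise(e, levels) for e in ordered]


def edit_distance(a, b):
    """Standard Levenshtein distance (a stand-in for the paper's adapted variant)."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, 1):
        curr = [i]
        for j, sb in enumerate(b, 1):
            cost = 0 if sa == sb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def classify(sig, database, max_ratio=0.3):
    """Return (best form name, distance); name is None if the form looks unregistered.

    max_ratio is a hypothetical rejection threshold on distance / longest length.
    """
    best_name, best_d = None, None
    for name, ref in database.items():
        d = edit_distance(sig, ref)
        if best_d is None or d < best_d:
            best_name, best_d = name, d
    if best_name is None:
        return None, None
    longest = max(len(sig), len(database[best_name]), 1)
    if best_d / longest > max_ratio:
        return None, best_d
    return best_name, best_d


# Two registered blank forms (made-up element lists).
invoice = [Element("line", 0.1, 0.1, 0.8),
           Element("text", 0.1, 0.2, 0.3),
           Element("corner", 0.9, 0.9, 0.1)]
receipt = [Element("text", 0.5, 0.1, 0.5),
           Element("line", 0.1, 0.5, 0.9),
           Element("line", 0.1, 0.8, 0.9),
           Element("text", 0.6, 0.9, 0.2)]
database = {"invoice": signature(invoice), "receipt": signature(receipt)}

# An incoming scan of the invoice with small positional jitter.
scanned = signature([Element("line", 0.12, 0.11, 0.8),
                     Element("text", 0.11, 0.21, 0.3),
                     Element("corner", 0.9, 0.9, 0.1)])

# A form that is not in the database.
unknown = signature([Element("corner", 0.5, 0.5, 0.5),
                     Element("corner", 0.6, 0.6, 0.5)])

matched = classify(scanned, database)
rejected = classify(unknown, database)
```

Because positions and sizes are quantised before comparison, small scanning jitter leaves the symbols unchanged, while a form with a different layout accumulates unmatched symbols and falls outside the threshold. No training step is involved, in line with the abstract's claim.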
Keywords
document image processing; image classification; image segmentation; visual databases; business form classification; discrete symbols; form identification; line drawings; linear documents; logos; one-dimensional data structure; string edit distance calculation; strings; Application software; Data structures; Dynamic programming; Neural networks; Pattern matching; Pattern recognition; Shape; Software systems; Spatial databases; Trademarks;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 1996., Proceedings of the 13th International Conference on
Conference_Location
Vienna
ISSN
1051-4651
Print_ISBN
0-8186-7282-X
Type
conf
DOI
10.1109/ICPR.1996.546911
Filename
546911