Title :
Morphological tagging approach in document analysis of invoices
Author :
Belaïd, Y. ; Belaïd, A.
Author_Institution :
LORIA, Nancy II Univ., France
Abstract :
A morphological tagging approach to invoice analysis in document images is described. Tokens that are morphologically similar, and whose positions are confirmed across several similar contexts, reveal parts of speech that represent the structural elements. This bottom-up approach avoids the use of a priori knowledge, provided that the text contains redundant and frequent contexts. The approach is applied to the invoice body text, which is roughly recognized by OCR and automatically segmented. The method makes it possible to detect the invoice articles and their different fields. The regular composition of an article and its repetition within the invoice help greatly in recovering the structure. Tested on 276 invoices and 1704 articles, the recognition rate is over 91.02% for articles and 92.56% for fields.
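The bottom-up idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag set (INT, DEC, ALPHA, CODE, OTHER), the function names, the `min_support` threshold, and the sample invoice lines are all assumptions made for illustration. Each token is tagged by its surface morphology, each line is reduced to a tag pattern, and lines whose pattern recurs are kept as candidate articles.

```python
import re
from collections import Counter

def token_shape(token):
    """Map a token to a coarse morphological class (hypothetical tag set)."""
    if re.fullmatch(r"\d+", token):
        return "INT"      # e.g. a quantity
    if re.fullmatch(r"\d+[.,]\d+", token):
        return "DEC"      # e.g. a price or amount
    if re.fullmatch(r"[A-Za-z]+", token):
        return "ALPHA"    # e.g. a word of the designation
    if re.fullmatch(r"[A-Za-z0-9/-]+", token):
        return "CODE"     # mixed letters and digits, e.g. a reference
    return "OTHER"

def line_pattern(line):
    """Tag every token of a line, then collapse runs of identical tags."""
    collapsed = []
    for tag in (token_shape(t) for t in line.split()):
        if not collapsed or collapsed[-1] != tag:
            collapsed.append(tag)
    return tuple(collapsed)

def find_article_lines(lines, min_support=2):
    """Keep lines whose tag pattern occurs at least min_support times."""
    patterns = [line_pattern(line) for line in lines]
    counts = Counter(patterns)
    return [line for line, p in zip(lines, patterns)
            if p and counts[p] >= min_support]

# Made-up invoice body, standing in for segmented OCR output.
body = [
    "REF DESIGNATION QTY PRICE TOTAL",
    "A1234 steel bolt 10 2,50 25,00",
    "B5678 copper wire roll 3 12,00 36,00",
    "C9012 washer 100 0,05 5,00",
    "Payment due within 30 days",
]

for line in find_article_lines(body):
    print(line_pattern(line), line)
```

In this toy input the three article lines share one pattern while the header and the payment note each occur once, so only the redundant pattern survives. Collapsing runs of identical tags is one simple way to tolerate designations of varying length; a real system would also need to cope with OCR errors, which this sketch ignores.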
Keywords :
document image processing; image segmentation; invoicing; optical character recognition; text analysis; OCR; article composition regularity; document image invoice analysis; frequent context; invoice articles detection; invoice body text; morphological tagging approach; parts of speech; redundant context; structure elements; Image analysis; Morphology; Natural languages; Optical character recognition software; Pattern recognition; Redundancy; Speech; Tagging; Text analysis; Text recognition;
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334166