DocumentCode :
2068984
Title :
An Automatic Method to Extract Data from an Electronic Contract Composed of a Number of Documents in PDF Format
Author :
Kwok, Thomas ; Nguyen, Thao
Author_Institution :
IBM Res. Div., Thomas J. Watson Res. Center, Hawthorne, NY
fYear :
2006
fDate :
26-29 June 2006
Firstpage :
33
Lastpage :
33
Abstract :
An electronic contract can encompass a large number of collateral contract documents in PDF format. These contract documents are of different contract document types and are converted from different original formats. Data extraction, and thus data mining, for this kind of electronic contract is very difficult. In this paper, we present a novel method to automatically extract contract data from this kind of electronic contract. Our automatic electronic contract data extraction system comprises an administrator module, a PDF parser, a pattern recognition engine and a contract data extraction engine. The administrator module provides templates for inputting document patterns and a list of contract data tags for each contract document type. It also constructs the pattern matrices and stores them in a database. The PDF parser converts the contract PDF document into a contract text document with the insertion of formatting bookmarks, such as a new page, paragraph or line. The pattern recognition engine determines a list of contract document types in the electronic contract by comparing and matching the patterns of all known contract document types with the pattern of the contract text document. The contract data extraction engine retrieves the corresponding list of contract data tags and then extracts contract data accordingly for each contract document type on the list. Our automatic electronic contract data extraction system has been found to be very accurate, efficient and useful in extracting contract data for data mining.
Keywords :
contracts; data mining; document handling; electronic commerce; pattern recognition; PDF format; PDF parser; administrator module; contract data extraction engine; contract data tag; contract document type; data mining; electronic contract; pattern matching; pattern recognition engine; Business; Companies; Contracts; Data mining; Databases; Information retrieval; Matrix converters; Pattern matching; Pattern recognition; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
E-Commerce Technology, 2006. The 8th IEEE International Conference on and Enterprise Computing, E-Commerce, and E-Services, The 3rd IEEE International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7695-2511-3
Type :
conf
DOI :
10.1109/CEC-EEE.2006.13
Filename :
1640288