DocumentCode
1291658
Title
Extracting statistical data from free-form text
Author
Hill, L. Owen ; Zein, David A.
Author_Institution
IBM Corp., East Fishkill, NY, USA
Volume
2
Issue
3
fYear
1986
fDate
5/1/1986
Firstpage
18
Lastpage
24
Abstract
The authors describe a method for processing free-form text files. The method segregates each file into four physically and logically identifiable regions, which are then postprocessed to update three history files that record information about manufactured products over time. The technique falls under the general category of data segregation and character recognition. It uses logical and mathematical operations to recognize region boundaries and data-field types and to establish uniqueness in name recognition. Hashing methods, combined with logical matrix multiplication, are used to update the history files. Sparse formats are used to store multiple large arrays on disk, reducing storage requirements by more than a factor of two. The techniques are implemented in an automated system running in multiprogramming environments.
Keywords
data handling; manufacturing data processing; statistics; word processing; character recognition; data extraction; data fields; data segregation; free-form text; hashing; history files; logical operations; manufactured products; mathematical operations; matrix multiplication; multiple large arrays; multiprogramming environments; name recognition; region boundaries; sparse formats; statistical data; storage requirements; uniqueness; Arrays; Data mining; Graphics; History; Logic arrays; Matrix converters; Vectors;
fLanguage
English
Journal_Title
Circuits and Devices Magazine, IEEE
Publisher
IEEE
ISSN
8755-3996
Type
jour
DOI
10.1109/MCD.1986.6311822
Filename
6311822