Extracting statistical data from free-form text

Author

Hill, L. Owen ; Zein, David A.

Author_Institution

IBM Corp., East Fishkill, NY, USA

Volume

2

Issue

3

fYear

1986

fDate

5/1/1986 12:00:00 AM

Firstpage

18

Lastpage

24

Abstract

The authors describe a method for processing free-form text files. The method separates each file into four physically and logically identifiable regions. The four regions are postprocessed to update three history files that contain information about manufactured products over a period of time. The technique used in processing such files falls under the general category of data segregation and character recognition. It uses logical and mathematical operations to recognize region boundaries and types of data fields and to establish uniqueness in name recognition. Hashing methods, combined with logical matrix multiplication, are used to update the history files. Sparse formats are used to store multiple large arrays on disk, reducing storage requirements by more than a factor of two. The techniques are implemented in an automated system using multiprogramming environments.
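
The abstract names three building blocks (hashing for name lookup, logical matrix multiplication, and sparse storage of large arrays) without giving details. The following Python sketch illustrates each idea in isolation, under stated assumptions: the function names, the `(name, period, count)` record layout, and the dictionary-based sparse format are all hypothetical and are not taken from the paper, which does not say here how the pieces are combined.

```python
# Hypothetical illustration only: names and data layout are assumptions,
# not the authors' implementation.

def to_sparse(dense):
    """Keep only nonzero cells, as a {(row, col): value} mapping."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0}

def to_dense(sparse, n_rows, n_cols):
    """Rebuild the full array from the sparse mapping."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for (i, j), v in sparse.items():
        dense[i][j] = v
    return dense

def bool_matmul(a, b):
    """Logical matrix product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    inner, n_cols = len(b), len(b[0])
    return [[any(row[k] and b[k][j] for k in range(inner))
             for j in range(n_cols)]
            for row in a]

def update_history(history, index, records):
    """Accumulate counts into a sparse history array.

    `index` is a hash table (dict) mapping a product name to its row;
    a name seen for the first time is assigned the next free row.
    `records` is an iterable of (name, period, count) tuples.
    """
    for name, period, count in records:
        row = index.setdefault(name, len(index))
        key = (row, period)
        history[key] = history.get(key, 0) + count
    return history
```

A sparse mapping like this pays off only when most cells are empty, since each stored cell also carries its coordinates; the storage saving reported in the abstract depends on the authors' own format and data.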

Keywords

data handling; manufacturing data processing; statistics; word processing; character recognition; data extraction; data fields; data segregation; free-form text; hashing; history files; logical operations; manufactured products; mathematical operations; matrix multiplication; multiple large arrays; multiprogramming environments; name recognition; region boundaries; sparse formats; statistical data; storage requirements; uniqueness; Arrays; Data mining; Graphics; History; Logic arrays; Matrix converters; Vectors

fLanguage

English

Journal_Title

Circuits and Devices Magazine, IEEE

Publisher

ieee

ISSN

8755-3996

Type

jour

DOI

10.1109/MCD.1986.6311822

Filename

6311822