Title :
A tool for classifying office documents
Author :
Hao, Xiaolong ; Wang, Jason T. L. ; Bieber, Michael P. ; Ng, Peter A.
Author_Institution :
Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
Abstract :
The authors present the design of a tool for classifying office documents. They represent a document's layout structure using an ordered labeled tree, called the layout structure tree (L-S-tree), based on a nested segmentation procedure. The tool uses a sample-based approach for learning, where concepts are learned by retaining samples and new documents are classified by matching their L-S-trees with samples. The matching process involves both computing the edit distance between two trees using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and samples. The experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.
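The sample-based matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' pattern matching toolkit: it assumes unit costs for inserting, deleting, and relabeling a node, uses the textbook forest recursion for ordered labeled trees, and leaves out the conceptual-closeness score, which the abstract does not define. The names `tree_distance` and `classify`, the node labels, and the sample trees are hypothetical.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Nested tuples are hashable, so results can be cached.

def forest_size(forest):
    """Count the nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)          # insert every node of g
    if not g:
        return forest_size(f)          # delete every node of f
    v_label, v_children = f[-1]        # rightmost root of f
    w_label, w_children = g[-1]        # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map v to w: match their subtrees and the remaining forests separately.
    relabel = (forest_distance(v_children, w_children)
               + forest_distance(f[:-1], g[:-1])
               + int(v_label != w_label))
    return min(delete, insert, relabel)

def tree_distance(a, b):
    """Edit distance between two ordered labeled trees."""
    return forest_distance((a,), (b,))

def classify(tree, samples):
    """Return the class name of the stored sample closest to `tree`."""
    name, _ = min(samples, key=lambda s: tree_distance(tree, s[1]))
    return name

# Hypothetical layout trees standing in for L-S-trees.
letter = ("page", (("header", ()),
                   ("body", (("para", ()), ("para", ()))),
                   ("signature", ())))
memo = ("page", (("header", ()),
                 ("body", (("para", ()),))))
samples = [("letter", letter), ("memo", memo)]

query = ("page", (("header", ()),
                  ("body", (("para", ()), ("para", ()), ("para", ()))),
                  ("signature", ())))

print(classify(query, samples))  # letter
```

Here the query differs from the stored letter by one extra paragraph node (distance 1) and from the memo by three nodes (distance 3), so even a one-sample-per-class base separates the two.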
Keywords :
deductive databases; document handling; learning (artificial intelligence); office automation; pattern classification; pattern matching; tree data structures; L-S-tree; conceptual closeness; edit distance; layout structure tree; learning; nested segmentation procedure; office document classification; ordered labeled tree; pattern matching toolkit; sample-based approach; Classification tree analysis; Facsimile; Image converters; Information science; Pattern matching; Surges; Testing; Text recognition; Tree data structures
Conference_Title :
Tools with Artificial Intelligence, 1993. TAI '93. Proceedings., Fifth International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
0-8186-4200-9
DOI :
10.1109/TAI.1993.633991