DocumentCode
1186293
Title
Automatic textual document categorization based on generalized instance sets and a metamodel
Author
Lam, Wai ; Han, Yiqiu
Author_Institution
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
Volume
25
Issue
5
fYear
2003
fDate
5/1/2003 12:00:00 AM
Firstpage
628
Lastpage
633
Abstract
We propose a new approach to text categorization, the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. The GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers the relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
Keywords
classification; document handling; learning by example; automatic textual document categorization; category feature characteristics; experiments; generalized instance patterns; generalized instance sets; instance-based learning; k-NN; k-nearest-neighbor; linear classifiers; metalearning; metamodel; text classification; Filtering; Geographic Information Systems; Humans; Large-scale systems; Machine learning; Management training; Pattern recognition; Routing; Systems engineering and theory; Text categorization;
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2003.1195997
Filename
1195997