DocumentCode :
1186293
Title :
Automatic textual document categorization based on generalized instance sets and a metamodel
Author :
Lam, Wai ; Han, Yiqiu
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
Volume :
25
Issue :
5
fYear :
2003
fDate :
5/1/2003 12:00:00 AM
Firstpage :
628
Lastpage :
633
Abstract :
We propose a new approach to text categorization, the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. The GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It refines the original instances to construct a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
Keywords :
classification; document handling; learning by example; automatic textual document categorization; category feature characteristics; experiments; generalized instance patterns; generalized instance sets; instance-based learning; k-NN; k-nearest-neighbor; linear classifiers; metalearning; metamodel; text classification; Filtering; Geographic Information Systems; Humans; Large-scale systems; Machine learning; Management training; Pattern recognition; Routing; Systems engineering and theory; Text categorization
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2003.1195997
Filename :
1195997