• DocumentCode
    1186293
  • Title
    Automatic textual document categorization based on generalized instance sets and a metamodel
  • Author
    Lam, Wai; Han, Yiqiu
  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    25
  • Issue
    5
  • fYear
    2003
  • fDate
    5/1/2003 12:00:00 AM
  • Firstpage
    628
  • Lastpage
    633
  • Abstract
    We propose a new approach to text categorization, known as the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
  • Keywords
    classification; document handling; learning by example; automatic textual document categorization; category feature characteristics; experiments; generalized instance patterns; generalized instance sets; instance-based learning; k-NN; k-nearest-neighbor; linear classifiers; metalearning; metamodel; text classification; Filtering; Geographic Information Systems; Humans; Large-scale systems; Machine learning; Management training; Pattern recognition; Routing; Systems engineering and theory; Text categorization
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2003.1195997
  • Filename
    1195997