• DocumentCode
    2013500
  • Title

    Using boosting mechanism to refine the threshold of VSM-based similarity in text classification

  • Author

    Diao, LiLi ; Hu, Keyun ; Lu, Yuchang ; Shi, Chunyi

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    3
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    2284
  • Abstract
    The vector space model (VSM)-based similarity classifier is the simplest text categorization method. It classifies quickly, but with low accuracy. The main reason is that the similarity threshold used by the classifier is chosen empirically rather than derived mathematically. This paper introduces a boosting-based mechanism that adaptively computes a relatively accurate similarity threshold for a specific dataset. The method constructs better similarity-based classification rules by combining the similarity thresholds generated by the constituent classifiers of boosting. It greedily minimizes the error rate on the training documents; therefore, a similarity classifier that uses the threshold computed this way should also have a low error rate.
  • Keywords
    category theory; information retrieval; learning (artificial intelligence); learning systems; pattern classification; boosting; error rates; machine learning; similarity threshold; text categorization; vector space model; Automation; Computer science; Error analysis; Intelligent control; Intelligent systems; Laboratories
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Automation, 2002. Proceedings of the 4th World Congress on
  • Print_ISBN
    0-7803-7268-9
  • Type

    conf

  • DOI
    10.1109/WCICA.2002.1021496
  • Filename
    1021496
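
  • Illustrative_Sketch

    The abstract describes combining similarity thresholds produced by the constituent classifiers of boosting. The record does not give the paper's actual algorithm, so the following is only a minimal sketch under stated assumptions: standard AdaBoost is swapped in as the boosting procedure, each weak classifier is a one-sided threshold on cosine similarity to a category vector, and the names `cosine`, `best_stump`, `boost_thresholds`, and `classify` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_stump(sims, labels, weights):
    """Return (threshold, weighted_error) for the rule: +1 if sim >= threshold."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(sims)):
        err = sum(w for s, y, w in zip(sims, labels, weights)
                  if (1 if s >= t else -1) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


def boost_thresholds(sims, labels, rounds=5):
    """AdaBoost over threshold rules (an assumption, not the paper's method).

    sims   -- similarity of each training document to the category vector
    labels -- +1 if the document belongs to the category, else -1
    """
    n = len(sims)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        t, err = best_stump(sims, labels, weights)
        if err >= 0.5:  # no better than chance: stop adding rules
            break
        err = max(err, 1e-10)  # avoid division by zero on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t))
        # Raise the weight of misclassified documents, lower the rest.
        weights = [w * math.exp(-alpha * y * (1 if s >= t else -1))
                   for w, s, y in zip(weights, sims, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(sim, ensemble):
    """Weighted vote of the learned threshold rules; returns +1 or -1."""
    score = sum(alpha * (1 if sim >= t else -1) for alpha, t in ensemble)
    return 1 if score >= 0 else -1
```

    A possible usage, with made-up similarity values: `ensemble = boost_thresholds([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, -1, -1, -1])`, then `classify(0.85, ensemble)` for a new document. Searching candidate thresholds only among the observed similarity values keeps the sketch short; a real system would likely need a finer search and held-out evaluation.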