• DocumentCode
    3181010
  • Title
    Research on Text Classification Algorithm of Largest Dispersion Based on Term Frequency
  • Author
    Junxiu, An ; Yuchang, Jin
  • Author_Institution
    Sch. of Software Eng., Chengdu Univ. of Inf. Technol. (CUIT), Chengdu, China
  • Volume
    1
  • fYear
    2009
  • fDate
    25-27 Dec. 2009
  • Firstpage
    400
  • Lastpage
    403
  • Abstract
    To classify Web page documents automatically according to their content, this paper proposes a largest-dispersion text classification algorithm based on term frequency. The algorithm first applies a backward term frequency algorithm to typical texts of n types to derive a sound and effective characteristics set for each type. Using these sets, it computes the classification values of a Web page document in each of the n characteristics sets, applies the largest dispersion algorithm, and obtains the largest dispersion by comparing the dispersions. The largest dispersion value is then compared with a relative threshold: if the value is larger than the threshold, the corresponding type is taken as the type of the Web page document; if it is smaller, the judgement about the document's type is invalid. The algorithm is robust and easy to use, and is very effective for large-scale collections of small documents.
  • Keywords
    Internet; pattern classification; text analysis; Web page document; backward term frequency algorithm; largest dispersion algorithm; n-types characteristics set; page automatic classification; text classification algorithm; Artificial intelligence; Classification algorithms; Dispersion; Frequency; Large-scale systems; Robustness; Software algorithms; Testing; Text categorization; Vocabulary; retrospect term frequency algorithm; the characteristics set; the largest dispersion algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science-Technology and Applications, 2009. IFCSTA '09. International Forum on
  • Conference_Location
    Chongqing
  • Print_ISBN
    978-0-7695-3930-0
  • Electronic_ISBN
    978-1-4244-5423-5
  • Type
    conf
  • DOI
    10.1109/IFCSTA.2009.103
  • Filename
    5385048
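
The abstract outlines a pipeline (per-type characteristics sets, a classification value per type, a largest dispersion, and a threshold test) without defining the formulas. The following is a minimal Python sketch of that control flow only, not the authors' method. Everything concrete in it is an assumption made for illustration: the classification value is taken to be the relative frequency of a type's characteristic terms in the document, the dispersion is taken to be that value minus the mean over all types, and the names `category_scores`, `dispersions`, and `classify` are hypothetical. The characteristics sets are supplied by hand rather than derived with the backward term frequency algorithm.

```python
from collections import Counter


def category_scores(tokens, feature_sets):
    """Assumed classification value: share of the document's tokens that
    belong to each type's characteristics set (not defined in the abstract)."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in feature_sets}
    return {
        name: sum(counts[term] for term in terms) / total
        for name, terms in feature_sets.items()
    }


def dispersions(scores):
    """Assumed dispersion: each type's value minus the mean over all types."""
    mean = sum(scores.values()) / len(scores)
    return {name: value - mean for name, value in scores.items()}


def classify(tokens, feature_sets, threshold):
    """Return the type with the largest dispersion if it exceeds the
    threshold; otherwise return None (the judgement is invalid).
    A value exactly equal to the threshold is treated as invalid here,
    which is a choice the abstract does not specify."""
    disp = dispersions(category_scores(tokens, feature_sets))
    best = max(disp, key=disp.get)
    return best if disp[best] > threshold else None


# Hand-written characteristics sets, for illustration only.
feature_sets = {
    "sports": {"goal", "match"},
    "finance": {"stock", "bank"},
}

print(classify("the match ended with a late goal".split(), feature_sets, 0.1))
print(classify("the weather is nice today".split(), feature_sets, 0.1))
```

With these toy sets, the first document is assigned to `sports`, while the second contains no characteristic terms, so every dispersion is zero and the judgement is rejected. The threshold value 0.1 is arbitrary.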