  • DocumentCode
    2818041
  • Title
    Research on Text Classification Algorithm by Combining Statistical and Ontology Methods
  • Author
    Wu, Guoshi ; Liu, Kaiping
  • Author_Institution
    Sch. of Software Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2009
  • fDate
    11-13 Dec. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Traditional statistics-based text classification methods mostly construct their feature vectors from a set of key terms, treating the terms as independent of one another with no semantic relations among them. In real text, however, words usually have semantic relationships such as synonymy and hyponymy. Statistics-based classification methods therefore do not reflect reality, and their classification results are unsatisfactory. To address this problem, semantic information about the features needs to be obtained by taking advantage of an ontology. Using the ontology's class hierarchy and property constraints, one can match terms to domain ontology concepts and build a concept vector space model. Using the ontology method alone for text classification, however, lacks the scientific rigor of statistics. Taking all of the above into consideration, this paper combines the two classification methods. First, we select features with a statistical method and, on this basis, add the ontology to form the concept vector space. In addition, we improve the KNN algorithm in two respects. Finally, we implement a text classification module for the telecom domain, and we analyze and compare the results of the statistics-only method (without the improved KNN algorithm) and the combined method (with the improved KNN algorithm).
  • Keywords
    ontologies (artificial intelligence); statistical analysis; text analysis; KNN algorithm; concept vector space model; ontology methods; statistical methods; text classification algorithm; Classification algorithms; Niobium; Ontologies; Probability; Software engineering; Statistics; Support vector machine classification; Support vector machines; Telecommunications; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-4507-3
  • Electronic_ISBN
    978-1-4244-4507-3
  • Type
    conf
  • DOI
    10.1109/CISE.2009.5363406
  • Filename
    5363406
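
The pipeline outlined in the abstract (map terms to ontology concepts, build a concept vector, classify with KNN) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `TERM_TO_CONCEPT` dictionary is a hypothetical stand-in for a real domain ontology lookup, the function names are invented for this sketch, and the classifier is plain cosine-similarity KNN. The record does not describe the two KNN improvements the authors made, so they are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical term-to-concept map standing in for a telecom domain ontology.
# A real system would resolve terms through the ontology's class hierarchy.
TERM_TO_CONCEPT = {
    "handset": "MobilePhone",
    "cellphone": "MobilePhone",
    "smartphone": "MobilePhone",
    "broadband": "InternetAccess",
    "dsl": "InternetAccess",
    "fiber": "InternetAccess",
}


def concept_vector(tokens):
    """Count concepts per document; unmapped terms are kept as their own feature."""
    return Counter(TERM_TO_CONCEPT.get(t.lower(), t.lower()) for t in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(tokens, training, k=3):
    """Plain majority-vote KNN over concept vectors (not the paper's improved KNN)."""
    query = concept_vector(tokens)
    scored = sorted(
        ((cosine(query, concept_vector(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (["handset", "battery"], "devices"),
        (["cellphone", "screen"], "devices"),
        (["broadband", "speed"], "network"),
        (["dsl", "router"], "network"),
    ]
    # "smartphone" never appears in the training data, but it shares the
    # MobilePhone concept with "handset" and "cellphone".
    print(knn_classify(["smartphone"], training, k=3))
```

The point of the concept mapping is that synonyms such as "handset" and "smartphone" land on the same vector dimension, which a purely term-based vector would treat as unrelated.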