Title :
Research on Text Classification Algorithm by Combining Statistical and Ontology Methods
Author :
Wu, Guoshi ; Liu, Kaiping
Author_Institution :
Sch. of Software Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Traditional statistics-based text classification methods mostly construct their feature vectors from a set of key terms, and they assume that terms are independent of each other, with no semantic relations among them. In real text, however, words usually have semantic relationships such as synonymy and hyponymy. Classification methods based purely on statistics therefore do not reflect this fact, and their classification results are unsatisfactory. To address this problem, semantic information about the features can be obtained by taking advantage of an ontology. Using the ontology's class hierarchy and property constraints, one can match terms to domain ontology concepts and build a concept vector space model. Using the ontology method alone for text classification, however, lacks the scientific rigor of statistical methods. Taking all of the above into consideration, this paper combines the two classification methods. First, we select features with a statistical method; on this basis, we add the ontology and form the concept vector space. In addition, we improve the KNN algorithm in two respects. Finally, we implement a text classification module for the telecom domain, and we analyze and compare the results of the statistics-only method (without the improved KNN algorithm) and the combined method (with the improved KNN algorithm).
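The pipeline outlined in the abstract (match terms to ontology concepts, build a concept vector, classify with KNN) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `TERM_TO_CONCEPT` mapping stands in for a real telecom domain ontology, the training documents and labels are invented, and the classifier is plain cosine-similarity KNN. The abstract does not state what the paper's two KNN improvements are, so they are not reproduced here.

```python
from collections import Counter
import math

# Hypothetical toy "ontology": surface term -> domain concept.
# A real system would derive this from the ontology's class hierarchy.
TERM_TO_CONCEPT = {
    "handset": "MobilePhone",
    "cellphone": "MobilePhone",
    "phone": "MobilePhone",
    "broadband": "InternetAccess",
    "dsl": "InternetAccess",
    "fiber": "InternetAccess",
    "invoice": "Billing",
    "bill": "Billing",
}

# Invented training documents with invented category labels.
TRAINING = [
    ("new handset released", "devices"),
    ("cellphone screen repair", "devices"),
    ("fiber broadband rollout", "network"),
    ("dsl speed upgrade", "network"),
    ("monthly invoice overdue", "billing"),
    ("bill payment reminder", "billing"),
]


def concept_vector(text):
    """Count the ontology concepts matched by the tokens of `text`.

    Synonyms collapse onto one concept, so they share a dimension.
    """
    tokens = text.lower().split()
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training, k=3):
    """Plain majority-vote KNN over concept vectors (no improvements)."""
    query = concept_vector(text)
    scored = sorted(
        ((cosine(query, concept_vector(doc)), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

In this sketch, "handset" and "cellphone" fall on the same `MobilePhone` dimension, so two documents that share no surface term can still be nearest neighbours. A purely term-based vector would treat them as unrelated, which is the limitation the abstract attributes to statistics-only methods.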
Keywords :
ontologies (artificial intelligence); statistical analysis; text analysis; KNN algorithm; concept vector space model; ontology methods; statistical methods; text classification algorithm; Classification algorithms; Niobium; Ontologies; Probability; Software engineering; Statistics; Support vector machine classification; Support vector machines; Telecommunications; Text categorization;
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
DOI :
10.1109/CISE.2009.5363406