DocumentCode
2818041
Title
Research on Text Classification Algorithm by Combining Statistical and Ontology Methods
Author
Wu, Guoshi ; Liu, Kaiping
Author_Institution
Sch. of Software Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear
2009
fDate
11-13 Dec. 2009
Firstpage
1
Lastpage
4
Abstract
Traditional statistics-based text classification methods build their feature vectors almost entirely from key terms. They assume that terms are independent of each other and have no semantic relations. In the real world, however, words often have semantic relationships, such as synonymy and hyponymy. Purely statistical classification methods therefore do not reflect how language works, and their results are unsatisfactory. To address this problem, semantic feature information can be obtained from an ontology. Using the ontology's hierarchical class structure and property constraints, terms can be matched with domain ontology concepts to build a concept vector space model. Used alone for text classification, however, the ontology method lacks the scientific rigor of statistical methods. This paper therefore combines the two classification methods. First, we select features with a statistical method and then add the ontology to form the concept vector space. In addition, we improve the KNN algorithm in two respects. We then implement a text classification module for the telecom domain. Finally, we analyze and compare the results of the statistics-only method (without the improved KNN algorithm) and the combined method (with the improved KNN algorithm).
Keywords
ontologies (artificial intelligence); statistical analysis; text analysis; KNN algorithm; concept vector space model; ontology methods; statistical methods; text classification algorithm; Classification algorithms; Ontologies; Probability; Software engineering; Statistics; Support vector machine classification; Support vector machines; Telecommunications; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-4507-3
Electronic_ISBN
978-1-4244-4507-3
Type
conf
DOI
10.1109/CISE.2009.5363406
Filename
5363406