Title :
A New Classification Algorithm for Large Scale of Chinese Texts
Author :
Wang, Hongwei ; Wang, Jianhui ; Yi, Lei
Author_Institution :
Sch. of Econ. & Manage., Tongji Univ., Shanghai
Abstract :
Most classification methods in current research are based on the vector space model (VSM), among which kNN is widely used. However, most of them are computationally expensive and can hardly be used to classify a large number of samples. Moreover, the classifier must be rebuilt whenever training samples are added or deleted, which makes these methods poorly scalable. In this paper, two new concepts, mutual dependence and equivalent radius, are presented, on the basis of which a new classification method (called MDER) is proposed. MDER can be used to classify a large number of samples and has good scalability. A series of experiments on classifying Chinese documents leads to the conclusion that MDER outperforms kNN and the CCC method, and can be used online to classify a large number of samples while maintaining higher precision and recall.
Keywords :
classification; natural languages; text analysis; Chinese documents; Chinese texts; classification algorithm; mutual dependence-equivalent radius; Boosting; Classification algorithms; Dictionaries; Large-scale systems; Project management; Scalability; Support vector machine classification; Support vector machines; Technology management; Text categorization; Classification; Equivalent Radius; MDER; Mutual Dependence; VSM;
Conference_Title :
Service Operations and Logistics, and Informatics, 2006. SOLI '06. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
1-4244-0317-0
Electronic_ISBN :
1-4244-0318-9
DOI :
10.1109/SOLI.2006.328897