DocumentCode :
2814899
Title :
Web Text Categorization for Large-scale Corpus
Author :
Jia, Zhijuan ; Mu, Lianbo
Author_Institution :
Sch. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
Volume :
8
fYear :
2010
fDate :
22-24 Oct. 2010
Abstract :
A corpus is a set of language materials stored on computers that can be searched, queried, and analyzed by computer for enterprise decision-makers. Automated text categorization has been studied extensively, and various techniques for document categorization have been developed. However, Chinese corpora are currently scarce, and corpora for Chinese text categorization are especially rare. Moreover, most experimental prototypes, built to evaluate different techniques, have been restricted to the heterogeneous, autonomic, dynamic, and distributed Internet environment. This paper proposes and implements an incremental learning algorithm on a large-scale corpus for Chinese text categorization. An approach based on Support Vector Machines (SVMs) for web text mining of large-scale systems on GBODSS is developed to support enterprise decision making. Experimental results show that the approach achieves good classification accuracy through incremental learning and that the speedup in computation time is almost superlinear.
Keywords :
Internet; data mining; decision making; decision support systems; learning (artificial intelligence); natural language processing; support vector machines; text analysis; Chinese categorization corpus; GBODSS; Web text categorization; Web text mining; distributed Internet environment; document categorization; enterprise decision-makers; incremental learning algorithm; large-scale corpus; large-scale systems; Accuracy; Education; Finance; Grid Technology; Chinese text categorization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Application and System Modeling (ICCASM), 2010 International Conference on
Conference_Location :
Taiyuan
Print_ISBN :
978-1-4244-7235-2
Electronic_ISBN :
978-1-4244-7237-6
Type :
conf
DOI :
10.1109/ICCASM.2010.5619341
Filename :
5619341