Regional Information Center for Science and Technology - Effective shrinkage of large multi-class linear SVM models for text categorization

DocumentCode :

2488701

Title :

Effective shrinkage of large multi-class linear SVM models for text categorization

Author :

Dong, Jianxiong ; Suen, C.Y. ; Krzyzak, Adam

Author_Institution :

Yahoo Inc, Sunnyvale, CA

fYear :

2008

fDate :

8-11 Dec. 2008

Firstpage :

Lastpage :

Abstract :

When linear support vector machines (SVMs) are applied to multi-class text categorization in industry, the linear SVM model is often very large, usually exceeding several gigabytes. As a result, the model cannot fit directly into computer memory, and classification is slow. In this paper, a novel method based on the vector norm is proposed to shrink the model size significantly without sacrificing classification accuracy. We also propose a cache-efficient implementation of multi-class linear SVMs for the classification phase. Experimental results on the Yahoo-Korea dataset show that the proposed method shrinks the model from 5.2 gigabytes to 260 megabytes, and that the efficient implementation of the linear SVM achieves a speedup factor of 44.
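The abstract does not spell out the shrinkage rule or the memory layout, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It stores a multi-class linear SVM as one weight row per feature (a hypothetical feature-major layout, chosen so that a sparse document touches only the rows of its own features), prunes rows whose L2 norm across classes falls below a hypothetical `threshold`, and scores each class as w_c · x + b_c. The names `shrink_by_row_norm`, `classify`, and the toy weights are illustrative.

```python
import math
from typing import Dict, List

# Hypothetical model layout: one weight row per feature, holding that feature's
# weight for every class (feature-major), so a sparse document only reads the
# rows of the features it actually contains.
Model = Dict[int, List[float]]


def shrink_by_row_norm(weights: Model, threshold: float) -> Model:
    """Drop feature rows whose L2 norm across classes is below `threshold`.

    Assumption: this generic norm-based pruning rule is illustrative only and
    is not claimed to be the criterion used in the paper.
    """
    return {
        feat: row
        for feat, row in weights.items()
        if math.sqrt(sum(w * w for w in row)) >= threshold
    }


def classify(weights: Model, bias: List[float], doc: Dict[int, float]) -> int:
    """Return the index of the class with the highest score w_c . x + b_c."""
    scores = list(bias)
    for feat, value in doc.items():
        row = weights.get(feat)
        if row is None:  # pruned or unseen feature contributes nothing
            continue
        for c, w in enumerate(row):
            scores[c] += value * w
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example (hypothetical values): 3 features, 3 classes.
weights = {0: [0.9, -0.2, 0.1], 1: [0.01, 0.02, -0.01], 2: [-0.3, 0.8, 0.4]}
bias = [0.0, 0.0, 0.0]
small = shrink_by_row_norm(weights, threshold=0.1)  # feature 1 is dropped
doc = {0: 1.0, 1: 2.0, 2: 0.5}
print(sorted(small), classify(weights, bias, doc), classify(small, bias, doc))
# prints [0, 2] 0 0
```

In this toy case the pruned model keeps two of three feature rows and still predicts the same class; whether accuracy is preserved on real data depends entirely on the pruning criterion, which the sketch only assumes.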

Keywords :

cache storage; classification; support vector machines; text analysis; Yahoo-Korea dataset; cache-efficient implementation; large multiclass linear SVM model; linear support vector machine; multiclass text categorization; vector norm; Decision trees; Degradation; Large-scale systems; Support vector machine classification; Support vector machines; Testing; Text categorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition, 2008. ICPR 2008. 19th International Conference on

Conference_Location :

Tampa, FL

ISSN :

1051-4651

Print_ISBN :

978-1-4244-2174-9

Electronic_ISBN :

1051-4651

Type :

conf

DOI :

10.1109/ICPR.2008.4761782

Filename :

4761782

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2488701