DocumentCode :
2480029
Title :
Random Subspace Method in Text Categorization
Author :
Gangeh, Mehrdad J. ; Kamel, Mohamed S. ; Duin, Robert P. W.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
2049
Lastpage :
2052
Abstract :
In text categorization (TC), which is a supervised technique, documents are usually represented by a feature vector of terms or phrases. Because even a moderate-size text corpus contains a huge number of terms, a high-dimensional feature space is an intrinsic problem in TC. The random subspace method (RSM), a technique that divides the feature space into smaller subspaces, each submitted to a base classifier (BC) in an ensemble, can be an effective approach to reducing the dimensionality of the feature space. Inspired by similar research on functional magnetic resonance imaging (fMRI) of the brain, we address the estimation of the ensemble parameters, i.e., the ensemble size (L) and the dimensionality of the feature subsets (M), by defining three criteria: usability, coverage, and diversity of the ensemble. We show that a relatively medium M and a small L yield an ensemble that improves on the performance of a single support vector machine, which is considered the state of the art in TC.
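The random subspace idea summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the SVM base classifier so the example needs no external libraries, feature subsets are drawn uniformly at random, and the ensemble decision is a plain majority vote. The function names (`sample_subspaces`, `rsm_fit`, `rsm_predict`) are hypothetical; `L` and `M` follow the abstract's notation.

```python
import random
from collections import Counter


def sample_subspaces(n_features, L, M, seed=0):
    """Draw L random feature subsets, each holding M distinct feature indices."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), M)) for _ in range(L)]


def fit_centroids(X, y, idx):
    """Stand-in base classifier (assumption, not an SVM):
    per-class mean vector computed only over the features in idx."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(idx))
        for j, f in enumerate(idx):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_one(centroids, idx, row):
    """Assign the class whose centroid is nearest in the selected subspace."""
    proj = [row[f] for f in idx]

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(proj, centroids[c]))

    return min(sorted(centroids), key=dist)


def rsm_fit(X, y, L, M, seed=0):
    """Train one base classifier per random feature subset."""
    subspaces = sample_subspaces(len(X[0]), L, M, seed)
    return [(idx, fit_centroids(X, y, idx)) for idx in subspaces]


def rsm_predict(model, row):
    """Majority vote over the base classifiers (ties go to the smallest label)."""
    votes = Counter(predict_one(c, idx, row) for idx, c in model)
    return max(sorted(votes), key=lambda k: votes[k])
```

With a document-term matrix `X` (one row per document) and labels `y`, `rsm_fit(X, y, L, M)` trains the ensemble and `rsm_predict(model, row)` classifies one document. How to choose `L` and `M` is the subject of the paper and is not reproduced here.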
Keywords :
estimation theory; pattern classification; random processes; support vector machines; text analysis; base classifier; document representation; feature subsets dimensionality; feature vector; functional magnetic resonance imaging; high dimensional feature space; moderate size text corpus; random subspace method; support vector machine; text categorization; Accuracy; Brain; Kernel; Machine learning; Presses; ensemble of classifiers
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.505
Filename :
5595913