Title of article :
A multi-class SVM classification system based on learning methods from indistinguishable Chinese official documents
Author/Authors :
Fu, JuiHsi and Lee, SingLing
Issue Information :
Serial issue, year 2012
Pages :
8
From page :
3127
To page :
3134
Abstract :
Support Vector Machines (SVM) have been applied to Chinese official document classification using a One-against-All (OAA) multi-class scheme. Several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction, are used in preprocessing. We observe that documents whose contents are indistinguishable account for most of the poor classification results. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are shown to be informative for strengthening prediction performance, since the current model predicts their labels with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Based on verified classification results and the distinguishability of documents, four learning strategies that select particular documents for the training sets are proposed to improve classification performance. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID, which adds both misclassified and indistinguishable documents to the training sets, is the most effective learning strategy for SVM classification of a large set of Chinese official documents in terms of both computing efficiency and classification accuracy.
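The abstract's pipeline (OAA prediction from per-class decision values, flagging low-confidence documents, then an LMID-style selection of misclassified plus indistinguishable documents) can be illustrated with a minimal sketch. It assumes each document already carries one decision value per class from its binary OAA SVM and a verified true label. The indistinguishability test used here (the gap between the two largest decision values falls below a threshold) is a hypothetical stand-in, not the authors' exact criterion; the names `Document`, `predict`, `is_indistinguishable`, `select_lmid`, the `margin_threshold` value, and the class labels are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Document:
    doc_id: str
    # Hypothetical input: one OAA SVM decision value per class label.
    decision_values: Dict[str, float]
    # Label verified after prediction (e.g. by a human reviewer).
    true_label: str


def predict(decision_values: Dict[str, float]) -> str:
    # Standard OAA rule: choose the class whose binary SVM scores highest.
    return max(decision_values, key=decision_values.get)


def is_indistinguishable(decision_values: Dict[str, float],
                         margin_threshold: float = 0.5) -> bool:
    # ASSUMED criterion: the prediction is low-confidence when the two
    # highest decision values are closer than margin_threshold.
    ranked = sorted(decision_values.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return ranked[0] - ranked[1] < margin_threshold


def select_lmid(docs: Sequence[Document],
                margin_threshold: float = 0.5) -> List[str]:
    # LMID-style selection: keep documents that are misclassified OR
    # indistinguishable; these would be added to the training set.
    selected = []
    for doc in docs:
        misclassified = predict(doc.decision_values) != doc.true_label
        if misclassified or is_indistinguishable(doc.decision_values,
                                                  margin_threshold):
            selected.append(doc.doc_id)
    return selected


# Illustrative documents with made-up decision values.
docs = [
    # Confident and correct: not selected.
    Document("d1", {"finance": 1.2, "personnel": -0.8, "legal": -1.1}, "finance"),
    # Correct but top-two gap is only 0.2: indistinguishable, selected.
    Document("d2", {"finance": 0.3, "personnel": 0.1, "legal": -0.9}, "finance"),
    # Confident but wrong (predicts "finance"): misclassified, selected.
    Document("d3", {"finance": 0.9, "personnel": -0.5, "legal": -1.0}, "legal"),
]

if __name__ == "__main__":
    print(select_lmid(docs))
```

Running the script prints `['d2', 'd3']`. Using the gap between the top two OAA scores, rather than the top score alone, is one simple way to capture "predicted in low confidence" when several binary classifiers respond similarly; a real system would tune the threshold (or use a different confidence measure) on held-out data.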
Keywords :
Multi-class classification, Chinese official document classification, Support vector machines (SVM), Incremental learning, Indistinguishability identification
Journal title :
Expert Systems with Applications
Serial Year :
2012
Record number :
2351251