DocumentCode :
3582722
Title :
Subsequence kernels-based Arabic text classification
Author :
Nehar, Attia ; Benmessaoud, Abdelkader ; Cherroun, Hadda ; Ziadi, Djelloul
Author_Institution :
Lab. d'Inf. et Math., Univ. Amar Telidji, Laghouat, Algeria
fYear :
2014
Firstpage :
206
Lastpage :
213
Abstract :
Kernel methods have been highly successful in machine learning, mainly because they cope flexibly with the high-dimensional feature spaces of complex data such as graphs, trees, or text. In text classification (TC), they have outperformed traditional algorithms. Several kernels have been introduced for textual data (P-spectrum, All-Subsequences, Gap-Weighted Subsequences, ...) to improve the performance of TC systems. In this paper, we build an Arabic TC (ATC) system that accounts for the order and co-occurrence of words within a text. Documents are represented by transducers, a specific kind of automaton; this representation allows an efficient implementation of the subsequence kernel. An empirical study evaluates the ATC system on the large SPA corpus. The results show improved classification precision.
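To make the notion of a subsequence kernel concrete, the following is a minimal sketch of a word-level gap-weighted subsequence kernel. It is a brute-force illustration only and is not the transducer-based implementation the paper describes; the function names, the subsequence length `n`, and the `decay` parameter are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(tokens, n, decay):
    """Map each length-n word subsequence to its gap-weighted value.

    Every occurrence contributes decay ** span, where span is the number
    of tokens between the first and last matched word (inclusive), so
    occurrences with larger gaps count less.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        span = idx[-1] - idx[0] + 1
        features[tuple(tokens[i] for i in idx)] += decay ** span
    return features


def gap_weighted_kernel(a, b, n=2, decay=0.5):
    """Inner product of the gap-weighted subsequence features of a and b."""
    fa = subsequence_features(a, n, decay)
    fb = subsequence_features(b, n, decay)
    return sum(value * fb[key] for key, value in fa.items() if key in fb)


# Hypothetical toy documents, already split into words.
doc_a = ["the", "cat", "sat"]
doc_b = ["the", "cat", "ran"]
similarity = gap_weighted_kernel(doc_a, doc_b)
```

The two toy documents share only the subsequence ("the", "cat"), which is contiguous in both, so the kernel value is decay^2 * decay^2. Enumerating all index combinations grows quickly with document length, which is why efficient formulations such as dynamic programming or automata-based representations are used in practice.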
Keywords :
automata theory; classification; learning (artificial intelligence); natural language processing; text analysis; Arabic text classification; P-spectrum; all-subsequences kernel; automata; gap-weighted subsequences kernel; machine learning; transducer; Economics; Kernel; Transducers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
Type :
conf
DOI :
10.1109/AICCSA.2014.7073200
Filename :
7073200