Title :
Using Kernel Density Classifier with Topic Model and Cost Sensitive Learning for Automatic Text Categorization
Author :
Mansjur, Dwi Sianto ; Wada, Ted S. ; Juang, Biing Hwang
Author_Institution :
Center for Signal & Image Process., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
This paper proposes a novel framework for the automatic text categorization problem based on the kernel density classifier. The overall goal is to tackle two main issues in automatic text categorization: interpretability and performance. Specifically, to address interpretability, latent semantic analysis is used to construct a topic space in which each dimension represents a single topic, and the text features are extracted directly from this topic space. To address performance, the classifiers' parameters are optimized for either cost-sensitive or non-cost-sensitive categorization. We have experimentally evaluated the proposed framework on a corpus of twenty newsgroups. The experimental results confirm the effectiveness of the framework in using features from the topic model for cost-sensitive categorization.
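The decision step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes documents have already been projected into a topic space (the latent semantic analysis step is omitted), uses a fixed-bandwidth Gaussian kernel, and replaces the paper's parameter optimization with a plain minimum-expected-cost decision rule. All names (`classify`, `class_density`, `train`, `skewed`), the toy vectors, and the cost values are hypothetical.

```python
import math

def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian kernel between two topic-space vectors."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm

def class_density(x, samples, bandwidth):
    """Kernel density estimate of p(x | class) from one class's training vectors."""
    return sum(gaussian_kernel(x, s, bandwidth) for s in samples) / len(samples)

def classify(x, train, cost, bandwidth=0.5):
    """Return the label with the smallest expected misclassification cost.

    train: dict mapping label -> list of topic-space vectors
    cost:  dict mapping (true_label, predicted_label) -> cost;
           pairs that are absent (e.g. correct decisions) cost 0
    """
    total = sum(len(v) for v in train.values())
    # Unnormalized posterior: class prior times class-conditional density.
    score = {c: (len(v) / total) * class_density(x, v, bandwidth)
             for c, v in train.items()}
    # Expected cost of predicting each label, summed over possible true labels.
    risk = {pred: sum(cost.get((true, pred), 0.0) * score[true] for true in train)
            for pred in train}
    return min(risk, key=risk.get)

# Toy two-topic space (assumed data): each vector holds two topic weights.
train = {
    "sci": [(0.9, 0.1), (0.8, 0.2)],
    "rec": [(0.1, 0.9), (0.2, 0.8)],
}
zero_one = {("sci", "rec"): 1.0, ("rec", "sci"): 1.0}   # non-cost-sensitive
skewed = {("sci", "rec"): 10.0, ("rec", "sci"): 1.0}    # missing "sci" is costly

x = (0.45, 0.55)  # a document lying slightly closer to the "rec" examples
label_plain = classify(x, train, zero_one)
label_costly = classify(x, train, skewed)
```

With equal costs the query is assigned to the class with the larger density-weighted score; under the skewed cost table the decision shifts toward the class whose misclassification is more expensive, which is the behaviour a cost-sensitive setting aims for.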
Keywords :
feature extraction; learning (artificial intelligence); pattern classification; text analysis; automatic text categorization; cost sensitive learning; kernel density classifier; latent semantic analysis technique; text features extraction; topic model; Classification tree analysis; Context modeling; Cost function; Distribution functions; Kernel; Niobium; Sparse matrices; Support vector machine classification; Support vector machines; Text categorization; Cost-Sensitive Learning; Latent Semantic Analysis; Minimum Classification Error; Text Categorization;
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.145