Title of article :
Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
Author/Authors :
Onan, Aytuğ Celal Bayar University - Department of Software Engineering - Turgutlu - Manisa, Turkey
Pages :
22
From page :
1
To page :
22
Abstract :
Text mining is an important research direction, which involves several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high dimensionality problem of the vector space model, but identifying appropriate parameter values is critical to the performance of LDA. The swarm-optimized approach estimates the parameters of LDA, including the number of topics and all the other parameters involved in LDA. The hybrid ensemble pruning approach based on combined diversity measures and clustering aims to obtain a multiple classifier system with high predictive performance and better diversity. In this scheme, four different diversity measures (namely, the disagreement measure, the Q-statistic, the correlation coefficient, and the double fault measure) among classifiers of the ensemble are combined. Based on the combined diversity matrix, a swarm intelligence based clustering algorithm is employed to partition the classifiers into a number of disjoint groups, and one classifier (with the highest predictive performance) from each cluster is selected to build the final multiple classifier system. Experiments have been conducted on five biomedical text benchmarks. In the swarm-optimized LDA, different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated. The experimental results on biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance compared to the conventional LDA. In addition, the proposed multiple classifier system outperforms the conventional classification algorithms, ensemble learning, and ensemble pruning methods.
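The four pairwise diversity measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions based on the counts of instances that both classifiers get right (n11), both get wrong (n00), or only one gets right (n10, n01). The function name `pairwise_diversity`, the input format (one correct/incorrect flag per instance), and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; the abstract does not state how the paper computes or combines the measures, so no combination step is shown.

```python
from math import sqrt


def pairwise_diversity(correct_a, correct_b):
    """Return four pairwise diversity measures for two classifiers.

    correct_a / correct_b: equal-length sequences of truthy/falsy flags,
    one per test instance, marking whether each classifier was correct.
    (Hypothetical helper; the input format is an assumption.)
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Contingency counts over the test instances.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a and not b:
            n10 += 1
        elif not a and b:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00

    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

    return {
        # Fraction of instances on which exactly one classifier is correct.
        "disagreement": (n01 + n10) / n,
        # Yule's Q statistic; 0.0 on a zero denominator (assumed convention).
        "q_statistic": cross / q_den if q_den else 0.0,
        # Correlation coefficient; 0.0 on a zero denominator (assumed).
        "correlation": cross / rho_den if rho_den else 0.0,
        # Fraction of instances that both classifiers misclassify.
        "double_fault": n00 / n,
    }


if __name__ == "__main__":
    # Toy flags for two hypothetical classifiers on four instances.
    print(pairwise_diversity([1, 1, 0, 0], [1, 0, 1, 0]))
```

Evaluating such a function for every pair of classifiers in an ensemble would give one matrix per measure, which is the kind of input a combined diversity matrix could be built from.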
Keywords :
Text , LDA , Biomedical
Journal title :
Computational and Mathematical Methods in Medicine
Serial Year :
2018
Full Text URL :
Record number :
2610522
Link To Document :