Abstract:
Text mining is an important research direction that involves several fields, such as information retrieval, information extraction,
and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high-dimensionality problem of the vector space
model, but identifying appropriate parameter values is critical to the performance of LDA. The swarm-optimized approach estimates the
parameters of LDA, including the number of topics and all other parameters involved in LDA. The hybrid ensemble pruning
approach, based on combined diversity measures and clustering, aims to obtain a multiple classifier system with high predictive
performance and greater diversity. In this scheme, four different pairwise diversity measures among the classifiers of the ensemble (namely, the disagreement measure, the Q-statistic,
the correlation coefficient, and the double-fault measure) are combined.
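The four diversity measures named above have standard pairwise definitions in terms of the classifiers' oracle outputs (1 if a sample is classified correctly, 0 otherwise). The sketch below computes them for one pair of classifiers; the function name `pairwise_diversity` and the choice to map degenerate (zero) denominators to 0.0 are assumptions of this illustration, not details taken from the paper, and the paper's specific rule for combining the four measures into one matrix is not reproduced here.

```python
import math
from typing import Dict, Sequence


def pairwise_diversity(a: Sequence[int], b: Sequence[int]) -> Dict[str, float]:
    """Four pairwise diversity measures from oracle outputs.

    a[k] and b[k] are 1 if the respective classifier labelled sample k
    correctly, else 0. (Hypothetical helper for illustration.)
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("oracle vectors must be non-empty and of equal length")
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both correct
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # both wrong
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only a correct
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only b correct

    num = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return {
        "disagreement": (n01 + n10) / n,
        # Zero denominators are mapped to 0.0 (an assumption of this sketch).
        "q_statistic": num / q_den if q_den else 0.0,
        "correlation": num / rho_den if rho_den else 0.0,
        "double_fault": n00 / n,
    }


if __name__ == "__main__":
    print(pairwise_diversity([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1]))
```

Higher disagreement means more diversity, whereas higher Q-statistic, correlation, and double-fault values mean less; any combination of the four therefore has to orient them consistently before they are merged.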
diversity matrix, a swarm-intelligence-based clustering algorithm partitions the classifiers into a number of disjoint
groups, and the classifier with the highest predictive performance in each cluster is selected to build the final multiple classifier
system. Experiments were conducted on five biomedical text benchmarks. In the swarm-optimized LDA,
different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search
algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated.
The experimental results on the biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance
than conventional LDA. In addition, the proposed multiple classifier system outperforms conventional classification
algorithms, ensemble learning methods, and ensemble pruning methods.
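The pruning step described in this abstract (cluster the classifiers on a combined diversity matrix, then keep the most accurate classifier from each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions: the paper uses swarm-intelligence-based clustering, which is swapped out here for plain average-linkage agglomerative clustering, and the input `dist` is a hypothetical symmetric matrix in which `dist[i][j]` is the combined diversity of classifiers `i` and `j` (larger means more different). The function names are likewise hypothetical.

```python
from typing import List, Sequence


def agglomerative_clusters(dist: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering on a precomputed distance matrix.

    Stand-in for the swarm-intelligence clustering used in the paper.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of classifiers")
    clusters: List[List[int]] = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None  # (average linkage, index p, index q)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                link = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or link < best[0]:
                    best = (link, p, q)
        _, p, q = best
        # Merge the two least diverse clusters (q > p, so deleting q is safe).
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def prune_ensemble(dist: Sequence[Sequence[float]], accuracy: Sequence[float], k: int) -> List[int]:
    """Return indices of the most accurate classifier in each of k clusters."""
    clusters = agglomerative_clusters(dist, k)
    return sorted(max(c, key=lambda i: accuracy[i]) for c in clusters)


if __name__ == "__main__":
    dist = [
        [0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.85],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.85, 0.2, 0.0],
    ]
    accuracy = [0.80, 0.85, 0.78, 0.75]
    print(prune_ensemble(dist, accuracy, k=2))
```

With these toy values, classifiers 0 and 1 form one low-diversity group and classifiers 2 and 3 another, so the sketch keeps classifier 1 and classifier 2. Grouping similar classifiers and taking one representative per group is what lets the pruned ensemble stay both accurate and diverse.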