DocumentCode :
668094
Title :
GPU-NB: A Fast CUDA-Based Implementation of Naïve Bayes
Author :
Viegas, Felipe ; Andrade, G. ; Almeida, Jorge ; Ferreira, Ricardo ; Goncalves, Monira ; Ramos, Gustavo ; Rocha, Leonardo
Author_Institution :
Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
fYear :
2013
fDate :
23-26 Oct. 2013
Firstpage :
168
Lastpage :
175
Abstract :
The advent of Web 2.0 has given rise to an interesting phenomenon: there is currently far more data than can be effectively analyzed without sophisticated automatic tools. Some of these tools, which target the organization and extraction of useful knowledge from this huge amount of data, rely on machine learning and data or text mining techniques, specifically automatic document classification (ADC) algorithms. However, these algorithms remain a computational challenge because of the volume of data that needs to be processed. Some of the strategies available to address this challenge are based on the parallelization of ADC algorithms. In this work, we present GPU-NB, a parallel version of one of the most widely used document classification algorithms, Naïve Bayes, that runs on graphics processing units (GPUs). In our evaluation using six different document collections, we show that GPU-NB maintains the same classification effectiveness (in most cases) while running up to 34x faster than its sequential CPU version. GPU-NB is also up to 11x faster than a CPU-based parallel implementation of Naïve Bayes running with 4 threads. Moreover, assuming an optimistic behavior of the CPU parallelization, GPU-NB should outperform the CPU-based implementation with up to 32 cores, at a small fraction of the cost. We also show that the efficiency of the GPU-NB parallelization is affected by features of the document collections, particularly the number of classes, although the density of the collection (the average number of term occurrences per document) has a significant impact as well.
Keywords :
Bayes methods; Internet; document handling; graphics processing units; multi-threading; parallel algorithms; parallel architectures; pattern classification; ADC algorithm parallelization; CPU parallelization; GPU-NB; Web 2.0; automatic document classification algorithm; data mining; document collection; fast CUDA-based implementation; graphics processing units; knowledge extraction; knowledge organization; machine learning; naive Bayes; optimistic behavior; probabilistic classification; text mining; Data mining; Graphics processing units; Instruction sets; Kernel; Parallel processing; Probability; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2013 25th International Symposium on
Conference_Location :
Porto de Galinhas
Print_ISBN :
978-1-4799-2927-6
Type :
conf
DOI :
10.1109/SBAC-PAD.2013.16
Filename :
6702594