DocumentCode :
3175808
Title :
Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform
Author :
Sarnovsky, Martin ; Ulbrik, Z.
Author_Institution :
Dept. of Cybern. & Artificial Intell., Tech. Univ. in Kosice, Kosice, Slovakia
fYear :
2013
fDate :
23-25 May 2013
Firstpage :
309
Lastpage :
313
Abstract :
This paper provides an overview of our research activities aimed at the efficient use of distributed computing concepts for text-mining tasks. The work presented in this paper describes the GHSOM (Growing Hierarchical Self-Organizing Maps) algorithm for clustering text documents and proposes the design and implementation of a distributed version of this approach. The proposed implementation uses the JBOWL framework as the base for text mining. For distribution, we used the MapReduce paradigm as implemented in the GridGain framework, which served as the cloud application platform. Experiments were performed on the standard Reuters dataset, and for testing purposes we used a simple private cloud infrastructure.
Keywords :
cloud computing; data mining; parallel programming; pattern clustering; self-organising feature maps; text analysis; GHSOM algorithm; GridGain platform; JBOWL framework; Java bag-of-words library; MapReduce paradigm; Reuters dataset; cloud application platform; cloud-based clustering; growing hierarchical self-organizing maps algorithm; private cloud infrastructure; text documents clustering; text mining; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Informatics; Java; Neurons; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Applied Computational Intelligence and Informatics (SACI), 2013 IEEE 8th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-1-4673-6397-6
Type :
conf
DOI :
10.1109/SACI.2013.6608988
Filename :
6608988