Regional Information Center for Science and Technology (RICeST) - Towards effective processing of large text collections

DocumentCode :

3286406

Title :

Towards effective processing of large text collections

Author :

Szymanski, Janusz ; Krawczyk, Harald

Author_Institution :

Dept. of Electron., Telecommun. & Inf., Gdansk Univ. of Technol., Gdańsk, Poland

fYear :

2012

fDate :

18-20 Sept. 2012

Firstpage :

265

Lastpage :

270

Abstract :

In this article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computation of similarity matrices and of the k-means algorithm. The test datasets were prepared as graphs built from Wikipedia articles connected by links. When creating the clustering data packages, we compute eigenvector-eigenvalue pairs for visualizing the datasets. We describe the method used to evaluate clustering quality. Finally, we discuss the results achieved, point out some improvements, and outline perspectives for future development.
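
The two operations named in the abstract, similarity-matrix computation and k-means, can be illustrated with a minimal sketch. This is not the authors' implementation, which the record does not describe. It assumes documents are already represented as equal-length term-weight vectors, uses cosine similarity as the (assumed) similarity measure, and splits the matrix into one task per row. The function names `cosine`, `similarity_matrix`, and `kmeans` are hypothetical. A thread pool is used only to show the row-wise decomposition; in CPython it does not give a real CPU speedup, and a process pool or a cluster framework would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(docs, workers=4):
    """Build the full similarity matrix, submitting one task per row."""
    def row(i):
        return [cosine(docs[i], d) for d in docs]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so row i lands at index i.
        return list(pool.map(row, range(len(docs))))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd-style k-means with squared Euclidean distance."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2
                                  for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Toy data (assumed): four "documents" over a two-term vocabulary.
docs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
sim = similarity_matrix(docs)
labels, centroids = kmeans(docs, centroids=[docs[0], docs[2]])
```

Each row of the similarity matrix depends only on the read-only document vectors, which is why the rows can be computed independently; the k-means assignment step has the same property, while the centroid update needs the results of all assignments.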

Keywords :

Web sites; data visualisation; eigenvalues and eigenfunctions; matrix algebra; pattern clustering; text analysis; Wikipedia articles; clustering data packages; clustering quality; dataset visualizations; eigenvalues; eigenvectors; elementary operations; graphs; k-means algorithm; large text collections; parallel computations; similarity matrices; textual data categorization; PCA; documents categorization; text clustering;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Innovative Computing Technology (INTECH), 2012 Second International Conference on

Conference_Location :

Casablanca

Print_ISBN :

978-1-4673-2678-0

Type :

conf

DOI :

10.1109/INTECH.2012.6457784

Filename :

6457784

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3286406