Title of article :
A clustering study of a 7000 EU document inventory using MDS and SOM
Author/Authors :
De Mazière, Patrick A. and Van Hulle, Marc M.
Issue Information :
Journal with serial number, year 2011
Pages :
15
From page :
8835
To page :
8849
Abstract :
In this article, we discuss a number of methods and tools to cluster a 7000-document inventory in order to evaluate the impact of EU-funded research in social sciences and humanities on EU policies. The inventory, which is not publicly available but was provided to us by the European Union (EU) in the framework of an EU project, could be divided into three main categories: research documents, influential policy documents, and policy documents. To represent the results in a way that non-experts could make use of them, we explored and compared two visualisation techniques, multi-dimensional scaling (MDS) and the self-organising map (SOM), and one of the latter’s derivatives, the U-matrix. Unlike most other approaches, which perform text analyses only on document titles and abstracts, we performed a full-text analysis on more than 300,000 pages in total. Because many software suites cannot handle text mining problems of this size, we developed our own analysis platform. We show that the combination of a U-matrix and an MDS map, which is rarely performed in the domain of text mining, reveals information that would otherwise go unnoticed. Furthermore, we show that the combination of a database, to store the data and the (intermediate) results, and a webserver, to visualise the results, offers a powerful platform to analyse the data and share the results with all participants and collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing both data and result consistency.
Keywords :
Text mining, Data warehousing, High dimensional data, Visualisation methods, High performance computing (HPC), Parallel algorithms
Journal title :
Expert Systems with Applications
Serial Year :
2011
Record number :
2349610