DocumentCode
2190575
Title
Inter-document reference detection as an alternative to full text semantic analysis in document clustering
Author
De Maziere, Patrick A. ; Van Hulle, Marc M.
Author_Institution
Dept. Healthcare & Technol., KHLeuven, Leuven, Belgium
fYear
2013
fDate
22-25 Sept. 2013
Firstpage
1
Lastpage
6
Abstract
We discuss the search for inter-document references as an alternative to grouping document inventories on the basis of a full-text semantic analysis. The document inventory used, which is not publicly available, was provided to us by the European Union (EU) within an EU project whose aim was to analyse, classify, and visualise EU-funded research in social sciences and humanities in the EU framework programmes FP5 and FP6. This project, called the SSH project for short, was aimed at evaluating the contributions of research to the development of EU policies. For the semantics-based grouping, we start from a Multi-Dimensional Scaling analysis of the document vectors produced by a prior semantic analysis. As an alternative to a semantic analysis, we searched for inter-document references, or direct references. Direct references are defined as terms that explicitly refer to other documents present in the inventory. We show that the grouping based on references is largely similar to the one based on semantics, but requires considerably less computational effort. In addition, non-experts can make better use of the results, since the references are displayed as graphical webpages with hyperlinks pointing to both the referenced and the referencing document(s), together with the reason for the linkage. Finally, we show that the combination of a database, to store the data and the (intermediate) results, and a webserver, to visualise the results, offers a powerful platform to analyse the document inventory and to share the results with all participants and collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing both data and result consistency.
Keywords
data visualisation; database management systems; document handling; file servers; pattern clustering; EU funded research; EU project; European Union; FP5; FP6; Webserver; data-consistency; database; document clustering; document inventories; document vectors; full text semantic analysis; graphical webpages; humanities; hyperlinks; interdocument reference detection; multidimensional scaling analysis; result-consistency; semantic based grouping; social sciences; Databases; Europe; Semantics; Servers; Terminology; Text analysis; Vectors; HPC; Semantic Analysis; Text Mining; client-server infrastructure;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on
Conference_Location
Southampton
ISSN
1551-2541
Type
conf
DOI
10.1109/MLSP.2013.6661952
Filename
6661952