• DocumentCode
    2190575
  • Title
    Inter-document reference detection as an alternative to full text semantic analysis in document clustering
  • Author
    De Maziere, Patrick A. ; Van Hulle, Marc M.
  • Author_Institution
    Dept. Healthcare & Technol., KHLeuven, Leuven, Belgium
  • fYear
    2013
  • fDate
    22-25 Sept. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    We discuss the search for inter-document references as an alternative to grouping document inventories by full-text semantic analysis. The document inventory used, which is not publicly available, was provided to us by the European Union (EU) in the framework of an EU project whose aim was to analyse, classify, and visualise EU-funded research in social sciences and humanities in the EU framework programmes FP5 and FP6. This project, called the SSH project for short, was aimed at evaluating the contributions of research to the development of EU policies. For the semantics-based grouping, we start from a Multi-Dimensional Scaling analysis of the document vectors, which result from a prior semantic analysis. As an alternative to a semantic analysis, we searched for inter-document references, or direct references. Direct references are defined as terms that explicitly refer to other documents present in the inventory. We show that the grouping based on references is largely similar to the one based on semantics, but requires considerably less computational effort. In addition, non-experts can make better use of the results, since the references are displayed as graphical webpages with hyperlinks pointing to both the referenced and the referencing document(s), together with the reason for the linkage. Finally, we show that the combination of a database, to store the data and the (intermediate) results, and a webserver, to visualise the results, offers a powerful platform to analyse the document inventory and to share the results with all participants/collaborators involved in a data- and computation-intensive EU project, thereby guaranteeing both data and result consistency.
  • Keywords
    data visualisation; database management systems; document handling; file servers; pattern clustering; EU funded research; EU project; European Union; FP5; FP6; Webserver; data-consistency; database; document clustering; document inventories; document vectors; full text semantic analysis; graphical webpages; humanities; hyperlinks; interdocument reference detection; multidimensional scaling analysis; result-consistency; semantic based grouping; social sciences; Databases; Europe; Semantics; Servers; Terminology; Text analysis; Vectors; HPC; Semantic Analysis; Text Mining; client-server infrastructure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on
  • Conference_Location
    Southampton
  • ISSN
    1551-2541
  • Type
    conf
  • DOI
    10.1109/MLSP.2013.6661952
  • Filename
    6661952