• DocumentCode
    2337486
  • Title

    Unsupervised key-phrases extraction from scientific papers using domain and linguistic knowledge

  • Author

    Krapivin, Mikalai ; Marchese, Maurizio ; Yadrantsau, Andrei ; Liang, Yanchun

  • Author_Institution
    Dept. of Inf. Eng. & Comput. Sci., Univ. of Trento, Trento
  • fYear
    2008
  • fDate
    13-16 Nov. 2008
  • Firstpage
    105
  • Lastpage
    112
  • Abstract
    The domain of Digital Libraries presents specific challenges for unsupervised information extraction to support both the automatic classification of documents and the enhancement of users' navigation in the digital content. In this paper, we propose a combined use of machine learning techniques (i.e. Support Vector Machines) and Natural Language Processing techniques (i.e. Stanford NLP parser) to tackle the problem of unsupervised key-phrases extraction from scientific papers. The proposed method strongly depends on the robust structural properties of a scientific paper as well as on the lexical knowledge that we are able to mine from its text. For the experimental assessment we have used a subset of ACM papers in the Computer Science domain containing 400 documents. Preliminary evaluation of the approach shows promising results that improve, on the same data-set, on the state-of-the-art Bayesian learning system KEA by a minimum of 27% to a maximum of 77%, depending on KEA parameter tuning and the specific evaluation set. Our assessment is performed by comparison with key-phrases assigned by human experts in the specific domain and freely available through the ACM portal.
  • Keywords
    data mining; digital libraries; information retrieval; linguistics; natural language processing; natural sciences computing; pattern classification; text analysis; unsupervised learning; digital library; document classification; domain knowledge; information extraction; linguistic knowledge; machine learning technique; natural language processing technique; scientific paper; text mining; unsupervised key-phrases extraction; Bayesian methods; Computer science; Data mining; Machine learning; Natural language processing; Navigation; Robustness; Software libraries; Support vector machine classification; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2008. ICDIM 2008. Third International Conference on
  • Conference_Location
    London
  • Print_ISBN
    978-1-4244-2916-5
  • Electronic_ISBN
    978-1-4244-2917-2
  • Type

    conf

  • DOI
    10.1109/ICDIM.2008.4746749
  • Filename
    4746749