• DocumentCode
    477577
  • Title

    Improved Automatic Keyphrase Extraction by Using Semantic Information

  • Author

    Wang, XiaoLing ; Mu, Dejun ; Fang, Jun

  • Author_Institution
    Sch. of Autom., Northwestern Polytech. Univ., Xi'an
  • Volume
    1
  • fYear
    2008
  • fDate
    20-22 Oct. 2008
  • Firstpage
    1061
  • Lastpage
    1065
  • Abstract
    Keyphrases provide semantic metadata that gives an overview of a document's content, and they are used in many text-mining applications. This paper proposes a new method that improves automatic keyphrase extraction by using semantic information about candidate keyphrases. The method works in two stages. In the candidate selection stage, after all phrases are extracted from the document, a word sense disambiguation method is used to obtain the senses of the phrases; term conflation is then performed using case folding, stemming, and semantic relatedness between candidates. In the filtering stage, four features are computed for each candidate: the TFxIDF measure, which describes the specificity of a phrase; the first occurrence of the phrase in the document; the length of the phrase; and a coherence score, which measures the semantic relatedness between the phrase and the other candidates. A Naive Bayes scheme builds a prediction model from training data with known keyphrases and then uses the model to calculate the overall probability for each candidate. We evaluate the semantically improved method against the well-known Kea system using a more effective, semantically enhanced evaluation method. The inter-domain experiment shows that the quality of keyphrase extraction can be improved significantly when semantic information is exploited. The intra-domain experiment shows that our method is competitive with the Kea++ algorithm and is not domain-specific.
  • Keywords
    Bayes methods; data mining; text analysis; automatic keyphrase extraction; naive Bayes scheme; semantic information; text-mining; word sense disambiguation method; Automation; Coherence; Data mining; Filtering; Length measurement; Machine learning algorithms; Mice; Predictive models; Probability; Training data; keyphrase extraction; semantic information; word sense disambiguation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computation Technology and Automation (ICICTA), 2008 International Conference on
  • Conference_Location
    Hunan
  • Print_ISBN
    978-0-7695-3357-5
  • Type

    conf

  • DOI
    10.1109/ICICTA.2008.180
  • Filename
    4659653