Regional Information Center for Science and Technology (RICEST) - Keyword Extraction Using Word Co-occurrence

DocumentCode :

2423125

Title :

Keyword Extraction Using Word Co-occurrence

Author :

Wartena, Christian ; Brussee, Rogier ; Slakhorst, Wout

Author_Institution :

Novay, Enschede, Netherlands

fYear :

2010

fDate :

Aug. 30 2010-Sept. 3 2010

Firstpage :

Lastpage :

Abstract :

A common strategy for assigning keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as a keyword is its relevance to the text. The tf.idf score of a term is a widely used relevance measure. While it is easy to compute and gives quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and the corpus distribution. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare manually extracted keywords with different automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
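
The two ingredients named in the abstract, a tf.idf baseline and co-occurrence distributions compared against another distribution, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give their formulas, so the document-level definition of co-occurrence, the choice of Jensen-Shannon divergence as the comparison, the toy corpus, and all function names (tfidf_keywords, cooccurrence_distribution, js_divergence) are assumptions made only for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one tokenized document by tf.idf (baseline)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(docs[doc_index])
    scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def cooccurrence_distribution(docs, term):
    """Assumed definition: the term distribution over all documents
    that contain `term` (document-level co-occurrence)."""
    counts = Counter()
    for doc in docs:
        if term in doc:
            counts.update(doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions
    given as dicts; an assumed choice of comparison measure."""
    div = 0.0
    for t in set(p) | set(q):
        pt, qt = p.get(t, 0.0), q.get(t, 0.0)
        mt = 0.5 * (pt + qt)
        if pt > 0:
            div += 0.5 * pt * math.log2(pt / mt)
        if qt > 0:
            div += 0.5 * qt * math.log2(qt / mt)
    return div


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["keyword", "extraction", "keyword", "text"],
    ["text", "retrieval", "text", "index"],
    ["markov", "process", "text"],
]

top_terms = tfidf_keywords(docs, 0, top_k=2)  # ['keyword', 'extraction']

# Compare a word's co-occurrence distribution with the distribution
# of the first document; a lower divergence means a closer match.
doc_counts = Counter(docs[0])
doc_dist = {t: c / len(docs[0]) for t, c in doc_counts.items()}
closeness = js_divergence(cooccurrence_distribution(docs, "keyword"), doc_dist)
```

In this sketch, a candidate keyword whose co-occurrence distribution lies close to the document distribution would be treated as more relevant to that document. The measures actually evaluated in the paper may differ.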

Keywords :

information retrieval; text analysis; word processing; document text; keyword extraction algorithm; relevance measurement; word co-occurrence; Abstracts; Context; Correlation; Feature extraction; Markov processes; Probability distribution; Semantics; co-occurrence; distributional hypothesis; extraction; term ranking;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Database and Expert Systems Applications (DEXA), 2010 Workshop on

Conference_Location :

Bilbao

ISSN :

1529-4188

Print_ISBN :

978-1-4244-8049-4

Type :

conf

DOI :

10.1109/DEXA.2010.32

Filename :

5592000

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2423125