Extracting Named Entities and Synonyms from Wikipedia

Author

Bøhn, Christian ; Nørvåg, Kjetil

Author_Institution

Dept. of Comput. Sci., Norwegian Univ. of Sci. & Technol., Trondheim, Norway

fYear

2010

fDate

20-23 April 2010

Firstpage

1300

Lastpage

1307

Abstract

In many search domains, both content and queries are frequently tied to named entities such as persons or companies. An example of such a domain is a news archive. One challenge from an information retrieval point of view is that a single entity can be referred to in more than one way. In this paper we describe how to use Wikipedia content to automatically generate a dictionary of named entities together with synonyms that all refer to the same entity. This dictionary can subsequently be used to improve search quality, for example through query expansion. Through an experimental evaluation we show that our approach finds named entities and their synonyms with a high degree of accuracy.
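The query-expansion use mentioned in the abstract can be illustrated with a minimal sketch. It assumes the generated dictionary is available as a mapping from a canonical entity name to a set of synonyms; the sample entries, the names `ENTITY_SYNONYMS`, `build_lookup` and `expand_query`, and the whole-query matching rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical sample of an entity/synonym dictionary (illustration only;
# not data from the paper).
ENTITY_SYNONYMS: Dict[str, Set[str]] = {
    "United Nations": {"UN", "U.N."},
    "International Business Machines": {"IBM"},
}


def build_lookup(entity_synonyms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every lower-cased name (canonical or synonym) to all names of its entity."""
    lookup: Dict[str, Set[str]] = {}
    for entity, synonyms in entity_synonyms.items():
        names = {entity} | set(synonyms)
        for name in names:
            # Simplification: if two entities share a name, the later one wins.
            lookup[name.lower()] = names
    return lookup


def expand_query(query: str, lookup: Dict[str, Set[str]]) -> List[str]:
    """Return all known names for the queried entity, or the query itself if unknown."""
    names = lookup.get(query.strip().lower())
    if names is None:
        return [query]
    return sorted(names)


LOOKUP = build_lookup(ENTITY_SYNONYMS)
print(expand_query("un", LOOKUP))
```

A search front end could then issue the expanded names as alternatives (for example, joined with OR). A real system would also need to handle ambiguous names that refer to more than one entity, which this sketch ignores.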

Keywords

knowledge acquisition; query processing; Wikipedia contents; information retrieval; named entity extraction; query expansion; search quality; Application software; Computer science; Data mining; Dictionaries; Electronic mail; Filters; Information retrieval; Search engines; Text recognition; Wikipedia

fLanguage

English

Publisher

IEEE

Conference_Titel

Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on

Conference_Location

Perth, WA

ISSN

1550-445X

Print_ISBN

978-1-4244-6695-5

Type

conf

DOI

10.1109/AINA.2010.50

Filename

5474864