Title :
Extracting Named Entities and Synonyms from Wikipedia
Author :
Bøhn, Christian ; Nørvåg, Kjetil
Author_Institution :
Dept. of Comput. Sci., Norwegian Univ. of Sci. & Technol., Trondheim, Norway
Abstract :
In many search domains, both contents and searches are frequently tied to named entities such as a person or a company. An example of such a domain is a news archive. One challenge from an information retrieval point of view is that a single entity can be referred to in more than one way. In this paper we describe how to use Wikipedia contents to automatically generate a dictionary of named entities and synonyms that all refer to the same entity. This dictionary can subsequently be used to improve search quality, for example through query expansion. Through an experimental evaluation we show that our approach finds named entities and their synonyms with a high degree of accuracy.
Keywords :
knowledge acquisition; query processing; Wikipedia contents; information retrieval; named entity extraction; query expansion; search quality; Application software; Computer science; Data mining; Dictionaries; Electronic mail; Filters; Information retrieval; Search engines; Text recognition; Wikipedia
Conference_Title :
Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on
Conference_Location :
Perth, WA
Print_ISBN :
978-1-4244-6695-5
DOI :
10.1109/AINA.2010.50