An Architecture for Finding Entities on the Web

Author

Demartini, Gianluca; Firan, Claudiu S.; Georgescu, Mihai; Iofciu, Tereza; Krestel, Ralf; Nejdl, Wolfgang

Author_Institution

L3S Res. Center, Univ. of Hanover, Hanover, Germany

fYear

2009

fDate

9-11 Nov. 2009

Firstpage

230

Lastpage

237

Abstract

Recent progress in research fields such as information extraction and information retrieval enables the creation of systems that provide better search experiences to Web users. For example, systems that retrieve entities instead of just documents have been built. In this paper we present an approach for large-scale entity retrieval using Web collections as the underlying corpus. We propose an architecture for entity extraction and entity ranking starting from Web documents. This is achieved by (1) using an existing Web document index and (2) creating an entity-centric index. We describe the advantages and feasibility of our approach using state-of-the-art tools.
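The two-part design summarized in the abstract (an existing document index plus an entity-centric index) can be illustrated with a toy sketch. This is not the authors' implementation: the corpus `DOCS`, the helper names, and the overlap-count scoring are hypothetical, and entity extraction is assumed to have already been done upstream. Entities are ranked here by summing the scores of the documents that mention them, which is one simple document-based aggregation, not necessarily the ranking used in the paper.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document already carries the entities that an
# assumed upstream extractor (not shown) found in it.
DOCS = {
    "d1": ("berlin is the capital of germany", ["Berlin", "Germany"]),
    "d2": ("hanover is a city in germany", ["Hanover", "Germany"]),
    "d3": ("paris is the capital of france", ["Paris", "France"]),
}


def build_document_index(docs):
    """Term -> set of document ids; stands in for an existing Web document index."""
    index = defaultdict(set)
    for doc_id, (text, _) in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def build_entity_index(docs):
    """Entity -> set of ids of the documents mentioning it (entity-centric index)."""
    index = defaultdict(set)
    for doc_id, (_, entities) in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def rank_entities(query, doc_index, entity_index):
    """Score documents by query-term overlap, then let each entity accumulate
    the scores of the documents that mention it (hypothetical scoring)."""
    doc_scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in doc_index.get(term, ()):
            doc_scores[doc_id] += 1
    entity_scores = {}
    for entity, doc_ids in entity_index.items():
        score = sum(doc_scores.get(d, 0) for d in doc_ids)
        if score > 0:
            entity_scores[entity] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(entity_scores.items(), key=lambda kv: (-kv[1], kv[0]))


doc_index = build_document_index(DOCS)
entity_index = build_entity_index(DOCS)
print(rank_entities("capital germany", doc_index, entity_index))
```

Keeping the entity index separate from the document index means entities can be re-ranked without re-indexing the documents; a real system would replace the overlap count with a proper retrieval model and the toy extractor output with a named-entity recognizer.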

Keywords

Internet; document handling; information retrieval; Web collections; Web document index; Web documents; World Wide Web; entity centric index; entity extraction; entity ranking; information extraction; information retrieval; large-scale entity retrieval; Data mining; Erbium; Image retrieval; Information retrieval; Natural language processing; Search engines; Service oriented architecture; Web pages; Web search; Wikipedia; entity retrieval; natural language processing; web search;

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Congress, 2009. LA-WEB '09. Latin American

Conference_Location

Merida, Yucatan

Print_ISBN

978-0-7695-3856-3

Type

conf

DOI

10.1109/LA-WEB.2009.14

Filename

5341521