DocumentCode
2509489
Title
An Architecture for Finding Entities on the Web
Author
Demartini, Gianluca ; Firan, Claudiu S. ; Georgescu, Mihai ; Iofciu, Tereza ; Krestel, Ralf ; Nejdl, Wolfgang
Author_Institution
L3S Res. Center, Univ. of Hanover, Hanover, Germany
fYear
2009
fDate
9-11 Nov. 2009
Firstpage
230
Lastpage
237
Abstract
Recent progress in research fields such as information extraction and information retrieval enables the creation of systems that provide better search experiences to Web users. For example, systems that retrieve entities instead of just documents have been built. In this paper we present an approach for large-scale entity retrieval using Web collections as the underlying corpus. We propose an architecture for entity extraction and entity ranking starting from Web documents. This is achieved by (1) using an existing Web document index and (2) creating an entity-centric index. We describe the advantages and feasibility of our approach using state-of-the-art tools.
Keywords
Internet; document handling; information retrieval; Web collections; Web document index; Web documents; World Wide Web; entity centric index; entity extraction; entity ranking; information extraction; large-scale entity retrieval; Data mining; Erbium; Image retrieval; Information retrieval; Natural language processing; Search engines; Service oriented architecture; Web pages; Web search; Wikipedia; entity retrieval; natural language processing; web search
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Congress, 2009. LA-WEB '09. Latin American
Conference_Location
Merida, Yucatan
Print_ISBN
978-0-7695-3856-3
Type
conf
DOI
10.1109/LA-WEB.2009.14
Filename
5341521