  • DocumentCode
    2509489
  • Title
    An Architecture for Finding Entities on the Web
  • Author
    Demartini, Gianluca; Firan, Claudiu S.; Georgescu, Mihai; Iofciu, Tereza; Krestel, Ralf; Nejdl, Wolfgang
  • Author_Institution
    L3S Research Center, University of Hanover, Hanover, Germany
  • fYear
    2009
  • fDate
    9-11 Nov. 2009
  • Firstpage
    230
  • Lastpage
    237
  • Abstract
    Recent progress in research fields such as information extraction and information retrieval enables the creation of systems that provide better search experiences to Web users. For example, systems that retrieve entities instead of just documents have been built. In this paper we present an approach for large-scale entity retrieval using Web collections as the underlying corpus. We propose an architecture for entity extraction and entity ranking starting from Web documents. This is achieved by (1) using an existing Web document index and (2) creating an entity-centric index. We describe the advantages and feasibility of our approach using state-of-the-art tools.
  • Keywords
    Internet; document handling; information retrieval; Web collections; Web document index; Web documents; World Wide Web; entity centric index; entity extraction; entity ranking; information extraction; large-scale entity retrieval; Data mining; Image retrieval; Natural language processing; Search engines; Service oriented architecture; Web pages; Web search; Wikipedia; entity retrieval
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Latin American Web Congress, 2009 (LA-WEB '09)
  • Conference_Location
    Merida, Yucatan, Mexico
  • Print_ISBN
    978-0-7695-3856-3
  • Type
    conf
  • DOI
    10.1109/LA-WEB.2009.14
  • Filename
    5341521