• DocumentCode
    2054714
  • Title
    Focused Crawling Using Name Disambiguation on Search Engine Results
  • Author
    Martin, Nicolas ; Khelif, Khaled
  • Author_Institution
    Cassidian, IPCC, EADS, Val-de-Reuil, France
  • fYear
    2011
  • fDate
    12-14 Sept. 2011
  • Firstpage
    340
  • Lastpage
    345
  • Abstract
    In this paper, we report an approach to source selection that supports Web data collection and the tracking of events and biographical facts about a targeted person. The choice of sources is crucial to the quality of information extraction tools and is considered the first step in the collection and tracking task. We designed a source selection process that filters out sources that are not relevant to the targeted person because they refer to a homonym. In this process, the name of the targeted person is submitted to the system, and each result (title, snippet and URL) is represented in the vector space model and then clustered, so that each cluster contains all the results about the same entity. The experimental results show that our approach can achieve interesting disambiguation performance using only the search results.
  • Keywords
    information filtering; search engines; Web data collection; biographical facts; event tracking; focused crawling; information extraction tools; name disambiguation; search engine results; source selection process; vector space model; Clustering algorithms; Companies; Context; Couplings; Feature extraction; Social network services; Web pages; Web People Search; WebLab; clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence and Security Informatics Conference (EISIC), 2011 European
  • Conference_Location
    Athens
  • Print_ISBN
    978-1-4577-1464-1
  • Electronic_ISBN
    978-0-7695-4406-9
  • Type
    conf
  • DOI
    10.1109/EISIC.2011.31
  • Filename
    6061228
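
The process outlined in the abstract (represent each search result in the vector space model, then cluster so that each cluster groups results about the same entity) can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the similarity measure, or the clustering algorithm, so everything below is an assumption chosen for illustration: TF-IDF weights, cosine similarity, and single-link grouping above a similarity threshold. The function names, the result fields (`title`, `snippet`, `url`) as dictionary keys, and the default threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of strings.

    Assumption: idf = log(n / df), so a term that occurs in every result
    (typically the queried name itself) receives zero weight.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_results(results, threshold=0.1):
    """Group search results whose similarity reaches the threshold.

    Single-link grouping via union-find (an illustrative choice, not
    necessarily the clustering method used in the paper). Returns a
    sorted list of clusters, each a list of result indices.
    """
    docs = [" ".join((r["title"], r["snippet"], r["url"])) for r in results]
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `cluster_results` on the results returned for one person name yields groups of result indices; under the assumptions above, a source would be kept only if it falls in the cluster judged to describe the targeted person. The threshold is a tuning parameter and would need to be set on real data.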