DocumentCode :
2054714
Title :
Focused Crawling Using Name Disambiguation on Search Engine Results
Author :
Martin, Nicolas ; Khelif, Khaled
Author_Institution :
Cassidian, IPCC, EADS, Val-de-Reuil, France
fYear :
2011
fDate :
12-14 Sept. 2011
Firstpage :
340
Lastpage :
345
Abstract :
In this paper, we report an approach to source selection that supports Web data collection and the tracking of events and biographical facts about a targeted person. The choice of sources is crucial to the quality of information extraction tools and is considered the first step in the collection and tracking task. We designed a source selection process that filters out sources that are not relevant to the targeted person because they refer to a homonym. In this process, the name of the targeted person is submitted to the system, and each result (title, snippet and URL) is represented in the vector space model and then clustered, so that each cluster represents all the results about the same entity. The experimental results show that our approach can achieve interesting disambiguation performance considering only the search results.
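The pipeline outlined in the abstract (search results represented in a vector space model, then clustered so that each cluster corresponds to one entity) can be illustrated with a minimal sketch. The abstract does not state the term weighting, the similarity measure, or the clustering algorithm, so the TF-IDF weights, cosine similarity, single-link grouping and the 0.1 threshold below are all assumptions made for illustration, not the authors' method. The name "John Smith" and the example URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    n = len(docs)
    # With idf = log(n / df), terms present in every result (such as the
    # queried name itself) get weight 0 and do not influence the similarity.
    return [
        {t: c * math.log(n / doc_freq[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_results(results, threshold=0.1):
    """Group result indices by single-link clustering (assumed algorithm)."""
    docs = [" ".join((r["title"], r["snippet"], r["url"])) for r in results]
    vectors = tfidf_vectors(docs)
    parent = list(range(len(results)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(results)):
        for j in range(i + 1, len(results)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(results)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical search results for the ambiguous name "John Smith".
sample_results = [
    {"title": "John Smith - striker profile",
     "snippet": "John Smith scored twice for the football club this season",
     "url": "https://sports.example.org/players/john-smith"},
    {"title": "John Smith football statistics",
     "snippet": "Career goals and club appearances of striker John Smith",
     "url": "https://football.example.org/stats/john-smith"},
    {"title": "Prof. John Smith, physics department",
     "snippet": "John Smith lectures on quantum physics at the university",
     "url": "https://physics.example.edu/staff/john-smith"},
    {"title": "John Smith publications",
     "snippet": "Quantum physics papers by university researcher John Smith",
     "url": "https://scholar.example.edu/john-smith"},
]

clusters = cluster_results(sample_results)
print(clusters)
```

On this toy input the two footballer results and the two physicist results fall into separate clusters. A focused crawler would then keep only the URLs from the cluster that matches the targeted person.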
Keywords :
information filtering; search engines; Web data collection; biographical facts; event tracking; focused crawling; information extraction tools; name disambiguation; search engine results; source selection process; vector space model; Clustering algorithms; Companies; Context; Couplings; Feature extraction; Social network services; Web pages; Web People Search; WebLab; clustering
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics Conference (EISIC), 2011 European
Conference_Location :
Athens
Print_ISBN :
978-1-4577-1464-1
Electronic_ISBN :
978-0-7695-4406-9
Type :
conf
DOI :
10.1109/EISIC.2011.31
Filename :
6061228