DocumentCode
2054714
Title
Focused Crawling Using Name Disambiguation on Search Engine Results
Author
Martin, Nicolas ; Khelif, Khaled
Author_Institution
Cassidian, IPCC, EADS, Val-de-Reuil, France
fYear
2011
fDate
12-14 Sept. 2011
Firstpage
340
Lastpage
345
Abstract
In this paper, we report an approach to source selection that supports Web data collection and the tracking of events and biographical facts about a targeted person. The choice of sources is crucial to the quality of information extraction tools and is considered the first step in the collection and tracking task. We designed a source selection process that filters out sources that are not relevant to the targeted person because they refer to a homonym. In this process, the name of the targeted person is submitted to the system and each result (title, snippet and URL) is represented in the vector space model and then clustered, so that each cluster groups all the results about the same entity. The experimental results show that our approach can achieve interesting disambiguation performance using only the search results.
Keywords
information filtering; search engines; Web data collection; biographical facts; event tracking; focused crawling; information extraction tools; name disambiguation; search engine results; source selection process; vector space model; Clustering algorithms; Companies; Context; Couplings; Feature extraction; Social network services; Web pages; Web People Search; WebLab; clustering
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligence and Security Informatics Conference (EISIC), 2011 European
Conference_Location
Athens
Print_ISBN
978-1-4577-1464-1
Electronic_ISBN
978-0-7695-4406-9
Type
conf
DOI
10.1109/EISIC.2011.31
Filename
6061228