DocumentCode :
2180959
Title :
Towards better entity resolution techniques for Web document collections
Author :
Yerva, Surender Reddy ; Miklós, Zoltán ; Aberer, Karl
Author_Institution :
EPFL LSIR, Lausanne, Switzerland
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
209
Lastpage :
214
Abstract :
Because person names are not unique, the same name on different Web pages may or may not refer to the same real-world person. This entity identification problem is one of the most challenging issues in realizing the Semantic Web or entity-oriented search. We address this disambiguation problem, which closely resembles the entity resolution problem studied in relational databases but differs from it in several respects. Most importantly, Web pages often contain only partial information about the persons they mention, and the available information is very heterogeneous. As a result, similarity functions yield only uncertain evidence about whether two names refer to the same person. Each similarity function captures only some aspects of the similarity between the Web pages where the names occur, so its performance varies considerably from name to name. We analyze data engineering techniques for coping with the limited accuracy of the similarity functions and for combining multiple functions. Even with these simple techniques, we demonstrate systematic performance improvements and obtain results comparable to state-of-the-art methods.
Keywords :
Internet; document handling; Web document collections; Web pages; data engineering; disambiguation problem; entity identification problem; entity resolution problem; entity-oriented search; relational databases; semantic Web; similarity functions; Data analysis; Data engineering; Fuzzy sets; Information resources; Machine learning; Performance analysis; Relational databases; Semantic Web; State estimation; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-6522-4
Electronic_ISBN :
978-1-4244-6521-7
Type :
conf
DOI :
10.1109/ICDEW.2010.5452698
Filename :
5452698