DocumentCode
2731779
Title
Disambiguation Algorithm for People Search on the Web
Author
Kalashnikov, Dmitri V. ; Mehrotra, Sharad ; Chen, Zhaoqi ; Nuray-Turan, Rabia ; Ashish, Naveen
Author_Institution
Dept. of Comput. Sci., California Univ., Irvine, CA
fYear
2007
fDate
15-20 April 2007
Firstpage
1258
Lastpage
1260
Abstract
In this paper we develop a disambiguation algorithm and then study its impact on People Search. The proposed algorithm first uses extraction techniques to automatically extract 'significant' entities, such as the names of other persons, organizations, and locations, on each Web page. In addition, it extracts and parses HTML and Web-related data on each Web page, such as hyperlinks and email addresses. The algorithm then views all this information in a unified way: as an entity-relationship graph where entities (e.g., people, organizations, locations, Web pages) are interconnected via relationships (e.g., 'Web page-mentions-person', relationships derived from hyperlinks, etc.). The algorithm gains its power by being able to analyze several types of information: attributes associated with the entities (e.g., TF/IDF for Web pages) and, most importantly, direct and indirect interconnections that exist among entities in the ER graph. We next outline our approach in Section 2 and then compare it with the state-of-the-art solutions in Section 3.
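To make the abstract's pipeline concrete, the following is a minimal sketch, not the authors' method. It assumes each page's extracted entities are already available, builds a page-mentions-entity graph, scores page pairs by combining a TF/IDF cosine similarity with a path-based "connection strength" (simple paths of at most `max_len` edges, each weighted by 1/length), and groups pages with a threshold plus union-find. The path weighting, the mixing weight `alpha`, the `threshold`, and all names and data (`PAGES`, `connection_strength`, `cluster_pages`, the entities) are hypothetical stand-ins chosen for illustration; the abstract does not specify the actual model.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: pages returned for one ambiguous name query, each with
# its text and the entities an extractor might have found on it.
PAGES = {
    "p1": {"text": "professor database research", "entities": {"org:UC Irvine", "email:js@uci.example"}},
    "p2": {"text": "database course lecture", "entities": {"org:UC Irvine", "person:A. Lee"}},
    "p3": {"text": "jazz guitar concert", "entities": {"loc:Chicago", "person:B. King"}},
    "p4": {"text": "student thesis database", "entities": {"person:A. Lee"}},
}

def build_er_graph(pages):
    """Undirected ER graph: each page node is linked to every entity it mentions."""
    graph = defaultdict(set)
    for page, data in pages.items():
        for entity in data["entities"]:
            graph[page].add(entity)
            graph[entity].add(page)
    return graph

def connection_strength(graph, src, dst, max_len=4):
    """Assumed measure: sum of 1/len over simple src->dst paths with <= max_len edges."""
    total = 0.0

    def dfs(node, depth, visited):
        nonlocal total
        for nxt in graph[node]:
            if nxt == dst:
                total += 1.0 / (depth + 1)  # path ends at dst with depth+1 edges
            elif nxt not in visited and depth + 1 < max_len:
                visited.add(nxt)
                dfs(nxt, depth + 1, visited)
                visited.remove(nxt)

    dfs(src, 0, {src})
    return total

def tfidf_vectors(pages):
    """Plain TF/IDF over whitespace tokens (a simplification of real page text)."""
    docs = {p: Counter(d["text"].lower().split()) for p, d in pages.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    return {p: {t: c * math.log(n / df[t]) for t, c in tf.items()} for p, tf in docs.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_pages(pages, alpha=0.5, threshold=0.2):
    """Merge page pairs whose combined score passes a (hypothetical) threshold."""
    graph = build_er_graph(pages)
    vecs = tfidf_vectors(pages)
    parent = {p: p for p in pages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(pages, 2):
        score = alpha * connection_strength(graph, a, b) + (1 - alpha) * cosine(vecs[a], vecs[b])
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in pages:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())

print(cluster_pages(PAGES))
```

With this toy data, `p1` and `p4` share no entity and score below the threshold on their own (their only link is the indirect path `p1`–UC Irvine–`p2`–A. Lee–`p4`). However, each is merged with `p2`, so union-find places `p1`, `p2`, and `p4` in one cluster and leaves `p3` alone. Transitive merging is one simple design choice; a different clustering step could treat such chains differently.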
Keywords
Web sites; information retrieval; HTML; People Search; Web page; World Wide Web; disambiguation algorithm; entity-relationship graph; extraction techniques; Clustering algorithms; Computer science; Data mining; Information analysis; Internet; Machine learning; Middleware; Search engines; Web pages; Web search;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location
Istanbul
Print_ISBN
1-4244-0802-4
Electronic_ISBN
1-4244-0803-2
Type
conf
DOI
10.1109/ICDE.2007.368987
Filename
4221777