DocumentCode :
2226409
Title :
Querying and clustering Web pages about persons and organizations
Author :
Ye, Shiren ; Chua, Tat-Seng ; Kei, Jeremy R.
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
344
Lastpage :
350
Abstract :
One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often common and not unique, so a single name may map to several entities. We describe a methodology to cluster the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages using a decision model. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.
Keywords :
Internet; pattern clustering; query formulation; search engines; Web page clustering; Web surfing; decision model; statistical analysis; Biographies; Books; Clustering algorithms; Home computing; Partitioning algorithms; Resumes; Tellurium; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241214
Filename :
1241214