Title :
Querying and clustering Web pages about persons and organizations
Author :
Ye, Shiren ; Chua, Tat-Seng ; Kei, Jeremy R.
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
Abstract :
One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often common and not unique, so a single name may map to several entities. We describe a methodology to cluster the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm combines named-entity, link-based, and structure-based information as features and uses a decision model to partition the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.
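The two stages named in the abstract (separating direct from indirect pages, then clustering around distinct direct pages) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Page` fields, the toy `is_direct` rule standing in for the decision model, the Jaccard similarity over named-entity sets, and the `merge_threshold` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    entities: frozenset  # named entities found on the page (assumed precomputed)
    inlinks: int = 0     # hypothetical stand-in for link-based evidence


def jaccard(a, b):
    """Overlap between two entity sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_direct(page, query):
    """Toy rule standing in for the decision model (an assumption)."""
    return query.lower() in page.title.lower() and page.inlinks >= 1


def cluster(pages, query, merge_threshold=0.5):
    direct = [p for p in pages if is_direct(p, query)]
    indirect = [p for p in pages if not is_direct(p, query)]

    # Keep only distinct direct pages as seeds: a direct page that closely
    # matches an existing seed joins that seed's cluster instead.
    clusters = []  # each cluster is a list whose first element is its seed
    for p in direct:
        for c in clusters:
            if jaccard(p.entities, c[0].entities) >= merge_threshold:
                c.append(p)
                break
        else:
            clusters.append([p])

    # Attach each indirect page to the most similar seed.
    # If there are no seeds, indirect pages are left unassigned.
    for p in indirect:
        if not clusters:
            break
        best = max(clusters, key=lambda c: jaccard(p.entities, c[0].entities))
        best.append(p)
    return clusters


# Made-up example data for the ambiguous query "John Smith".
example_pages = [
    Page("a", "John Smith - Home Page", frozenset({"MIT", "Boston", "robotics"}), 3),
    Page("b", "John Smith, Attorney", frozenset({"Chicago", "law", "Illinois Bar"}), 2),
    Page("c", "Robotics lab news", frozenset({"MIT", "robotics"}), 0),
    Page("d", "Chicago law firms", frozenset({"Chicago", "law"}), 5),
    Page("e", "John Smith CV", frozenset({"MIT", "Boston", "robotics", "PhD"}), 1),
]
clusters_out = cluster(example_pages, "John Smith")
cluster_urls = [[p.url for p in c] for c in clusters_out]
```

In this made-up data, pages "a" and "b" become seeds for two different people, "e" merges into the first seed's cluster, and the indirect pages "c" and "d" are attached to whichever seed shares the most named entities with them.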
Keywords :
Internet; pattern clustering; query formulation; search engines; Web page clustering; Web surfing; decision model; search engine; statistical analysis; Biographies; Books; Clustering algorithms; Home computing; Partitioning algorithms; Resumes; Tellurium; Web pages
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
DOI :
10.1109/WI.2003.1241214