DocumentCode
1561765
Title
STED: a system for topic enumeration and distillation
Author
Greco, Gianluigi ; Greco, Sergio ; Zumpano, Ester
Author_Institution
DEIS, Univ. della Calabria, Rende, Italy
fYear
2002
Firstpage
294
Lastpage
299
Abstract
Search services on hyperlinked data are becoming popular among users because of the huge amount of data available and the consequent difficulty of retrieving and filtering relevant documents. Traditional term-based search engines are not very useful for this purpose, since the resulting ranking depends on the user's precision in expressing the query. Current research instead takes a different approach, called topic distillation, which consists of finding documents that are related to the query topic but do not necessarily contain the query string. Current algorithms for topic distillation first compute a base set containing all the relevant pages and then apply an iterative procedure to obtain the authoritative pages. In this paper we present STED, a system for topic distillation and enumeration (i.e., identification of different communities) of Web documents. The system is based on a technique that computes authoritative pages by analyzing the structure of the base set. More specifically, the system applies a statistical approach to the co-citation matrix associated with the base set to find the most co-cited pages, and it analyzes both the link structure and the content of pages. Several experiments have demonstrated the effectiveness and efficiency of the system.
Keywords
citation analysis; information resources; information retrieval; STED; Web documents; co-citation matrix; co-cited pages; document filtering; document retrieval; hyperlinked data; iterative procedure; query string; ranking; search services; statistical approach; topic distillation; topic enumeration; Databases; Filtering; Information processing; Information resources; Information retrieval; Iterative algorithms; Search engines; Statistical analysis; Web sites; World Wide Web
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology: Coding and Computing, 2002. Proceedings. International Conference on
Print_ISBN
0-7695-1506-1
Type
conf
DOI
10.1109/ITCC.2002.1000405
Filename
1000405