DocumentCode
3230648
Title
Web mining
Author
Baeza-Yates, Ricardo
Author_Institution
Comput. Sci. Dept., Chile Univ., Chile
fYear
2005
fDate
31 Oct.-2 Nov. 2005
Abstract
The Web grows and evolves faster than we would like and expect, posing scalability and relevance problems for Web search engines. There are three main data types on the Web: content (text, multimedia), structure (links that form a graph) and Web usage (transactions from Web logs). We emphasize the last type of data, in particular a new subfield called query mining. Server logs of search engines store traces of the queries submitted by users, which include the queries themselves along with the Web pages selected from their answers. Query mining is based on the fact that user queries in search engines and Web sites give valuable information about people's interests. In addition, clicks after queries relate those interests to actual content. The framework is based on a new vectorial representation of query traces, which allows them to be treated similarly to documents in traditional information retrieval systems. We also consider the problem of reducing the bias in the selections caused by the particular answer rankings computed by the search engine. We show the application of the clustering framework to two problems: relevance-ranking boosting and query recommendation. Finally, we show experimentally the effectiveness of our approach. The same ideas can be applied to advertising campaigns in search engines and to the automatic generation of a pseudo-ontology for queries.
Keywords
Internet; data mining; information retrieval; search engines; Web mining; Web pages; Web search engines; Web sites; information retrieval system; pseudo-ontology; query mining; Application software; Boosting; Computer science; Scalability; Web search
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Congress, 2005. LA-WEB 2005. Third Latin American
Print_ISBN
0-7695-2471-0
Type
conf
DOI
10.1109/LAWEB.2005.49
Filename
1592350