Title :
Automatic New Topic Identification in Search Engine Transaction Logs Using Multiple Linear Regression
Author :
Ozmutlu, Seda ; Ozmutlu, H. Cenk ; Spink, Amanda
Author_Institution :
Uludag Univ., Bursa
Abstract :
Content analysis of search engine user queries is an important task in search engine research, and identifying topic changes within a user search session is a key issue in that analysis. The purpose of this study is to identify new topics automatically in search engine query logs and to estimate the effect of the statistical characteristics of search engine queries on new topic identification. By applying multiple linear regression and ANOVA to a sample data log from the FAST search engine, we reached the following findings: 1) the statistical characteristics of Web search queries have an effect on shifts to a new topic; 2) multiple linear regression is a successful tool for estimating topic shifts and continuations. This study provides statistical proof of the relationship between the non-semantic characteristics of Web search queries and the occurrence of topic shifts and continuations.
Keywords :
Internet; query processing; regression analysis; search engines; FAST search engine; Web search queries; automatic new topic identification; content analysis; multiple linear regression; query clustering; search engine transaction logs; statistical characteristics; Analysis of variance; Artificial neural networks; Industrial engineering; Information analysis; Information systems; Linear regression; Multitasking; Performance analysis; Search engines; Web search
Conference_Title :
Hawaii International Conference on System Sciences, Proceedings of the 41st Annual
Conference_Location :
Waikoloa, HI
DOI :
10.1109/HICSS.2008.70