DocumentCode
3154096
Title
How do they compare? Automatic identification of comparable entities on the Web
Author
Jain, Alpa ; Pantel, Patrick
Author_Institution
Yahoo! Labs., Sunnyvale, CA, USA
fYear
2011
fDate
3-5 Aug. 2011
Firstpage
228
Lastpage
233
Abstract
People love comparing things, from home mortgages and digital cameras to travel destinations and political philosophies. Today, we are mostly limited to browsing documents after issuing comparative queries to Web search engines, such as “15-year vs. 30-year mortgage”, “Nikon D90 / Canon 40D”, “Oahu or Maui”, and “communism vs. fascism”. There is an opportunity to improve the search experience by automatically offering comparisons to users. In this paper, we propose a first step towards this goal of comparative analysis by mining a broad class of comparable entities from search query logs and a large Web crawl. Example comparables that we extract include medicines, appliances, electronics, vacation destinations, and many more. We present an extensive empirical analysis showing that our methods generate comparables with high precision and recall, and that Web search query logs are a better source for mining such entities than Web pages, which are typically used for extraction tasks. We further compare the performance of our methods with “related entities” reported by Google Sets, and show a gain of 39% in average precision and a gain of 30% in NDCG.
Keywords
data mining; online front-ends; query processing; search engines; Google set; NDCG; Web entities mining; Web page; Web search engine; Web search query logs; automatic identification; digital camera; document browsing; home mortgage; large Web crawl; political philosophy; travel destination; vacation destination; Calculators; Data mining; Learning systems; Loans and mortgages; Noise measurement; Semantics; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse and Integration (IRI), 2011 IEEE International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4577-0964-7
Electronic_ISBN
978-1-4577-0965-4
Type
conf
DOI
10.1109/IRI.2011.6009551
Filename
6009551