DocumentCode :
3124448
Title :
Improving Product Classification Using Images
Author :
Kannan, Anitha ; Talukdar, Partha Pratim ; Rasiwasia, Nikhil ; Ke, Qifa
Author_Institution :
Search Labs., Microsoft Res., Mountain View, CA, USA
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
310
Lastpage :
319
Abstract :
Product classification in Commerce search (e.g., Google Product Search, Bing Shopping) involves associating categories with offers of products from a large number of merchants. The categorized offers are used in many tasks, including product taxonomy browsing and matching merchant offers to products in the catalog. Hence, learning a product classifier with high precision and recall is of fundamental importance in order to provide a high-quality shopping experience. A product offer typically consists of a short textual description and an image depicting the product. The traditional approach to this classification task is to learn a classifier using only the textual descriptions of the products. In this paper, we show that the use of images, a weaker signal in our setting, in conjunction with the textual descriptions, a more discriminative signal, can considerably improve the precision of the classification task, irrespective of the type of classifier being used. We present a novel classification approach, Confusion Driven Probabilistic Fusion++ (CDPF++), that is cognizant of the disparity in the discriminative power of different types of signals and hence makes use of the confusion matrix of the dominant signal (text in our setting) to prudently leverage the weaker signal (image) for improved performance. Our evaluation, performed on data from a major Commerce search engine's catalog, shows a 12% (absolute) improvement in precision at 100% coverage and a 16% (absolute) improvement in recall at 90% precision compared to classifiers that use only the textual descriptions of products. In addition, CDPF++ provides a more accurate classifier based only on the dominant signal (text), which can be used in situations where only the dominant signal is available at application time.
Keywords :
Internet; cataloguing; cognition; electronic commerce; image classification; learning (artificial intelligence); probability; product quality; retail data processing; search engines; text analysis; CDPF++; commerce search engine catalog; confusion driven probabilistic fusion++; confusion matrix; discriminative signal; disparity cognizant; dominant signal; product classification task; product classifier; product taxonomy browsing; shopping experience; textual description; Business; Catalogs; Computers; Probabilistic logic; Taxonomy; Training; Vocabulary; e-commerce; image; product classification; text;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.79
Filename :
6137235