Duplicate-Search-Based Image Annotation Using Web-Scale Data

Author

Wang, Xin-Jing ; Zhang, Lei ; Ma, Wei-Ying

Author_Institution

Microsoft Research Asia, Haidian District, Beijing, China

Volume

100

Issue

9

fYear

2012

Firstpage

2705

Lastpage

2721

Abstract

Easy photo-taking and photo-sharing today make images an increasingly important type of media in people's everyday life, which creates a growing demand for practical image understanding techniques. Traditional computer vision and machine learning methods, which learn models from a set of training data, are still at the stage of tackling hundreds of object categories. Such a scale is far from practical usage. In recent years, search-based image annotation on large-scale data sets has demonstrated great success. Rather than directly mapping visual features to text, which is inevitably hindered by the semantic gap, it understands the content of an image by propagating the labels of similar images in a large-scale data set. Since similarity search is performed among homogeneous data, the difficulty is greatly reduced. This paper summarizes the extensive work on web image annotation using the large-scale metadata and social information available on the Web, and introduces the Arista system, a nonparametric image annotation platform built upon two billion web images. We propose a highly efficient and scalable duplicate-search technique so that the Arista system can be deployed on a few servers. A few interesting applications, such as building a large-scale celebrity face database and text-to-image translation, are also presented in this paper.

Keywords

Databases; Feature extraction; Image classification; Information retrieval; Measurement; Semantics; Text mining; Automatic image annotation; duplicate-search-based image annotation

fLanguage

English

Journal_Title

Proceedings of the IEEE

Publisher

IEEE

ISSN

0018-9219

Type

jour

DOI

10.1109/JPROC.2012.2193109

Filename

6210348