Image annotation via deep neural network

Author

Sun Chengjian ; Songhao Zhu ; Zhe Shi

Author_Institution

Sch. of Autom., Nanjing Univ. of Posts & Telecommun., Nanjing, China

fYear

2015

fDate

18-22 May 2015

Firstpage

518

Lastpage

521

Abstract

Multilabel image annotation is one of the most important open problems in computer vision. Existing work usually annotates images with conventional visual features, whereas features based on deep learning have shown the potential to achieve outstanding performance. In this work, we propose a multimodal deep learning framework that aims to optimally integrate multiple deep neural networks pretrained with convolutional neural networks. In particular, the proposed framework explores a unified two-stage learning scheme that consists of (i) learning to fine-tune the parameters of the deep neural network for each individual modality, and (ii) learning to find the optimal combination of the diverse modalities simultaneously in a coherent process. Experiments conducted on the NUS-WIDE dataset evaluate the performance of the proposed framework for multilabel image annotation, and the encouraging results validate the effectiveness of the proposed algorithms.

Keywords

computer vision; feature extraction; neural nets; NUS-WIDE dataset; coherent process; conventional visual features; convolutional neural networks; diverse modality; multilabel image annotation; multimodal deep learning framework; multiple deep neural networks; optimal combination; unified two-stage learning scheme; Computer architecture; Image classification; Neural networks; Pattern recognition; Visualization

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Vision Applications (MVA), 2015 14th IAPR International Conference on

Conference_Location

Tokyo

Type

conf

DOI

10.1109/MVA.2015.7153244

Filename

7153244