A Cross-Modal Approach for Extracting Semantic Relationships Between Concepts Using Tagged Images

Author

Katsurai, Makoto; Ogawa, Tomomi; Haseyama, Miki

Author_Institution

Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan

Volume

16

Issue

4

fYear

2014

fDate

June 2014

Firstpage

1059

Lastpage

1074

Abstract

This paper presents a cross-modal approach for extracting semantic relationships between concepts using tagged images. In the proposed method, we first project both text and visual features of the tagged images to a latent space using canonical correlation analysis (CCA). Then, under the probabilistic interpretation of CCA, we calculate a representative distribution of the latent variables for each concept. Based on the representative distributions of the concepts, we derive two types of measures: the semantic relatedness between the concepts and the abstraction level of each concept. Because these measures are derived from a cross-modal scheme that enables the collaborative use of both text and visual features, the semantic relationships can successfully reflect semantic and visual contexts. Experiments conducted on tagged images collected from Flickr show that our measures are more consistent with human cognition than conventional measures that use only text or only visual features, and than WordNet-based measures. In particular, a new measure of semantic relatedness, which satisfies the triangle inequality, obtains the best results among the distance measures tested in our framework. The experiments also show the applicability of our measures to multimedia-related tasks such as concept clustering, image annotation, and tag recommendation.
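The pipeline outlined in the abstract (CCA projection, one representative per concept, a pairwise measure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the function names `fit_cca` and `concept_distances`, the regularization value, the use of a simple average of the two projections in place of the probabilistic-CCA posterior, and the Euclidean distance between per-concept means as a stand-in for the paper's relatedness measure.

```python
import numpy as np

def fit_cca(X, Y, dim, reg=1e-3):
    """Regularized linear CCA via SVD of the whitened cross-covariance."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :dim], Ky @ Vt.T[:, :dim], s[:dim]

def concept_distances(T, V, Wx, Wy):
    """Pairwise distances between per-concept means in the latent space.

    Assumption: the latent point of an image is the average of its text and
    visual projections, and a concept is summarized by the mean latent point
    of the images tagged with it (a stand-in for a full distribution).
    """
    Z = 0.5 * ((T - T.mean(axis=0)) @ Wx + (V - V.mean(axis=0)) @ Wy)
    means = np.stack([Z[T[:, j] > 0].mean(axis=0) for j in range(T.shape[1])])
    return np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)

# Synthetic stand-in for tagged images: binary tag matrix T, visual features V.
rng = np.random.default_rng(0)
n, n_tags, n_vis, dim = 300, 6, 8, 2
H = rng.normal(size=(n, dim))  # hidden factors shared by both modalities
T = (H @ rng.normal(size=(dim, n_tags))
     + 0.5 * rng.normal(size=(n, n_tags)) > 0.5).astype(float)
V = H @ rng.normal(size=(dim, n_vis)) + 0.5 * rng.normal(size=(n, n_vis))

Wx, Wy, corr = fit_cca(T, V, dim)
D = concept_distances(T, V, Wx, Wy)
print(np.round(D, 2))
```

Because the stand-in distance is Euclidean, it is symmetric and satisfies the triangle inequality by construction; the measure proposed in the paper is defined on distributions of latent variables and may differ substantially from this sketch.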

Keywords

Web sites; correlation methods; database management systems; feature extraction; multimedia communication; natural language processing; statistical analysis; CCA probabilistic interpretation; Flickr; WordNet-based measures; abstraction level; canonical correlation analysis; concept clustering; cross-modal scheme; distance measures; human cognition; image annotation; latent space; multimedia-related tasks; semantic contexts; semantic relatedness; semantic relationship extraction; tag recommendation; tagged images; text features; triangle inequality; visual contexts; visual features; Atmospheric measurements; Biomedical measurement; Particle measurements; Probabilistic logic; Semantics; Visualization; concept relationships

fLanguage

English

Journal_Title

Multimedia, IEEE Transactions on

Publisher

IEEE

ISSN

1520-9210

Type

jour

DOI

10.1109/TMM.2014.2306655

Filename

6742613