Abstract:
In this paper, we address the problem of recognizing images with weakly annotated text tags. Most previous work either cannot handle scenarios in which the tags are only loosely related to the images, or simply performs pre-fusion at the feature level or post-fusion at the decision level to combine the visual and textual content. Instead, we first encode the text tags as relations among the images, and then propose a semi-supervised relational topic model (ss-RTM) to explicitly model the image content together with these relations. In this way, we can efficiently leverage loosely related tags and build an intermediate-level representation for a collection of weakly annotated images. This representation can be regarded as a mid-level fusion of the visual and textual content that explicitly models their intrinsic relationships. Moreover, image category labels are also modeled in the ss-RTM, so recognition can be conducted without training an additional discriminative classifier. Extensive experiments on social multimedia datasets (images + tags) demonstrate the advantages of the proposed model.
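The abstract's first step is to encode text tags as relations among images. A natural reading is that two images become linked when they share tags, yielding a relation network the relational topic model can consume. The following is a minimal sketch of that encoding under this assumption; the function name `tags_to_links` and the `min_shared` threshold are illustrative, not the authors' exact construction.

```python
from itertools import combinations

def tags_to_links(image_tags, min_shared=1):
    """Encode weakly annotated tags as relations among images.

    Two images are linked when they share at least `min_shared` tags.
    Hypothetical sketch of the tag-to-relation encoding step; the
    paper's actual construction may differ.
    """
    links = set()
    for a, b in combinations(sorted(image_tags), 2):
        # A shared tag is weak evidence the images are related.
        if len(image_tags[a] & image_tags[b]) >= min_shared:
            links.add((a, b))
    return links

# Toy collection: image id -> set of tags.
tags = {
    "img1": {"beach", "sunset"},
    "img2": {"sunset", "sky"},
    "img3": {"car"},
}
print(sorted(tags_to_links(tags)))  # [('img1', 'img2')]
```

The resulting link set, together with the per-image visual features, would form the observed data for a relational topic model such as ss-RTM.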
Keywords:
image recognition; social media; tag; topic model