• DocumentCode
    2492487
  • Title
    Text-Aided Image Classification: Using Labeled Text from Web to Help Image Classification
  • Author
    Lin, Yuan ; Chen, Yuqiang ; Xue, Guirong ; Yu, Yong
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2010
  • fDate
    6-8 April 2010
  • Firstpage
    267
  • Lastpage
    273
  • Abstract
    As more and more multimedia data become available on the Web, mining those data is playing an increasingly important role in Web applications. In this paper, we investigate the interplay between multimedia data mining and text data mining. Specifically, in an approach we call text-aided image classification (TAIC), we address the problem of image classification with a very limited amount of labeled images and a large amount of auxiliary labeled text data. This problem is important in practice, since labeled text data on the Web are currently far more plentiful than labeled image data. To solve the problem, based on the “bag-of-words” view and the Naive Bayes classification model, we focus on estimating the image feature distribution under a given concept. We extend the Naive Bayes algorithm with a mapping that projects the most discriminative text features into the image feature space. This feature mapping is estimated from text-image co-occurrence data on the Web and acts as a bridge connecting text and image knowledge. With this process, we estimate the target image feature distribution from a text model trained on sufficient labeled data. Our empirical results on real-world data sets show that our method makes a good approximation of the image feature distribution when trained with abundant labeled images. When the amount of labeled images is very limited, classification performance is improved by using auxiliary labeled text data, which shows that our method can indeed integrate text and image knowledge in a simple yet effective way.
  • Keywords
    Bayes methods; Internet; data mining; image classification; multimedia systems; pattern classification; text analysis; Naive Bayes classification model; auxiliary labeled text data; image feature distribution; multimedia data; text-aided image classification; Australia; Computer displays; Engines; Petroleum
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Conference (APWEB), 2010 12th International Asia-Pacific
  • Conference_Location
    Busan
  • Print_ISBN
    978-0-7695-4012-2
  • Electronic_ISBN
    978-1-4244-6600-9
  • Type
    conf
  • DOI
    10.1109/APWeb.2010.49
  • Filename
    5474126