• DocumentCode
    632696
  • Title
    Decoupling Sparse Coding with Fusion of Fisher Vectors and Scalable SVMs for Large-Scale Visual Recognition
  • Author
    Zhengping Ji
  • Author_Institution
    Adv. Image Res. Lab., Samsung Semicond. Inc., Pasadena, CA, USA
  • fYear
    2013
  • fDate
    23-28 June 2013
  • Firstpage
    450
  • Lastpage
    457
  • Abstract
    With the advent of huge collections of images from the Internet and emerging mobile devices, large-scale image classification has drawn a significant amount of research attention in the computer vision and AI communities. The advancement of large-scale image classification largely depends on solutions to two problems: how to learn good feature representations from pixels at various scales, and how to create classification models that can discriminate among those feature representations according to the different semantic meanings of many objects. In this paper, we tackle the first problem by combining different feature representations via sparse coding and Fisher vectors of SIFT and color-based features. To deal with the second problem, we use the Averaged Stochastic Gradient Descent (ASGD) algorithm to enable fast and incremental learning of SVMs, and we further generate confidence values to interpret the likelihood of multiple object categories appearing in the image. We evaluate the proposed learning framework on ImageNet, a benchmark dataset for large-scale image classification. Our results show favorable performance on a subset of ImageNet containing 196 categories. We also investigate the performance of sparse coding by comparing different combinations of algorithms for learning a dictionary and sparse representations. Although there are natural pairings of algorithms for learning a dictionary and sparse representations (e.g., K-SVD paired with Orthogonal Matching Pursuit), breaking such a pair and rematching the algorithms is found to yield even better performance. Moreover, a detailed comparison indicates that the gain in classification accuracy comes mainly from using an ℓ1-regularized solver for the sparse representation, regardless of the choice of dictionary.
  • Keywords
    computer vision; feature extraction; gradient methods; image classification; image colour analysis; image matching; image representation; learning (artificial intelligence); stochastic processes; support vector machines; transforms; ℓ1-regularized solver; ASGD algorithm; Fisher vectors; ImageNet; Internet; K-SVD; SIFT; averaged stochastic gradient descent algorithm; classification accuracy; classification model; color-based feature; computer vision; confidence value generation; dictionary learning; feature representation discrimination; image collection; incremental learning; large-scale image classification; large-scale visual recognition; learning framework; mobile device; object category; orthogonal matching pursuit; scalable SVM; semantic meaning; sparse coding decoupling; sparse representation; Accuracy; Dictionaries; Encoding; Feature extraction; Image coding; Matching pursuit algorithms; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on
  • Conference_Location
    Portland, OR
  • Type
    conf
  • DOI
    10.1109/CVPRW.2013.74
  • Filename
    6595913