• DocumentCode
    1759537
  • Title
    Fashion Parsing With Video Context
  • Author
    Si Liu; Xiaodan Liang; Luoqi Liu; Ke Lu; Liang Lin; Xiaochun Cao; Shuicheng Yan
  • Author_Institution
    State Key Lab. of Inf. Security, Inst. of Inf. Eng., Beijing, China
  • Volume
    17
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 2015
  • Firstpage
    1347
  • Lastpage
    1358
  • Abstract
    In this paper, we propose a novel semi-supervised learning strategy to address human parsing. Existing human parsing datasets are relatively small because of the tedious human labeling they require. We present a general, affordable, and scalable solution that harnesses the rich context in easily available web videos to boost any existing human parser. First, we crawl a large number of unlabeled videos from the web. Then, for each video, cross-frame context is used first for human pose co-estimation and then for video co-parsing, yielding satisfactory human parsing results for all frames. More specifically, SIFT flow and super-pixel matching are used to build correspondences across frames, and these correspondences then contextualize pose estimation and human parsing in individual frames. Finally, the parsed video frames serve as the reference corpus for the non-parametric human parsing component of the whole solution. To further improve the accuracy of video co-parsing, we propose an active learning method that incorporates human guidance: labelers assess the accuracy of the pose estimation results for selected video frames. Reliable frames are then taken as seed frames to guide video pose co-estimation. Our human parsing framework can then easily incorporate this human feedback to train a better fashion parser. Extensive experiments on two benchmark fashion datasets, as well as on a newly collected and challenging Fashion Icon dataset, demonstrate the encouraging performance gain of our general pipeline for human parsing.
  • Keywords
    humanities; image classification; image matching; learning (artificial intelligence); pose estimation; transforms; video signal processing; Fashion Icon dataset; SIFT flow; Web videos; active learning method; fashion parsing; human parsing; human pose co-estimation; semisupervised learning strategy; super-pixel matching; video co-parsing; video pose co-estimation; Clothing; Context; Estimation; Feature extraction; Learning systems; Reliability; Semantics; Information retrieval; professional communication;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2015.2443559
  • Filename
    7120998
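The abstract describes using super-pixel matching to carry parsing labels from labeled (reference) frames to other frames. The following is a minimal, illustrative sketch of that general idea as k-nearest-neighbor label transfer, not the authors' method. It assumes each super-pixel has already been summarized as a fixed-length feature vector; the function name `transfer_labels`, the Euclidean distance, and the majority vote are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def transfer_labels(
    ref_feats: Sequence[Sequence[float]],
    ref_labels: Sequence[str],
    tgt_feats: Sequence[Sequence[float]],
    k: int = 1,
) -> List[str]:
    """Hypothetical non-parametric label transfer between frames.

    Each target super-pixel receives the majority label among its k nearest
    reference super-pixels (Euclidean distance on assumed feature vectors).
    """
    if len(ref_feats) != len(ref_labels):
        raise ValueError("ref_feats and ref_labels must have the same length")
    if not 1 <= k <= len(ref_feats):
        raise ValueError("k must be between 1 and the number of reference super-pixels")

    labels = []
    for feat in tgt_feats:
        # Indices of the k closest reference super-pixels, nearest first.
        nearest = sorted(range(len(ref_feats)), key=lambda i: dist(feat, ref_feats[i]))[:k]
        # Counter preserves insertion order, so most_common() breaks ties
        # in favor of the label seen first, i.e. the closest neighbor's label.
        votes = Counter(ref_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels


if __name__ == "__main__":
    ref = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    names = ["background", "skin", "dress"]
    print(transfer_labels(ref, names, [(0.2, 0.1), (4.8, 5.1)]))
```

With the toy features above, the two target super-pixels are labeled `background` and `dress`. A real system would use richer appearance features and the cross-frame correspondences (e.g. from SIFT flow) to restrict which reference super-pixels are compared.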