• DocumentCode
    3748709
  • Title
    Guiding the Long-Short Term Memory Model for Image Caption Generation
  • Author
    Xu Jia; Efstratios Gavves; Basura Fernando; Tinne Tuytelaars
  • fYear
    2015
  • Firstpage
    2407
  • Lastpage
    2415
  • Abstract
    In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search to avoid bias towards short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or better than the current state of the art.
  • Keywords
    "Semantics","Computer architecture","Logic gates","Microprocessors","Visualization","Training","Pipelines"
  • Publisher
    IEEE
  • Conference_Title
    Computer Vision (ICCV), 2015 IEEE International Conference on
  • Electronic_ISBN
    2380-7504
  • Type
    conf
  • DOI
    10.1109/ICCV.2015.277
  • Filename
    7410634