• DocumentCode
    3407785
  • Title
    Talking pictures: Temporal grouping and dialog-supervised person recognition
  • Author
    Cour, Timothee ; Sapp, Benjamin ; Nagle, Akash ; Taskar, Ben
  • fYear
    2010
  • fDate
    13-18 June 2010
  • Firstpage
    1014
  • Lastpage
    1021
  • Abstract
    We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of 'supervision' is dialog cues: first-, second- and third-person references (such as “I'm Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
  • Keywords
    dynamic programming; image recognition; inference mechanisms; learning (artificial intelligence); character identification problem; dialog-supervised person recognition; discriminative learning; dynamic programming inference; multiple-instance constraint; soft grouping constraint; supervised classifier; talking pictures; temporal grouping model; Character recognition; Dynamic programming; Face recognition; Large-scale systems; Lifting equipment; Motion pictures; Mouth; Solid modeling; TV; Videos
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1063-6919
  • Print_ISBN
    978-1-4244-6984-0
  • Type
    conf
  • DOI
    10.1109/CVPR.2010.5540106
  • Filename
    5540106