• DocumentCode
    1439984
  • Title
    Robust Spatiotemporal Matching of Electronic Slides to Presentation Videos
  • Author
    Fan, Quanfu ; Barnard, Kobus ; Amir, Arnon ; Efrat, Alon
  • Author_Institution
    T. J. Watson Res. Center, IBM, Armonk, NY, USA
  • Volume
    20
  • Issue
    8
  • fYear
    2011
  • Firstpage
    2315
  • Lastpage
    2328
  • Abstract
    We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of corresponding presentations. Matching electronic slides to videos provides new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames, and matching them subject to a consistent projective transformation (homography) by using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier for separating video frames showing slides from those without. We then introduce a new matching scheme for exploiting less distinctive SIFT keypoints that enables us to tackle more difficult images. Finally, we improve upon the matching based on visual information by using estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% across 13 presentation videos.
  • Keywords
    distance learning; hidden Markov models; image matching; probability; random processes; technical presentation; transforms; video signal processing; HMM; RANSAC; SIFT keypoints; arbitrary slides sequence; binary classifier; camera events; camera movement; detected camera operations; distance-learning applications; electronic slides; estimated matching probability; frame composition; hidden Markov model; homography; image-based matching; low-quality video capture; projective transformation; quantitative experiments; random sample consensus; robust automatic matching; robust spatiotemporal matching; scale-invariant feature-transformation keypoints; slide distortion; temporal information; temporal model; video browsing; video frames; video indexing; video presentation; video searching; visual information; Accuracy; Cameras; Hidden Markov models; Image color analysis; Robustness; Synchronization; Videos; Distance learning; homography constraint; matching slides to video frames; scale-invariant feature-transformation (SIFT) keypoints; video indexing and browsing;
  • fLanguage
    English
  • Journal_Title
    Image Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1057-7149
  • Type
    jour
  • DOI
    10.1109/TIP.2011.2109727
  • Filename
    5705574
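
The temporal step named in the abstract (using estimated matching probabilities inside an HMM) can be illustrated with a minimal Viterbi sketch. This is not the authors' model: it omits the background state and camera events, and the function name `viterbi`, the `stay_prob` parameter, and the uniform slide-change transition are assumptions made only for illustration.

```python
import math


def viterbi(emission_probs, stay_prob=0.9):
    """Return the most likely slide index per frame.

    emission_probs[t][s] is an estimated probability that frame t shows
    slide s (in the paper these would come from visual matching; here
    they are plain numbers). stay_prob is an assumed probability that
    the slide does not change between consecutive frames.
    """
    n_states = len(emission_probs[0])
    eps = 1e-12  # avoids log(0) for zero-probability matches

    if n_states > 1:
        log_stay = math.log(stay_prob)
        # Assumption: a change may jump to any other slide with equal
        # probability, reflecting an arbitrary slide order.
        log_move = math.log((1.0 - stay_prob) / (n_states - 1))
    else:
        log_stay, log_move = 0.0, float("-inf")

    # Uniform prior over slides for the first frame.
    score = [math.log(1.0 / n_states) + math.log(p + eps)
             for p in emission_probs[0]]
    backptrs = []

    for obs in emission_probs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for ending in slide s at this frame.
            best_prev = max(
                range(n_states),
                key=lambda r: score[r] + (log_stay if r == s else log_move),
            )
            trans = log_stay if best_prev == s else log_move
            new_score.append(score[best_prev] + trans + math.log(obs[s] + eps))
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: frame 2 is a noisy match that favours the wrong slide.
frames = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
          [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
decoded = viterbi(frames)
per_frame = [max(range(2), key=lambda s: f[s]) for f in frames]
```

On this toy input, choosing the best slide independently per frame flips to slide 1 at frame 2, while the decoded path stays on slide 0 until the sustained change at frame 4. This is the kind of correction a temporal model can make over purely visual matching.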